dbt Seed: Integrate and Manage Static Data
In dbt, seeds are a convenient way to integrate static, version-controlled data directly into your data warehouse. Running the dbt seed command loads the CSV files in your project’s seeds directory into your warehouse, where they become part of the transformation process.
Overview of dbt Seeds
Seeds in dbt are best suited for data that changes infrequently and requires version control, such as:
- ✅ Mappings of country codes to country names.
- ✅ Lists of test emails to exclude from analysis.
- ✅ Employee account IDs.
They are less suited for:
- ❌ Loading raw data exported to CSVs.
- ❌ Production data containing sensitive information like PII or passwords.
Example: Loading Employee Roles
To load a seed file into your dbt project:
- Add the CSV file to your seeds directory, e.g.,
seeds/employee_roles.csv
employee_id,role
001,Manager
002,Analyst
003,Engineer
...
- Run the dbt seed command.
$ dbt seed

Found 2 models, 3 tests, 0 archives, 0 analyses, 53 macros, 0 operations, 1 seed file

14:46:15 | Concurrency: 1 threads (target='dev')
14:46:15 |
14:46:15 | 1 of 1 START seed file analytics.employee_roles........................... [RUN]
14:46:15 | 1 of 1 OK loaded seed file analytics.employee_roles....................... [INSERT 3 in 0.01s]
14:46:16 |
14:46:16 | Finished running 1 seed in 0.14s.

Completed successfully

Done. PASS=1 ERROR=0 SKIP=0 TOTAL=1
- Reference seeds in downstream models with the ref function, just as you reference models:
select * from {{ ref('employee_roles') }}
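To make the load-then-reference flow concrete, here is a minimal local sketch that uses Python's built-in sqlite3 module as a stand-in warehouse. It is not how dbt implements seeds: the helper name load_seed and the downstream query are hypothetical. It creates a table named after the CSV file, inserts the rows, and then queries that table the way a downstream model would through ref.

```python
import csv
import io
import sqlite3

# Contents of seeds/employee_roles.csv (the "..." rows are omitted here).
EMPLOYEE_ROLES_CSV = """employee_id,role
001,Manager
002,Analyst
003,Engineer
"""


def load_seed(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Hypothetical helper: create `table` from the CSV header and insert every row.

    All columns are stored as TEXT, so values such as "001" keep their leading zeros.
    Returns the number of rows inserted.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
inserted = load_seed(conn, "employee_roles", EMPLOYEE_ROLES_CSV)
print(f"INSERT {inserted}")  # compare with the "[INSERT 3 ...]" line in the dbt output

# A downstream "model"; in dbt this would select from {{ ref('employee_roles') }}.
managers = conn.execute(
    "SELECT employee_id FROM employee_roles WHERE role = 'Manager'"
).fetchall()
print(managers)
```

Storing every column as text is a deliberate simplification that keeps the sketch short and avoids any type inference.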
Configuring Seeds
You can configure seeds in your dbt_project.yml file. Options include column types, column quoting, and the target schema and alias; some adapters also support indexes.
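Column types matter because dbt infers each seed column's type from the CSV values unless you override it with the column_types config (set as +column_types for the seed under seeds: in dbt_project.yml). Inference can strip leading zeros from IDs like 001. The sketch below imitates that trade-off with a deliberately simple, hypothetical rule: a column whose values all parse as integers becomes an integer column unless an explicit type says otherwise. It illustrates the idea only and is not dbt's actual inference logic.

```python
import csv
import io
from typing import Dict, List, Optional

EMPLOYEE_ROLES_CSV = """employee_id,role
001,Manager
002,Analyst
003,Engineer
"""


def is_int(value: str) -> bool:
    """True if the string parses as a Python int."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def infer_and_cast(
    csv_text: str, column_types: Optional[Dict[str, str]] = None
) -> List[dict]:
    """Hypothetical loader: cast all-integer columns to int unless overridden.

    column_types maps a column name to "text" or "integer", loosely mimicking
    the idea behind dbt's column_types config (not its real behavior).
    """
    column_types = column_types or {}
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    for column in list(rows[0]):
        declared = column_types.get(column)
        inferred_int = all(is_int(row[column]) for row in rows)
        if declared == "integer" or (declared is None and inferred_int):
            for row in rows:
                row[column] = int(row[column])
    return rows


inferred = infer_and_cast(EMPLOYEE_ROLES_CSV)
explicit = infer_and_cast(EMPLOYEE_ROLES_CSV, column_types={"employee_id": "text"})
print(inferred[0]["employee_id"])  # 1: leading zeros are lost
print(explicit[0]["employee_id"])  # 001: kept as text
```

Declaring the type explicitly trades a little configuration for predictable tables: the warehouse column no longer depends on whatever values happen to be in the CSV today.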
Documenting and Testing Seeds
Like models, seeds can be documented and tested in YAML properties files. You can declare descriptions, tests such as unique and not_null, and other properties so that your seed data stays valid and well understood.
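dbt's built-in generic tests unique and not_null are common choices for a seed's key column, and a dbt test passes when its query returns no failing rows. The sketch below reproduces those two checks in plain Python against the employee_roles data so the pass/fail logic is visible. The function names are hypothetical and this is not dbt's implementation; in a real project you would declare the tests on the employee_id column in a properties YAML file.

```python
import csv
import io
from collections import Counter
from typing import List

EMPLOYEE_ROLES_CSV = """employee_id,role
001,Manager
002,Analyst
003,Engineer
"""


def not_null_failures(rows: List[dict], column: str) -> List[dict]:
    """Rows where the column is missing or empty (treated as NULL in this sketch)."""
    return [row for row in rows if row.get(column) in (None, "")]


def unique_failures(rows: List[dict], column: str) -> List[str]:
    """Values that appear more than once in the column."""
    counts = Counter(row[column] for row in rows)
    return [value for value, n in counts.items() if n > 1]


rows = list(csv.DictReader(io.StringIO(EMPLOYEE_ROLES_CSV)))

# Like a dbt test, each check passes when it finds zero failing rows.
for name, failures in [
    ("not_null_employee_id", not_null_failures(rows, "employee_id")),
    ("unique_employee_id", unique_failures(rows, "employee_id")),
]:
    print(f"{name}: {'PASS' if not failures else 'FAIL'}")
```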
FAQs
- Can I load raw data with seeds?
Using seeds for large CSV exports from production databases is not recommended: dbt seed is not built for loading large files efficiently, so performance suffers.
- Can seeds be stored outside the seeds directory?
Yes, update the seed-paths configuration in your dbt_project.yml to change the default location.
- What if the seed’s columns change?
If the columns change, run dbt seed --full-refresh to rebuild the seed table with the new structure.
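Why a rebuild is needed: on a normal run, dbt reloads an existing seed table in place (on most adapters it truncates the table and inserts the new rows), so a CSV whose columns no longer match the table can fail to load, while --full-refresh drops and recreates the table. The sqlite3 sketch below imitates both paths with hypothetical helpers and a hypothetical new department column; SQLite has no TRUNCATE, so DELETE stands in for it.

```python
import sqlite3
from typing import Sequence

Rows = Sequence[Sequence[str]]


def create_table(conn: sqlite3.Connection, table: str, header: Sequence[str]) -> None:
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')


def insert_rows(conn: sqlite3.Connection, table: str, header: Sequence[str], rows: Rows) -> None:
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})', rows)


def reload_in_place(conn: sqlite3.Connection, table: str, header: Sequence[str], rows: Rows) -> None:
    """Hypothetical normal run: keep the existing table, empty it, insert new rows."""
    conn.execute(f'DELETE FROM "{table}"')  # SQLite stand-in for TRUNCATE
    insert_rows(conn, table, header, rows)


def full_refresh(conn: sqlite3.Connection, table: str, header: Sequence[str], rows: Rows) -> None:
    """Hypothetical --full-refresh: drop the table and rebuild it from the new header."""
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    create_table(conn, table, header)
    insert_rows(conn, table, header, rows)


conn = sqlite3.connect(":memory:")
old_header = ["employee_id", "role"]
create_table(conn, "employee_roles", old_header)
insert_rows(conn, "employee_roles", old_header, [("001", "Manager")])

# Between runs, the CSV gains a column the existing table does not have.
new_header = ["employee_id", "role", "department"]
new_rows = [("001", "Manager", "Finance"), ("002", "Analyst", "Sales")]

try:
    reload_in_place(conn, "employee_roles", new_header, new_rows)
    in_place_ok = True
except sqlite3.OperationalError as exc:
    in_place_ok = False
    print(f"in-place reload failed: {exc}")

full_refresh(conn, "employee_roles", new_header, new_rows)
columns = [info[1] for info in conn.execute("PRAGMA table_info(employee_roles)")]
print(columns)
```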