dbt Seed: Integrate and Manage Static Data

In dbt, seeds are a convenient way to bring static, version-controlled data directly into your data warehouse. The dbt seed command loads the CSV files in your project’s seeds directory into your warehouse as tables, which your models can then reference like any other part of the transformation process.

Overview of dbt Seeds

Seeds in dbt are best suited for data that changes infrequently and requires version control, such as:

✅ Mappings of country codes to country names.
✅ Lists of test emails to exclude from analysis.
✅ Employee account IDs.

They are less suited for:

❌ Loading raw data exported to CSVs.
❌ Production data containing sensitive information like PII or passwords.

Example: Loading Employee Roles

To load a seed file into your dbt project:

Add the CSV file to your seeds directory, for example seeds/employee_roles.csv:


employee_id,role
001,Manager
002,Analyst
003,Engineer
...

Run the dbt seed command.


$ dbt seed

Found 2 models, 3 tests, 0 archives, 0 analyses, 53 macros, 0 operations, 1 seed file

14:46:15 | Concurrency: 1 threads (target='dev')
14:46:15 |
14:46:15 | 1 of 1 START seed file analytics.employee_roles........................... [RUN]
14:46:15 | 1 of 1 OK loaded seed file analytics.employee_roles....................... [INSERT 3 in 0.01s]
14:46:16 |
14:46:16 | Finished running 1 seed in 0.14s.

Completed successfully

Done. PASS=1 ERROR=0 SKIP=0 TOTAL=1

Reference the seed in downstream models with the ref function, just as you would reference a model:

models/employee_access.sql


select * from {{ ref('employee_roles') }}

Configuring Seeds

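You can configure seeds in your dbt_project.yml file. Common options include column types, column quoting, and the target schema, plus indexes on adapters that support them, such as Postgres. Seeds are always built as tables, so there is no materialization to choose.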
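A minimal sketch of such a configuration, assuming a project named my_project (a placeholder for the name in your dbt_project.yml) and a Postgres-compatible warehouse for the column types. The seed_data schema name is hypothetical, and the indexes block applies only to adapters that support it, such as dbt-postgres:

```yaml
# dbt_project.yml (excerpt)
seeds:
  my_project:              # placeholder: replace with your project's name
    +quote_columns: false  # don't quote column names when creating seed tables
    +schema: seed_data     # hypothetical custom schema for every seed in the project
    employee_roles:
      # Override the column types dbt would otherwise infer from the CSV
      +column_types:
        employee_id: integer
        role: varchar(32)
      # Postgres-only: build a unique index on employee_id
      +indexes:
        - columns: ['employee_id']
          unique: true
```

Configs prefixed with + apply to every seed below that path, so the quoting and schema settings cover all seeds while the column types and index apply only to employee_roles. With dbt's default schema naming, +schema: seed_data places the table in <target_schema>_seed_data rather than in a schema called seed_data.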

Documenting and Testing Seeds

Just like models, seeds can be documented and tested in YAML properties files. Declare descriptions for the seed and its columns so they appear in your generated documentation, and attach tests so dbt test catches problems such as duplicate or missing keys.
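A sketch of a properties file for the employee_roles seed. The file name seeds/employee_roles.yml is hypothetical (any .yml file in the seeds directory works), and the descriptions are illustrative; unique and not_null are dbt's built-in generic tests:

```yaml
# seeds/employee_roles.yml (hypothetical file name)
version: 2

seeds:
  - name: employee_roles          # must match the CSV file name without .csv
    description: "Maps each employee ID to a role."   # illustrative text
    columns:
      - name: employee_id
        description: "Employee identifier; expected to be unique."
        tests:
          - unique
          - not_null
      - name: role
        description: "Job role, e.g. Manager, Analyst, or Engineer."
        tests:
          - not_null
```

Running dbt test --select employee_roles executes these tests against the loaded table, and dbt docs generate picks up the descriptions. dbt 1.8 and later also accept data_tests as the name of the tests key.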

FAQs

Can I load raw data with seeds?

Not for large files. Loading big CSV exports from production databases with dbt seed is slow, so use a dedicated data-loading tool for that kind of data and keep seeds for small, static reference tables.

Can seeds be stored outside the seeds directory?

Yes, update the seed-paths configuration in your dbt_project.yml to change the default location.
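For example, to load seeds from both the default folder and an additional one (reference_data is a hypothetical folder name):

```yaml
# dbt_project.yml (excerpt)
# Default is ["seeds"]; list every directory dbt should search for seed CSVs.
seed-paths: ["seeds", "reference_data"]
```

dbt versions before 1.0 used the name data-paths for this setting.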

What if the seed’s columns change?

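If the seed’s columns change, run dbt seed --full-refresh so dbt drops and recreates the seed table with the new structure. Without the flag, dbt truncates the existing table and reloads it, which can fail when the CSV’s columns no longer match. Add --select employee_roles to rebuild only that seed.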
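If one seed’s structure changes often, a sketch like the following uses dbt’s full_refresh config to rebuild just that seed on every run, whether or not the flag is passed. my_project is a placeholder project name, and because the table is dropped and recreated each time, this suits small seeds best:

```yaml
# dbt_project.yml (excerpt)
seeds:
  my_project:                # placeholder: replace with your project's name
    employee_roles:
      # Takes precedence over the presence or absence of --full-refresh:
      # dbt drops and recreates this seed's table on every `dbt seed` run.
      +full_refresh: true
```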
