Manage Static Data Sources with dbt seeds

dbt Seed: Integrate and Manage Static Data

In dbt, seeds are CSV files that live in your project and are version-controlled alongside your code. Running the dbt seed command loads the CSV files in your project’s seeds directory into your data warehouse as tables. Downstream models can then reference those tables like any other part of the transformation process.

Overview of dbt Seeds

Seeds in dbt are best suited for data that changes infrequently and requires version control, such as:

  • ✅ Mappings of country codes to country names.
  • ✅ Lists of test emails to exclude from analysis.
  • ✅ Employee account IDs.

They are less suited for:

  • ❌ Loading raw data exported to CSVs.
  • ❌ Production data containing sensitive information like PII or passwords.

Example: Loading Employee Roles

To load a seed file into your dbt project:

  1. Add the CSV file to your seeds directory, e.g., seeds/employee_roles.csv:
```csv
employee_id,role
001,Manager
002,Analyst
003,Engineer
...
```
  2. Run the dbt seed command:
```shell
$ dbt seed

Found 2 models, 3 tests, 0 archives, 0 analyses, 53 macros, 0 operations, 1 seed file

14:46:15 | Concurrency: 1 threads (target='dev')
14:46:15 |
14:46:15 | 1 of 1 START seed file analytics.employee_roles........................... [RUN]
14:46:15 | 1 of 1 OK loaded seed file analytics.employee_roles....................... [INSERT 3 in 0.01s]
14:46:16 |
14:46:16 | Finished running 1 seed in 0.14s.

Completed successfully

Done. PASS=1 ERROR=0 SKIP=0 TOTAL=1
```
This creates a table in your warehouse named after the seed file (here, employee_roles), with no additional SQL required.
  3. Reference the seed in downstream models with the ref function, just as you would reference a model:
models/employee_access.sql
```sql
select * from {{ ref('employee_roles') }}
```

Configuring Seeds

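You can configure seeds in your dbt_project.yml file, either for every seed in a project or for an individual seed file. Options include column types, the target schema, whether column names are quoted, and adapter-specific settings such as indexes on Postgres. Unlike models, seeds have no materialization to choose: dbt seed always loads them as tables.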
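A minimal sketch of such a configuration follows, assuming a project named my_project (hypothetical; use the name from your own dbt_project.yml) and a Postgres warehouse. The column_types entry keeps employee_id as text: dbt infers column types from the CSV contents, so a value like 001 may otherwise be loaded as the number 1. The schema name and index are illustrative, and the exact type names depend on your warehouse.

```yaml
# dbt_project.yml (excerpt)
seeds:
  my_project:                        # hypothetical project name
    +schema: reference_data          # hypothetical; by default dbt appends this to your target schema
    +quote_columns: false            # do not quote column names when creating seed tables
    employee_roles:                  # applies only to seeds/employee_roles.csv
      +column_types:
        employee_id: varchar(10)     # store as text so leading zeros such as 001 survive
        role: varchar(50)
      +indexes:                      # Postgres adapter only
        - columns: ['employee_id']
          unique: true
```

Settings at the my_project level apply to every seed in the project, while settings nested under employee_roles override them for that one file.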

Documenting and Testing Seeds

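Just like models, seeds can be documented and tested using YAML properties files. You can declare descriptions, column tests (such as unique and not_null), and other properties; dbt test runs those tests against the loaded seed table, and dbt docs generate includes the descriptions in your project documentation.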
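For example, a properties file for the employee_roles seed might look like the sketch below. The file name is an assumption (dbt reads .yml files placed under your seed paths), the descriptions are illustrative, and the accepted_values list assumes the seed holds only the three roles shown in the example CSV.

```yaml
# seeds/employee_roles.yml (hypothetical file name)
version: 2

seeds:
  - name: employee_roles
    description: "Maps each employee ID to the employee's role."
    columns:
      - name: employee_id
        description: "Employee identifier."
        tests:                     # dbt 1.8+ also accepts (and prefers) data_tests:
          - unique
          - not_null
      - name: role
        tests:
          - accepted_values:
              values: ['Manager', 'Analyst', 'Engineer']   # assumes only these three roles
```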

FAQs

  • Can I load raw data with seeds?

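No. Seeds are meant for small, static files, not large CSV exports from production databases: dbt seed is not optimized for large files, so loads can be slow. Load raw data with a dedicated data-loading tool and declare it as a source instead.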

  • Can seeds be stored outside the seeds directory?

Yes, update the seed-paths configuration in your dbt_project.yml to change the default location.
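A minimal sketch, assuming a hypothetical extra directory named reference_data. dbt looks for seed CSV files in every directory listed; the default is ["seeds"]. In dbt versions before 1.0, this setting was named data-paths.

```yaml
# dbt_project.yml (excerpt)
seed-paths: ["seeds", "reference_data"]   # "reference_data" is a hypothetical extra directory
```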

  • What if the seed’s columns change?

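If a seed’s columns change (for example, a column is added, removed, or renamed), run dbt seed --full-refresh. By default, dbt seed empties the existing table and reinserts the rows, which can fail when the CSV’s columns no longer match the table; --full-refresh drops the table and recreates it with the new structure.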
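If a particular seed’s columns change often, you could instead set the full_refresh config on that seed so that every dbt seed run drops and recreates its table, as in this sketch (my_project is a hypothetical project name). The trade-off is that the table is rebuilt on every run rather than only when you pass the flag.

```yaml
# dbt_project.yml (excerpt)
seeds:
  my_project:                 # hypothetical project name
    employee_roles:
      +full_refresh: true     # always rebuild this seed, as if --full-refresh were passed
```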
