Manage Static Data Sources with dbt seeds

dbt Seed: Integrate and Manage Static Data

In dbt, seeds are CSV files that live in your project under version control and are loaded into your data warehouse as tables. Running the dbt seed command loads the CSV files in your project's seeds directory, and the resulting tables become part of the transformation process: downstream models can reference them like any other model.

Overview of dbt Seeds

Seeds in dbt are best suited for data that changes infrequently and requires version control, such as:

  • ✅ Mappings of country codes to country names.
  • ✅ Lists of test emails to exclude from analysis.
  • ✅ Employee account IDs.

They are less suited for:

  • ❌ Loading raw data exported to CSVs.
  • ❌ Production data containing sensitive information like PII or passwords.

Example: Loading Employee Roles

To load a seed file into your dbt project:

  1. Add the CSV file to your seeds directory, for example seeds/employee_roles.csv:

employee_id,role
001,Manager
002,Analyst
003,Engineer
...

  2. Run the dbt seed command:

$ dbt seed

Found 2 models, 3 tests, 0 archives, 0 analyses, 53 macros, 0 operations, 1 seed file

14:46:15 | Concurrency: 1 threads (target='dev')
14:46:15 |
14:46:15 | 1 of 1 START seed file analytics.employee_roles........................... [RUN]
14:46:15 | 1 of 1 OK loaded seed file analytics.employee_roles....................... [INSERT 3 in 0.01s]
14:46:16 |
14:46:16 | Finished running 1 seed in 0.14s.

Completed successfully

Done. PASS=1 ERROR=0 SKIP=0 TOTAL=1

  3. Reference the seed in downstream models with the ref function, the same way you reference a model. For example, in models/employee_access.sql:

select * from {{ ref('employee_roles') }}

Configuring Seeds

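You can configure seeds in your dbt_project.yml file. Options include the column types dbt uses when creating the table (column_types), whether column names are quoted (quote_columns), the schema the seed is loaded into, and adapter-specific settings such as indexes on Postgres. Seeds are always loaded as tables, so unlike models they have no materialization to choose.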
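As a minimal sketch, the excerpt below configures the employee_roles seed from the example above. It assumes a project named my_project and a hypothetical seed_data schema; replace both with your own. Because dbt infers column types from the CSV contents, IDs such as 001 would likely be loaded as numbers and lose their leading zeros, so the sketch pins both columns to text types. The type names are passed to the warehouse as written, so varchar(10) suits Postgres, Redshift, or Snowflake, while BigQuery would need string.

```yaml
# dbt_project.yml (excerpt)
seeds:
  my_project:            # hypothetical: must match the `name:` of your project
    +schema: seed_data   # hypothetical schema applied to every seed in the project
    employee_roles:      # matches seeds/employee_roles.csv
      +column_types:
        # Load IDs as text so values such as 001 keep their leading zeros.
        employee_id: varchar(10)
        role: varchar(50)
```

With dbt's default schema naming, +schema: seed_data produces a schema named <target_schema>_seed_data rather than seed_data on its own.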

Documenting and Testing Seeds

Like models, seeds can be documented and tested in YAML properties files. You can declare descriptions for the seed and its columns and attach tests, such as unique or not_null, that dbt test runs against the loaded table to check its integrity.
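A minimal properties file for the employee_roles seed might look like the following. The file name seeds/employee_roles.yml is an assumption (dbt reads properties from .yml files in the seeds directory), and the accepted_values test assumes Manager, Analyst, and Engineer are the only valid roles.

```yaml
# seeds/employee_roles.yml (hypothetical file name)
version: 2

seeds:
  - name: employee_roles
    description: "One row per employee, mapping the employee ID to a role."
    columns:
      - name: employee_id
        description: "Employee identifier."
        tests:
          - unique
          - not_null
      - name: role
        description: "Job role of the employee."
        tests:
          # Assumes these are the only valid roles.
          - accepted_values:
              values: ["Manager", "Analyst", "Engineer"]
```

Running dbt test --select employee_roles executes these tests against the loaded table, and the descriptions appear in the documentation site built by dbt docs generate. Recent dbt versions also accept data_tests: in place of tests:.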

FAQs

  • Can I load raw data with seeds?

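It’s not recommended. Seeds are designed for small, version-controlled files; loading large CSV exports from production databases with dbt seed is slow, and the files would also have to be committed to your repository.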

  • Can seeds be stored outside the seeds directory?

Yes. Set seed-paths in your dbt_project.yml to the directory (or list of directories) where your CSV files live; it defaults to seeds.

  • What if the seed’s columns change?

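Run dbt seed --full-refresh to rebuild the seed table with the new structure. Without the flag, dbt seed on most adapters truncates the existing table and reloads its rows, leaving the table's columns unchanged; --full-refresh drops the table and recreates it from the current CSV.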
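To store seeds outside the default directory, as described in the FAQ above, a dbt_project.yml excerpt could keep the default seeds directory and add a second one. The reference_data directory name is hypothetical; paths are relative to the project root.

```yaml
# dbt_project.yml (excerpt)
# dbt searches every listed directory for seed CSV files.
# "reference_data" is a hypothetical directory name.
seed-paths: ["seeds", "reference_data"]
```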
