dbt sources: Managing Data Sources in dbt
In dbt, sources are objects representing raw data tables loaded into your data warehouse by your ELT processes. By defining these tables as sources within your dbt project, you can manage data lineage, validate assumptions about data integrity, and monitor the timeliness of your data with freshness checks (opens in a new tab).
Declaring
Sources in dbt are defined within YAML files under the sources: key, typically stored in the models/ directory.
These definitions allow you to specify the database and schema of your source tables, and assign a logical name used within dbt.
Example of a source declaration:
_10sources:_10  - name: sales_data_10    database: analytics_10    schema: raw_sales_10    tables:_10      - name: transactions_10      - name: customers
This setup enables you to manage source metadata centrally and leverage dbt’s powerful features to interact with these tables directly in your transformation models.
Using the {{ source() }} Function
Once a source is defined, you can reference it in your models using the {{ source() }} function. This not only helps in creating clean and manageable code but also ensures that dependencies are correctly understood and handled by dbt.
Example of selecting from a source in a model
_10select t.customer_id, sum(t.amount) as total_spent_10from {{ source('sales_data', 'transactions') }} as t_10join {{ source('sales_data', 'customers') }} as c on t.customer_id = c.id_10group by t.customer_id;
This SQL snippet joins transactions and customers tables from the sales_data source.
Testing and Documenting Sources
Adding tests to sources is a best practice that helps ensure your assumptions about the underlying data hold true throughout the lifecycle of your data products.
Example of adding tests and descriptions to sources
_19sources:_19  - name: sales_data_19    description: "Data loaded from the online transaction system."_19    tables:_19      - name: transactions_19        description: "Records of all sales transactions."_19        tests:_19          - unique:_19              column_name: transaction_id_19          - not_null:_19              column_name: transaction_date_19      - name: customers_19        description: "Customer information."_19        columns:_19          - name: id_19            description: "Unique customer identifier."_19            tests:_19              - unique_19              - not_null
This configuration enhances the maintainability of your project by making it easier for new team members to understand the data structures and for existing team members to ensure data quality.
Monitoring Data Freshness
dbt allows you to configure freshness tests for your sources, which can alert you if your data isn't updated within expected timeframes, critical for maintaining SLAs.
Example of configuring source freshness
_11sources:_11  - name: sales_data_11    freshness:_11      warn_after: { count: 24, period: hour }_11      error_after: { count: 48, period: hour }_11    tables:_11      - name: transactions_11        loaded_at_field: last_updated_11        freshness:_11          warn_after: { count: 6, period: hour }_11          error_after: { count: 12, period: hour }
By setting these parameters, dbt helps you keep track of how current your data is, enabling proactive management of data pipelines.

Manage Sources and dbt Models in one place
Build end-to-end pipelines using a single framework.
Get Started