dbt
Manage Data Sources in dbt

dbt sources: Managing Data Sources in dbt

In dbt, sources are objects representing raw data tables loaded into your data warehouse by your ELT processes. By defining these tables as sources within your dbt project, you can manage data lineage, validate assumptions about data integrity, and monitor the timeliness of your data with freshness checks (opens in a new tab).

Declaring

Sources in dbt are defined within YAML files under the sources: key, typically stored in the models/ directory.

These definitions allow you to specify the database and schema of your source tables, and assign a logical name used within dbt.

Example of a source declaration:


_10
sources:
_10
- name: sales_data
_10
database: analytics
_10
schema: raw_sales
_10
tables:
_10
- name: transactions
_10
- name: customers

This setup enables you to manage source metadata centrally and leverage dbt’s powerful features to interact with these tables directly in your transformation models.

Using the {{ source() }} Function

Once a source is defined, you can reference it in your models using the {{ source() }} function. This not only helps in creating clean and manageable code but also ensures that dependencies are correctly understood and handled by dbt.

Example of selecting from a source in a model


_10
select t.customer_id, sum(t.amount) as total_spent
_10
from {{ source('sales_data', 'transactions') }} as t
_10
join {{ source('sales_data', 'customers') }} as c on t.customer_id = c.id
_10
group by t.customer_id;

This SQL snippet joins transactions and customers tables from the sales_data source.

Testing and Documenting Sources

Adding tests to sources is a best practice that helps ensure your assumptions about the underlying data hold true throughout the lifecycle of your data products.

Example of adding tests and descriptions to sources


_19
sources:
_19
- name: sales_data
_19
description: "Data loaded from the online transaction system."
_19
tables:
_19
- name: transactions
_19
description: "Records of all sales transactions."
_19
tests:
_19
- unique:
_19
column_name: transaction_id
_19
- not_null:
_19
column_name: transaction_date
_19
- name: customers
_19
description: "Customer information."
_19
columns:
_19
- name: id
_19
description: "Unique customer identifier."
_19
tests:
_19
- unique
_19
- not_null

This configuration enhances the maintainability of your project by making it easier for new team members to understand the data structures and for existing team members to ensure data quality.

Monitoring Data Freshness

dbt allows you to configure freshness tests for your sources, which can alert you if your data isn't updated within expected timeframes, critical for maintaining SLAs.

Example of configuring source freshness


_11
sources:
_11
- name: sales_data
_11
freshness:
_11
warn_after: { count: 24, period: hour }
_11
error_after: { count: 48, period: hour }
_11
tables:
_11
- name: transactions
_11
loaded_at_field: last_updated
_11
freshness:
_11
warn_after: { count: 6, period: hour }
_11
error_after: { count: 12, period: hour }

By setting these parameters, dbt helps you keep track of how current your data is, enabling proactive management of data pipelines.

Y42 Lineage Mode

Manage Sources and dbt Models in one place

Build end-to-end pipelines using a single framework.

Get Started