dbt
Build and Configure Models

dbt SQL: Building and Configuring Models

In dbt (data build tool), models are central to transforming data directly within your data warehouse, encapsulating the 'T' in ELT (Extract, Load, Transform). These models are typically SQL files containing select statements, although dbt also supports Python from version 1.3 onwards for more complex data operations.

Models in dbt are not just limited to simple transformations. They can incorporate sophisticated SQL logic, utilize Python for advanced analytics, or integrate directly with machine learning workflows.

Configuring SQL Models

Basic Model Configurations

Configurations can be specified directly in the model's SQL file, within a YAML properties file, or globally in the dbt_project.yml file. Here are some foundational configurations useful for defining model behavior:

Example: Configuring a Model

models/model.sql

_10
{{ config(
_10
materialized='incremental',
_10
tags=['daily', 'revenue'],
_10
pre_hook="delete from {{ this }} where date = current_date",
_10
post_hook="grant select on {{ this }} to role analyst"
_10
) }}
_10
_10
select * from ...

This configuration sets the model to refresh incrementally, tags it for organizational purposes, and includes hooks for data management and security.

Managing dependencies

For managing dependencies or integrating with other models, use the ref() function. It allows models to reference each other, ensuring that dbt processes them in the correct order based on their dependencies.

Example: Using ref() to Manage Dependencies

models/dim_customer.sql

_10
select customer_id, SUM(payment_amount) AS total_spent
_10
from {{ ref('stg_payments') }}
_10
group by customer_id

In this example, ref('stg_payments') ensures that stg_payments is processed before the current model (dim_customer), respecting the dependency chain.

Incorporating Sources into Your Models

Declaring sources in dbt allows you to manage and document the origins of your data. This practice not only clarifies data lineage but also facilitates the application of tests and freshness checks to ensure data quality and timeliness.

Example: Declaring and Using Sources

sources

_10
sources:
_10
- name: ecommerce
_10
tables:
_10
- name: orders
_10
- name: customers


_10
select ...
_10
from {{ source('ecommerce', 'orders') }}
_10
join {{ source('ecommerce', 'customers') }} on ...

By using the source() function, this model specifies its data sources.

Practical Tips

  • Leverage Jinja for Dynamic SQL: Jinja templating in dbt allows for dynamic SQL generation which is powerful for creating flexible transformations based on variables or logic (if-then, for loops).
  • Document Your Models: Always document your models and sources within your dbt project. This practice not only aids in understanding and maintenance but also supports automated documentation generation.
  • Test Your Models: Ensure the reliability of your models by implementing dbt tests. These tests can validate model assumptions, like data uniqueness and non-null constraints, directly within your transformation pipeline.
Y42 Lineage Mode

Manage Sources and dbt Models in one place

Build end-to-end pipelines using a single framework.

Get Started