dbt SQL: Building and Configuring Models

In dbt (data build tool), models are central to transforming data directly within your data warehouse, encapsulating the 'T' in ELT (Extract, Load, Transform). These models are typically SQL files containing select statements, although dbt also supports Python from version 1.3 onwards for more complex data operations.

Models in dbt are not just limited to simple transformations. They can incorporate sophisticated SQL logic, utilize Python for advanced analytics, or integrate directly with machine learning workflows.

Configuring SQL Models

Basic Model Configurations

Configurations can be specified directly in the model's SQL file, within a YAML properties file, or globally in the dbt_project.yml file. Here are some foundational configurations useful for defining model behavior:

Materialization (opens in a new tab): Specifies how dbt should persist the model in the warehouse (e.g., view, table, incremental).
Tags: Helps in organizing and selecting groups of models within your project.
Pre-hooks and Post-hooks (opens in a new tab): SQL statements that run before or after the model is materialized.

Example: Configuring a Model

models/model.sql


_10{{ config(
_10    materialized='incremental',
_10    tags=['daily', 'revenue'],
_10    pre_hook="delete from {{ this }} where date = current_date",
_10    post_hook="grant select on {{ this }} to role analyst"
_10) }}
_10
_10select * from ...

This configuration sets the model to refresh incrementally, tags it for organizational purposes, and includes hooks for data management and security.

Managing dependencies

For managing dependencies or integrating with other models, use the ref() function. It allows models to reference each other, ensuring that dbt processes them in the correct order based on their dependencies.

Example: Using ref() to Manage Dependencies

models/dim_customer.sql


_10select customer_id, SUM(payment_amount) AS total_spent
_10from {{ ref('stg_payments') }}
_10group by customer_id

In this example, ref('stg_payments') ensures that stg_payments is processed before the current model (dim_customer), respecting the dependency chain.

Incorporating Sources into Your Models

Declaring sources in dbt allows you to manage and document the origins of your data. This practice not only clarifies data lineage but also facilitates the application of tests and freshness checks to ensure data quality and timeliness.

Example: Declaring and Using Sources

sources


_10sources:
_10  - name: ecommerce
_10    tables:
_10      - name: orders
_10      - name: customers


_10select ...
_10from {{ source('ecommerce', 'orders') }}
_10    join {{ source('ecommerce', 'customers') }} on ...

By using the source() function, this model specifies its data sources.

Practical Tips

Leverage Jinja for Dynamic SQL: Jinja templating in dbt allows for dynamic SQL generation which is powerful for creating flexible transformations based on variables or logic (if-then, for loops).
Document Your Models: Always document your models and sources within your dbt project. This practice not only aids in understanding and maintenance but also supports automated documentation generation.
Test Your Models: Ensure the reliability of your models by implementing dbt tests. These tests can validate model assumptions, like data uniqueness and non-null constraints, directly within your transformation pipeline.

Manage Sources and dbt Models in one place

Build end-to-end pipelines using a single framework.

Get Started