dbt contracts: Enforce Schema Consistency

dbt contracts let you declare the schema a model must produce: its column names, data types, and, optionally, constraints. With a contract enforced, dbt verifies that the transformations you apply result in a dataset matching that predefined schema, which gives other processes and users a dependable interface to build on.

Why Use Contracts?

In dbt, a model is typically defined by a SQL SELECT statement, and the columns it selects determine the dataset’s structure. Without a contract, a change to that statement during development, such as a renamed column, a dropped field, or a changed data type, can silently alter the output and break other systems or models that depend on it. An enforced contract turns the model's output structure into an explicit set of guarantees: during the build, dbt checks that the model adheres to its contract and fails the build if it does not, so mismatches surface before they reach downstream consumers.

Defining a Contract

Suppose you have a model that organizes product information. Below is an example of how you might define a contract for such a model:

SQL Model File (`.sql`)

models/marts/dim_products.sql


WITH input AS (
    SELECT
        product_id,
        product_name,
        product_category
        -- any additional column selected here must also be declared in the contract
    FROM ...
)

SELECT * FROM input

Contract Configuration (`.yml`)

models/marts/products.yml


models:
  - name: dim_products
    config:
      contract:
        enforced: true
    columns:
      - name: product_id
        data_type: int
        constraints:
          - type: not_null
      - name: product_name
        data_type: string
      - name: product_category
        data_type: string

In this setup, the contract requires the dim_products model to return exactly these three columns, with product_id as an integer and product_name and product_category as strings. The not_null constraint further declares that product_id must never be null.
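The `constraints` entry is what lets the database itself, not just dbt, reject bad rows. As a minimal sketch, assuming an in-memory SQLite database stands in for the data platform, the contract can be written as table DDL with a `NOT NULL` column, and the database then refuses a row whose `product_id` is null. SQLite's `INTEGER` and `TEXT` stand in for `int` and `string`; because SQLite does not strictly enforce declared column types, this sketch only demonstrates the not_null part. The DDL, the `try_insert` helper, and the sample rows are illustrative, not what dbt generates.

```python
import sqlite3

# Illustrative DDL for the dim_products contract. SQLite's INTEGER and TEXT
# stand in for the contract's int and string types (an assumption of this
# sketch); it is not the DDL dbt generates for any platform.
DDL = """
CREATE TABLE dim_products (
    product_id       INTEGER NOT NULL,  -- data_type: int, constraint: not_null
    product_name     TEXT,              -- data_type: string
    product_category TEXT               -- data_type: string
)
"""


def try_insert(conn: sqlite3.Connection, row: tuple) -> bool:
    """Hypothetical helper: insert one row, return False if the database rejects it."""
    try:
        conn.execute("INSERT INTO dim_products VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        # SQLite raises IntegrityError when a NOT NULL column receives NULL.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
valid_row_ok = try_insert(conn, (1, "Desk lamp", "Lighting"))
null_id_ok = try_insert(conn, (None, "Mystery item", "Misc"))
print(valid_row_ok, null_id_ok)  # True False
```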

How dbt Handles Contracts

During the build, dbt performs a “preflight” check: it verifies that the model's SQL query returns columns whose names and data types match the contract, and the build fails on any mismatch. For table and incremental models, dbt also adjusts the Data Definition Language (DDL) statements it sends to the data platform to include the declared column types and constraints, so the platform itself enforces the contract when the model is created or updated.
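The preflight comparison can be pictured as a diff between the columns declared in the contract and those the compiled query actually returns. The sketch below is hypothetical: `preflight_check` is not a dbt function, and it assumes the query's output schema is already available as a name-to-type mapping. dbt's real check runs inside dbt itself and works with the data platform's own type names.

```python
from typing import Dict, List

# The contract from models/marts/products.yml, written as column -> data_type.
CONTRACT: Dict[str, str] = {
    "product_id": "int",
    "product_name": "string",
    "product_category": "string",
}


def preflight_check(contract: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Hypothetical helper: list every difference between a contract and the
    column names/types a query returns. An empty list means the check passes."""
    problems: List[str] = []
    for name, declared in contract.items():
        if name not in actual:
            problems.append(f"missing column: {name}")
        elif actual[name].lower() != declared.lower():
            # Compared case-insensitively only; a real check would also need to
            # treat aliases (e.g. int vs integer) as equal, which this skips.
            problems.append(
                f"type mismatch on {name}: contract says {declared}, "
                f"query returns {actual[name]}"
            )
    for name in actual:
        if name not in contract:
            problems.append(f"column not in contract: {name}")
    return problems


# A query output that matches the contract passes.
ok = preflight_check(
    CONTRACT,
    {"product_id": "int", "product_name": "string", "product_category": "string"},
)
# One that changes a type, drops a column, and adds an undeclared one fails.
bad = preflight_check(
    CONTRACT,
    {"product_id": "string", "product_name": "string", "launch_date": "date"},
)
print(ok)  # []
for problem in bad:
    print(problem)
# type mismatch on product_id: contract says int, query returns string
# missing column: product_category
# column not in contract: launch_date
```

Treating extra columns as failures, not only missing ones, reflects the requirement that the query's columns match the contract: an undeclared column would otherwise reach downstream consumers with no guarantee attached.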

Benefits of Using Contracts

Using contracts in dbt models helps to:

Ensure consistency across data models, which is especially important in production environments where multiple processes depend on a model's structured output.
Reduce errors caused by unexpected changes in data models, providing a stable foundation for downstream analytics and applications.
Facilitate collaboration by making the model outputs predictable and well-documented, thus improving the reliability and usability of data assets across teams.

WAP Pattern in Y42: Data Contracts Out Of The Box

Y42 integrates the Write-Audit-Publish (WAP) pattern by default across all data pipelines: all data and schema checks run in an isolated environment to ensure data integrity before anything is deployed to production. Changes are tested against the latest production data in this controlled setting, and only validated builds are released. This setup minimizes manual configuration and reduces the risk of data inconsistencies and errors in production.

Figure: The Write-Audit-Publish (WAP) pattern.

For a detailed look at how Y42 automates data quality control with the WAP pattern and keeps code and data changes in sync, read our blog post on how Y42 brings the WAP pattern out of the box to all data pipelines, with no additional configuration.
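The write, audit, and publish steps described above can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not Y42's implementation: SQLite plays the data platform, a staging table plays the isolated environment, and `write_audit_publish`, `no_null_ids`, `not_empty`, and the table names are hypothetical. A build that fails an audit is discarded, and the production table keeps serving the last audited version.

```python
import sqlite3
from typing import Callable, List, Sequence, Tuple

# Hypothetical table names; they are trusted constants, so f-string SQL is safe here.
STAGING = "dim_products__staging"
PRODUCTION = "dim_products"

Audit = Callable[[sqlite3.Connection, str], bool]


def scalar(conn: sqlite3.Connection, sql: str):
    """Run a single-value query; fetchall() drains the cursor fully."""
    return conn.execute(sql).fetchall()[0][0]


def no_null_ids(conn: sqlite3.Connection, table: str) -> bool:
    """Audit: no row may have a NULL product_id."""
    return scalar(conn, f"SELECT COUNT(*) FROM {table} WHERE product_id IS NULL") == 0


def not_empty(conn: sqlite3.Connection, table: str) -> bool:
    """Audit: the new build must contain at least one row."""
    return scalar(conn, f"SELECT COUNT(*) FROM {table}") > 0


def write_audit_publish(
    conn: sqlite3.Connection,
    rows: Sequence[Tuple[object, str, str]],
    audits: List[Audit],
) -> bool:
    """Return True if the rows were published, False if any audit failed."""
    # Write: build the candidate next to production, never in place.
    conn.execute(f"DROP TABLE IF EXISTS {STAGING}")
    conn.execute(
        f"CREATE TABLE {STAGING} "
        "(product_id INTEGER, product_name TEXT, product_category TEXT)"
    )
    conn.executemany(f"INSERT INTO {STAGING} VALUES (?, ?, ?)", rows)

    # Audit: every check runs against the staging table only.
    if not all(audit(conn, STAGING) for audit in audits):
        conn.execute(f"DROP TABLE {STAGING}")  # discard the failed build
        conn.commit()
        return False

    # Publish: replace production with the audited table.
    conn.execute(f"DROP TABLE IF EXISTS {PRODUCTION}")
    conn.execute(f"ALTER TABLE {STAGING} RENAME TO {PRODUCTION}")
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
checks = [no_null_ids, not_empty]
first = write_audit_publish(conn, [(1, "Desk lamp", "Lighting")], checks)
second = write_audit_publish(conn, [(None, "Mystery item", "Misc")], checks)
print(first, second)  # True False
# Production still serves the first, audited build.
print(scalar(conn, f"SELECT COUNT(*) FROM {PRODUCTION}"))  # 1
```

A production system would also need the publish step to be atomic with respect to concurrent readers, which this sketch does not address.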
