dbt contracts: Enforce Schema Consistency
dbt contracts define the schema a model must produce, so that whatever transformations you apply, the resulting dataset matches a predefined set of column names and data types. This gives other processes and users a dependable data model to build on.
Why Use Contracts?
In dbt, a model is typically defined by a SQL select statement, and the shape of that query determines the dataset's structure. Without a contract, changes to the model's structure during development can introduce inconsistencies, especially when other systems or models depend on its output. A contract establishes a set of guarantees about the model's output structure. During the build, dbt checks that the model adheres to its contract, which adds a layer of reliability and reduces the risk of downstream errors.
Defining a Contract
Suppose you have a model that organizes product information. Below is an example of how you might define a contract for such a model:
SQL Model File (.sql)
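A minimal sketch of what such a model file might look like. The file path, the upstream model name stg_products, and the explicit casts are assumptions for illustration, and the type names may need adjusting for your data platform:

```sql
-- models/dim_products.sql (hypothetical path)
-- Explicit casts keep the output types aligned with the contract.
select
    cast(product_id as integer) as product_id,
    cast(product_name as varchar) as product_name,
    cast(product_category as varchar) as product_category
from {{ ref('stg_products') }}  -- stg_products is an assumed upstream model
```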
Contract Configuration (.yml)
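A sketch of a matching properties file, assuming the model is named dim_products and exposes the three columns described in this section. The file location is hypothetical, and the accepted data_type spellings depend on your data platform:

```yaml
# models/dim_products.yml (hypothetical path)
version: 2

models:
  - name: dim_products
    config:
      contract:
        enforced: true   # turns the column spec below into a contract
    columns:
      # With an enforced contract, every column the model returns
      # is listed here with both a name and a data_type.
      - name: product_id
        data_type: int
      - name: product_name
        data_type: string
      - name: product_category
        data_type: string
```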
In this setup, the contract ensures that the dim_products model outputs a dataset with product_id as an integer and product_name and product_category as strings.
How dbt Handles Contracts
During the build process, dbt performs a “preflight” check to ensure the SQL query of the model returns a dataset matching the contract's column names and data types. It also adjusts the Data Definition Language (DDL) statements sent to the data platform to include these specifications, enforcing these contracts during the model's creation or update.
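As a rough illustration of the DDL step, the statements submitted for a table build of dim_products might resemble the following. The schema name analytics, the upstream relation, and the overall statement shape are assumptions; the DDL dbt actually generates varies by adapter and dbt version.

```sql
-- Illustrative sketch only, not literal dbt output.
-- Column names and types from the contract appear in the table definition.
create table analytics.dim_products (
    product_id integer,
    product_name varchar,
    product_category varchar
);

-- The model's compiled query then populates the table.
insert into analytics.dim_products (product_id, product_name, product_category)
select
    product_id,
    product_name,
    product_category
from analytics.stg_products;  -- hypothetical upstream relation
```

Because the column types are part of the table definition, a query that returns a mismatched type is rejected rather than silently changing the table's shape.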
Benefits of Using Contracts
Using contracts in dbt models helps to:
- Ensure consistency across data models, which is especially important in production environments where multiple processes depend on the structured output.
- Reduce errors caused by unexpected changes in data models, providing a stable foundation for downstream analytics and applications.
- Facilitate collaboration by making the model outputs predictable and well-documented, thus improving the reliability and usability of data assets across teams.
WAP Pattern in Y42: Data Contracts Out Of The Box
Y42 integrates the Write-Audit-Publish (WAP) pattern by default across all data pipelines, applying all data and schema checks in an isolated environment to ensure data integrity before production deployment. This process allows changes to be tested against the latest production data in a controlled setting, and ensures only validated builds are released. This setup minimizes manual configuration and reduces the risk of data inconsistencies and errors in production.
The Write-Audit-Publish (WAP) pattern.
For a detailed understanding of how Y42 automates data quality control using the WAP pattern and how it handles code and data changes in sync, read our blog post on how Y42 brings the WAP pattern out of the box across all data pipelines with no additional configuration.