dbt tests: Types, Configuration, and Execution
Data tests in dbt are essential for ensuring data quality and integrity throughout your data modeling process. They allow you to enforce specific conditions and constraints on your data models
, sources
, seeds
, and snapshots
, providing confidence in the reliability of your data outputs.
This page will cover the different types of data tests available in dbt, how to configure them, and how to execute these tests, along with an exploration of how Y42 integrates advanced anomaly detection tests to elevate data testing practices.
Types of Data Tests in dbt
dbt supports various types of data tests, each designed to check for specific data conditions:
1. Built-in Column-Level Tests
These tests are pre-defined within dbt and do not require additional installations. They are designed to check for common data issues directly within your models’ columns:
not_null
: Ensures no null values are present in a specified column, helping maintain data completeness.unique
: Verifies that all entries in a specified column are unique, ensuring no duplicates are present.relationships
: Validates referential integrity by checking that foreign keys in one column correctly reference primary keys in another, crucial for relational data integrity.accepted_values
: Confirms that all entries in a column fall within a defined set of values, useful for enforcing business rules on data entries.
2. Package Available Tests
dbt supports the extension of its core functionality with packages that include additional tests. These packages must be installed before the tests can be used. Package tests often provide more specialized checks that are not covered by dbt’s built-in tests:
Example Package: dbt_utils (opens in a new tab)
- This package offers a variety of additional tests, such as
equality
,expression_is_true
, andrecency
, among others. - To use these tests, you first need to add the package to your
packages.yml
file and rundbt deps
to install it.
_10packages:_10 - package: dbt-labs/dbt_utils_10 version: 1.2.0
After installation, you can apply these tests similarly to built-in tests, referencing them in your model’s configuration file:
_10models:_10 - name: your_model_name_10 columns:_10 - name: your_column_name_10 tests:_10 - dbt_utils.expression_is_true:_10 expression: "your_column_name > 0"_10 error_if: "Fail"
3. Custom Tests
Custom tests are SQL queries that you write to address specific data validation needs that are unique to your business or data setup. These tests are flexible and can be applied across columns or even entire tables:
Example of a Custom Test:
- Suppose you want to ensure that the sum of all transactions for a user does not exceed a certain limit. You could write a SQL query encapsulated in a test to perform this check.
_10-- tests/transaction_limit_test.sql_10select user_id, sum(transaction_amount) as total_spent_10from {{ ref('transactions') }}_10group by user_id_10having total_spent > 10000 -- Assuming the limit is 10,000
This test would then be configured in your dbt project like so
_10models:_10 - name: transactions_10 tests:_10 - transaction_limit_test
How to Run dbt Tests
To execute configured data tests, use the dbt test
command. You can run all tests or specify particular tests using their custom names:
_10# Run all tests_10$ dbt test_10_10# Run a specific test_10$ dbt test --select valid_order_statuses
These commands will execute the tests and report any failures, helping you to ensure that your data models meet all specified conditions.
Advanced Testing with Y42: Anomaly Detection
Y42 (opens in a new tab) extends dbt’s testing capabilities by incorporating anomaly detection (opens in a new tab), which automates the monitoring of data quality issues such as unexpected changes in data volume, freshness, or deviations in key metrics.
Image source: Elementary data.
Example of configuring anomaly detection tests (opens in a new tab) in Y42:
Manage Sources and dbt Models in one place
Build end-to-end pipelines using a single framework.
Get Started