dbt tests: Types, Configuration, and Execution

Data tests in dbt are essential for ensuring data quality and integrity throughout your data modeling process. They allow you to enforce specific conditions and constraints on your data models, sources, seeds, and snapshots, providing confidence in the reliability of your data outputs.

This page will cover the different types of data tests available in dbt, how to configure them, and how to execute these tests, along with an exploration of how Y42 integrates advanced anomaly detection tests to elevate data testing practices.

Types of Data Tests in dbt

dbt supports various types of data tests, each designed to check for specific data conditions:

1. Built-in Column-Level Tests

These tests are pre-defined within dbt and do not require additional installations. They are designed to check for common data issues directly within your models’ columns:

not_null: Ensures no null values are present in a specified column, helping maintain data completeness.
unique: Verifies that all entries in a specified column are unique, ensuring no duplicates are present.
relationships: Validates referential integrity by checking that foreign keys in one column correctly reference primary keys in another, crucial for relational data integrity.
accepted_values: Confirms that all entries in a column fall within a defined set of values, useful for enforcing business rules on data entries.

2. Package Available Tests

dbt supports the extension of its core functionality with packages that include additional tests. These packages must be installed before the tests can be used. Package tests often provide more specialized checks that are not covered by dbt’s built-in tests:

Example Package: dbt_utils (opens in a new tab)

This package offers a variety of additional tests, such as equality, expression_is_true, and recency, among others.
To use these tests, you first need to add the package to your packages.yml file and run dbt deps to install it.


_10packages:
_10  - package: dbt-labs/dbt_utils
_10    version: 1.2.0

After installation, you can apply these tests similarly to built-in tests, referencing them in your model’s configuration file:


_10models:
_10  - name: your_model_name
_10    columns:
_10      - name: your_column_name
_10        tests:
_10          - dbt_utils.expression_is_true:
_10              expression: "your_column_name > 0"
_10              error_if: "Fail"

3. Custom Tests

Custom tests are SQL queries that you write to address specific data validation needs that are unique to your business or data setup. These tests are flexible and can be applied across columns or even entire tables:

Example of a Custom Test:

Suppose you want to ensure that the sum of all transactions for a user does not exceed a certain limit. You could write a SQL query encapsulated in a test to perform this check.


_10-- tests/transaction_limit_test.sql
_10select user_id, sum(transaction_amount) as total_spent
_10from {{ ref('transactions') }}
_10group by user_id
_10having total_spent > 10000  -- Assuming the limit is 10,000

This test would then be configured in your dbt project like so


_10models:
_10  - name: transactions
_10    tests:
_10      - transaction_limit_test

How to Run dbt Tests

To execute configured data tests, use the dbt test command. You can run all tests or specify particular tests using their custom names:


_10# Run all tests
_10$ dbt test
_10
_10# Run a specific test
_10$ dbt test --select valid_order_statuses

These commands will execute the tests and report any failures, helping you to ensure that your data models meet all specified conditions.

Advanced Testing with Y42: Anomaly Detection

Y42 (opens in a new tab) extends dbt’s testing capabilities by incorporating anomaly detection (opens in a new tab), which automates the monitoring of data quality issues such as unexpected changes in data volume, freshness, or deviations in key metrics.

Image source: Elementary data.

Example of configuring anomaly detection tests (opens in a new tab) in Y42:

models/orders.yml


_18version: 2
_18
_18sources:
_18  - name: < model name >
_18    # other properties
_18    tables:
_18      - name: orders
_18        config:
_18          # other properties
_18          elementary:
_18            timestamp_column:  < timestamp column >
_18        tests:
_18          - elementary.all_columns_anomalies:
_18              column_anomalies: < specific monitors, all if null >
_18              where_expression: < sql expression >
_18              time_bucket: # Daily by default
_18                period: < time period >
_18                count: < number of periods >

Manage Sources and dbt Models in one place

Build end-to-end pipelines using a single framework.

Get Started