dbt
Write Data Tests

dbt tests: Types, Configuration, and Execution

Data tests in dbt are essential for ensuring data quality and integrity throughout your data modeling process. They allow you to enforce specific conditions and constraints on your data models, sources, seeds, and snapshots, providing confidence in the reliability of your data outputs.

This page will cover the different types of data tests available in dbt, how to configure them, and how to execute these tests, along with an exploration of how Y42 integrates advanced anomaly detection tests to elevate data testing practices.

Types of Data Tests in dbt

dbt supports various types of data tests, each designed to check for specific data conditions:

1. Built-in Column-Level Tests

These tests are pre-defined within dbt and do not require additional installations. They are designed to check for common data issues directly within your models’ columns:

  • not_null: Ensures no null values are present in a specified column, helping maintain data completeness.
  • unique: Verifies that all entries in a specified column are unique, ensuring no duplicates are present.
  • relationships: Validates referential integrity by checking that foreign keys in one column correctly reference primary keys in another, crucial for relational data integrity.
  • accepted_values: Confirms that all entries in a column fall within a defined set of values, useful for enforcing business rules on data entries.

2. Package Available Tests

dbt supports the extension of its core functionality with packages that include additional tests. These packages must be installed before the tests can be used. Package tests often provide more specialized checks that are not covered by dbt’s built-in tests:

Example Package: dbt_utils (opens in a new tab)

  • This package offers a variety of additional tests, such as equality, expression_is_true, and recency, among others.
  • To use these tests, you first need to add the package to your packages.yml file and run dbt deps to install it.

_10
packages:
_10
- package: dbt-labs/dbt_utils
_10
version: 1.2.0

After installation, you can apply these tests similarly to built-in tests, referencing them in your model’s configuration file:


_10
models:
_10
- name: your_model_name
_10
columns:
_10
- name: your_column_name
_10
tests:
_10
- dbt_utils.expression_is_true:
_10
expression: "your_column_name > 0"
_10
error_if: "Fail"

3. Custom Tests

Custom tests are SQL queries that you write to address specific data validation needs that are unique to your business or data setup. These tests are flexible and can be applied across columns or even entire tables:

Example of a Custom Test:

  • Suppose you want to ensure that the sum of all transactions for a user does not exceed a certain limit. You could write a SQL query encapsulated in a test to perform this check.

_10
-- tests/transaction_limit_test.sql
_10
select user_id, sum(transaction_amount) as total_spent
_10
from {{ ref('transactions') }}
_10
group by user_id
_10
having total_spent > 10000 -- Assuming the limit is 10,000

This test would then be configured in your dbt project like so


_10
models:
_10
- name: transactions
_10
tests:
_10
- transaction_limit_test

How to Run dbt Tests

To execute configured data tests, use the dbt test command. You can run all tests or specify particular tests using their custom names:


_10
# Run all tests
_10
$ dbt test
_10
_10
# Run a specific test
_10
$ dbt test --select valid_order_statuses

These commands will execute the tests and report any failures, helping you to ensure that your data models meet all specified conditions.

Advanced Testing with Y42: Anomaly Detection

Y42 (opens in a new tab) extends dbt’s testing capabilities by incorporating anomaly detection (opens in a new tab), which automates the monitoring of data quality issues such as unexpected changes in data volume, freshness, or deviations in key metrics.

Image source: Elementary data.

Image source: Elementary data.

Example of configuring anomaly detection tests (opens in a new tab) in Y42:

models/orders.yml

_18
version: 2
_18
_18
sources:
_18
- name: < model name >
_18
# other properties
_18
tables:
_18
- name: orders
_18
config:
_18
# other properties
_18
elementary:
_18
timestamp_column: < timestamp column >
_18
tests:
_18
- elementary.all_columns_anomalies:
_18
column_anomalies: < specific monitors, all if null >
_18
where_expression: < sql expression >
_18
time_bucket: # Daily by default
_18
period: < time period >
_18
count: < number of periods >

Y42 Lineage Mode

Manage Sources and dbt Models in one place

Build end-to-end pipelines using a single framework.

Get Started