dbt
dbt-expectations Tests

dbt expectations: Advanced Data Quality Testing

The dbt-expectations (opens in a new tab) package extends the core functionality of dbt by providing a comprehensive set of data quality tests. Inspired by the Great Expectations framework, these tests help you enforce data integrity and validate assumptions about your data.

What is dbt Expectations?

dbt-expectations is a package designed to bring the power of Great Expectations-like data quality testing to your dbt models. It includes a variety of tests that allow you to verify the consistency, accuracy, and validity of your data.

How to Install dbt Expectations

To use dbt-expectations in your dbt project, you need to add it as a dependency in your packages.yml file and then install it using the dbt deps command.

  1. Add to packages.yml:
packages.yml
packages:
- package: calogica/dbt_expectations
version: 0.10.3
  1. Install the package:
dbt deps

Commonly Used Tests and Examples

Here are some commonly used tests from the dbt-expectations package, along with examples of how to use them in your dbt models.

1. expect_table_row_count_to_be_between

This test checks if the number of rows in a table is within a specified range.

models: # or seeds:
- name: orders
tests:
- dbt_expectations.expect_table_row_count_to_be_between:
min_value: 100 # (Optional)
max_value: 1000 # (Optional)
# group_by: [group_id, other_group_id, ...] # (Optional)
# row_condition: "id is not null" # (Optional)
# strictly: false # (Optional. Adds an 'or equal to' to the comparison operator for min/max)
When to use: Ideal for ensuring that your table size remains within expected boundaries, helping to detect issues like missing data or unexpected data growth.

2. expect_table_row_count_to_equal_other_table

This test validates that the number of rows in one table matches the row count of another table.

models: # or seeds:
- name: orders
tests:
- dbt_expectations.expect_table_row_count_to_equal_other_table:
compare_model: ref('expected_orders')
# group_by: [col1, col2] # (Optional)
# compare_group_by: [col1, col2] # (Optional)
# factor: 1 # (Optional)
# row_condition: "id is not null" # (Optional)
# compare_row_condition: "id is not null" # (Optional)
When to use: Useful for ensuring consistency between related tables, such as verifying that a table loaded from a source has the same number of rows as the source itself.

3. expect_column_values_to_match_regex

This test checks if the values in a column match a specified regular expression pattern.

models:
- name: users
columns:
- name: email
tests:
- dbt_expectations.expect_column_values_to_match_regex:
regex: '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
# row_condition: "id is not null" # (Optional)
# is_raw: True # (Optional)
# flags: i # (Optional)
When to use: Ideal for validating the format of email addresses, phone numbers, or other string patterns that should conform to a specific regex.

4. expect_column_values_to_match_regex_list

This test ensures that the values in a column match any of the regular expressions provided in a list.

models:
- name: users
columns:
- name: status
tests:
- dbt_expectations.expect_column_values_to_match_regex_list:
regex_list:
- '^active$'
- '^inactive$'
- '^pending$'
# match_on: any # (Optional. Default is 'any', which applies an 'OR' for each regex. If 'all', it applies an 'AND' for each regex.)
# row_condition: "id is not null" # (Optional)
# is_raw: True # (Optional)
# flags: i # (Optional)
When to use: Useful for validating that column values conform to multiple acceptable patterns, such as different status codes.

5. expect_column_mean_to_be_between

This test checks if the mean (average) of the values in a column falls within a specified range.

models:
- name: orders
columns:
- name: total_amount
tests:
- dbt_expectations.expect_column_mean_to_be_between:
min_value: 50
max_value: 500
# group_by: [group_id, other_group_id, ...] # (Optional)
# row_condition: "id is not null" # (Optional)
# strictly: false # (Optional. Default is 'false'. Adds an 'or equal to' to the comparison operator for min/max)
When to use: Ideal for ensuring that the average of numeric values, such as transaction amounts, stays within expected limits.
Y42 Lineage Mode

Manage Sources and dbt Models in one place

Build end-to-end pipelines using a single framework.

Get Started