dbt expectations: Advanced Data Quality Testing

The dbt-expectations (opens in a new tab) package extends the core functionality of dbt by providing a comprehensive set of data quality tests. Inspired by the Great Expectations framework, these tests help you enforce data integrity and validate assumptions about your data.

What is dbt Expectations?

dbt-expectations is a package designed to bring the power of Great Expectations-like data quality testing to your dbt models. It includes a variety of tests that allow you to verify the consistency, accuracy, and validity of your data.

How to Install dbt Expectations

To use dbt-expectations in your dbt project, you need to add it as a dependency in your packages.yml file and then install it using the dbt deps command.

Add to packages.yml:

packages.yml


_10packages:
_10  - package: calogica/dbt_expectations
_10    version: 0.10.3

Install the package:


_10dbt deps

Commonly Used Tests and Examples

Here are some commonly used tests from the dbt-expectations package, along with examples of how to use them in your dbt models.

1. expect_table_row_count_to_be_between

This test checks if the number of rows in a table is within a specified range.


_10models: # or seeds:
_10  - name: orders
_10    tests:
_10      - dbt_expectations.expect_table_row_count_to_be_between:
_10          min_value: 100 # (Optional)
_10          max_value: 1000 # (Optional)
_10          #   group_by: [group_id, other_group_id, ...] # (Optional)
_10          #   row_condition: "id is not null" # (Optional)
_10          #   strictly: false # (Optional. Adds an 'or equal to' to the comparison operator for min/max)

2. expect_table_row_count_to_equal_other_table

This test validates that the number of rows in one table matches the row count of another table.


_10models: # or seeds:
_10  - name: orders
_10    tests:
_10      - dbt_expectations.expect_table_row_count_to_equal_other_table:
_10          compare_model: ref('expected_orders')
_10          #   group_by: [col1, col2] # (Optional)
_10          #   compare_group_by: [col1, col2] # (Optional)
_10          #   factor: 1 # (Optional)
_10          #   row_condition: "id is not null" # (Optional)
_10          #   compare_row_condition: "id is not null" # (Optional)

3. expect_column_values_to_match_regex

This test checks if the values in a column match a specified regular expression pattern.


_10models:
_10  - name: users
_10    columns:
_10      - name: email
_10        tests:
_10          - dbt_expectations.expect_column_values_to_match_regex:
_10              regex: '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
_10              #   row_condition: "id is not null" # (Optional)
_10              #   is_raw: True # (Optional)
_10              #   flags: i # (Optional)

4. expect_column_values_to_match_regex_list

This test ensures that the values in a column match any of the regular expressions provided in a list.


_14models:
_14  - name: users
_14    columns:
_14      - name: status
_14        tests:
_14          - dbt_expectations.expect_column_values_to_match_regex_list:
_14              regex_list:
_14                - '^active$'
_14                - '^inactive$'
_14                - '^pending$'
_14              #   match_on: any # (Optional. Default is 'any', which applies an 'OR' for each regex. If 'all', it applies an 'AND' for each regex.)
_14              #   row_condition: "id is not null" # (Optional)
_14              #   is_raw: True # (Optional)
_14              #   flags: i # (Optional)

5. expect_column_mean_to_be_between

This test checks if the mean (average) of the values in a column falls within a specified range.


_11models:
_11  - name: orders
_11    columns:
_11      - name: total_amount
_11        tests:
_11          - dbt_expectations.expect_column_mean_to_be_between:
_11              min_value: 50
_11              max_value: 500
_11              # group_by: [group_id, other_group_id, ...] # (Optional)
_11              # row_condition: "id is not null" # (Optional)
_11              # strictly: false # (Optional. Default is 'false'. Adds an 'or equal to' to the comparison operator for min/max)

Manage Sources and dbt Models in one place

Build end-to-end pipelines using a single framework.

Get Started