dbt expectations: Advanced Data Quality Testing
The dbt-expectations (opens in a new tab) package extends the core functionality of dbt by providing a comprehensive set of data quality tests. Inspired by the Great Expectations framework, these tests help you enforce data integrity and validate assumptions about your data.
What is dbt Expectations?
dbt-expectations
is a package designed to bring the power of Great Expectations-like data quality testing to your dbt models. It includes a variety of tests that allow you to verify the consistency, accuracy, and validity of your data.
How to Install dbt Expectations
To use dbt-expectations
in your dbt project, you need to add it as a dependency in your packages.yml
file and then install it using the dbt deps
command.
- Add to packages.yml:
- Install the package:
_10dbt deps
Commonly Used Tests and Examples
Here are some commonly used tests from the dbt-expectations package, along with examples of how to use them in your dbt models.
1. expect_table_row_count_to_be_between
This test checks if the number of rows in a table is within a specified range.
_10models: # or seeds:_10 - name: orders_10 tests:_10 - dbt_expectations.expect_table_row_count_to_be_between:_10 min_value: 100 # (Optional)_10 max_value: 1000 # (Optional)_10 # group_by: [group_id, other_group_id, ...] # (Optional)_10 # row_condition: "id is not null" # (Optional)_10 # strictly: false # (Optional. Adds an 'or equal to' to the comparison operator for min/max)
2. expect_table_row_count_to_equal_other_table
This test validates that the number of rows in one table matches the row count of another table.
_10models: # or seeds:_10 - name: orders_10 tests:_10 - dbt_expectations.expect_table_row_count_to_equal_other_table:_10 compare_model: ref('expected_orders')_10 # group_by: [col1, col2] # (Optional)_10 # compare_group_by: [col1, col2] # (Optional)_10 # factor: 1 # (Optional)_10 # row_condition: "id is not null" # (Optional)_10 # compare_row_condition: "id is not null" # (Optional)
3. expect_column_values_to_match_regex
This test checks if the values in a column match a specified regular expression pattern.
_10models:_10 - name: users_10 columns:_10 - name: email_10 tests:_10 - dbt_expectations.expect_column_values_to_match_regex:_10 regex: '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'_10 # row_condition: "id is not null" # (Optional)_10 # is_raw: True # (Optional)_10 # flags: i # (Optional)
4. expect_column_values_to_match_regex_list
This test ensures that the values in a column match any of the regular expressions provided in a list.
_14models:_14 - name: users_14 columns:_14 - name: status_14 tests:_14 - dbt_expectations.expect_column_values_to_match_regex_list:_14 regex_list:_14 - '^active$'_14 - '^inactive$'_14 - '^pending$'_14 # match_on: any # (Optional. Default is 'any', which applies an 'OR' for each regex. If 'all', it applies an 'AND' for each regex.)_14 # row_condition: "id is not null" # (Optional)_14 # is_raw: True # (Optional)_14 # flags: i # (Optional)
5. expect_column_mean_to_be_between
This test checks if the mean (average) of the values in a column falls within a specified range.
_11models:_11 - name: orders_11 columns:_11 - name: total_amount_11 tests:_11 - dbt_expectations.expect_column_mean_to_be_between:_11 min_value: 50_11 max_value: 500_11 # group_by: [group_id, other_group_id, ...] # (Optional)_11 # row_condition: "id is not null" # (Optional)_11 # strictly: false # (Optional. Default is 'false'. Adds an 'or equal to' to the comparison operator for min/max)
Manage Sources and dbt Models in one place
Build end-to-end pipelines using a single framework.
Get Started