Anomaly detection tests - Public Beta
Y42 offers tests for detecting data quality issues. Y42 data tests are set up and run as native tests within your dbt assets.
Y42 tests complement dbt built-in tests, package tests (like dbt-expectations), and custom tests.
Y42 integrates Elementary, an open-source anomaly detection package, to monitor data quality. It tracks specific metrics such as row count, null rate, and average value, and compares recent measurements with historical data. Significant changes or deviations surfaced by this comparison likely indicate data reliability issues.
When a test is executed, your data is divided into time buckets according to the time_bucket field, and constrained by the training_period variable. The test then compares a specific metric (e.g., row count) from buckets within the detection_period against the metric from all prior buckets during the training_period. If anomalies are detected within the detection_period timeframe, the test fails.
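The bucketing step described above can be sketched in plain Python. This is a hedged illustration only, not Y42's or Elementary's implementation: the function name split_buckets, the daily bucket size, and the training_days and detection_days parameters are hypothetical stand-ins for time_bucket, training_period, and detection_period.

```python
from collections import Counter
from datetime import date, timedelta


def split_buckets(row_dates, today, training_days, detection_days):
    """Count rows per daily bucket and split the buckets into two groups.

    Assumption: the training window covers the last `training_days` days,
    and the final `detection_days` days of that window are the buckets
    under test. All names here are illustrative, not a real API.
    """
    counts = Counter(row_dates)  # metric per bucket: row count per day
    window_start = today - timedelta(days=training_days)
    detection_start = today - timedelta(days=detection_days)

    training, detection = {}, {}
    day = window_start
    while day < today:
        # Buckets with no rows still count, with a metric value of 0.
        target = detection if day >= detection_start else training
        target[day] = counts.get(day, 0)
        day += timedelta(days=1)
    return training, detection
```

Rows older than the training window are ignored entirely, which is one plausible reading of "constrained by the training_period".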
Image source: Elementary data.
Detection method
The method uses the Z-score to identify anomalies. The Z-score measures how far a data point is from the mean, expressed in standard deviations. For normally distributed data:
- ~68% of data points fall within 1 standard deviation of the mean (absolute Z-score of 1 or less).
- ~95% of data points fall within 2 standard deviations of the mean (absolute Z-score of 2 or less).
- ~99.7% of data points fall within 3 standard deviations of the mean (absolute Z-score of 3 or less).
By default, a Z-score above 3 indicates an outlier. This threshold is adjustable using the anomaly_sensitivity variable.
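As a rough illustration of the Z-score check, the sketch below flags a metric value whose absolute Z-score exceeds a threshold. It makes assumptions the text does not confirm: the helper names z_score and is_anomaly are hypothetical, the sample standard deviation is used, and deviations in both directions are flagged.

```python
from statistics import mean, stdev


def z_score(value, history):
    """Distance of `value` from the mean of `history`, in standard deviations.

    Assumption: `history` holds at least two training-bucket metric values,
    and the sample standard deviation is used.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly constant history: any different value is infinitely far.
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma


def is_anomaly(value, history, anomaly_sensitivity=3.0):
    """Flag `value` when its absolute Z-score exceeds the threshold."""
    return abs(z_score(value, history)) > anomaly_sensitivity
```

Lowering the threshold makes the check stricter: with a history of 100, 102, 98, 101, 99, a value of 104 passes at the default of 3 but is flagged at a sensitivity of 2.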
Supported tests
Anomaly detection test | Supported? |
---|---|
all_columns_anomalies | ✅ |