Source data freshness

Monitoring source data freshness.

Overview

Source freshness tests enable you to assess the timeliness of data in source tables. These checks are valuable for verifying the health of your upstream data before running your DAGs.

Unlike dbt, which requires a separate command (dbt source freshness) to run source freshness checks, the Y42 build command performs them automatically.

Setting up source freshness checks

In Y42, you can configure source freshness tests using the error_after parameter within the freshness block of a source YAML file. A loaded_at_field is also required, because freshness is calculated from the timestamps in that column.
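Conceptually, a freshness check compares the newest loaded_at_field value against the current time and fails once the gap exceeds error_after. The Python sketch below only illustrates that comparison; it is not Y42's implementation, and the names is_stale and PERIODS are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mapping from a freshness period to a timedelta keyword.
PERIODS = {"minute": "minutes", "hour": "hours", "day": "days"}


def is_stale(max_loaded_at, count, period, now=None):
    """Return True when the newest row is older than the error_after threshold."""
    now = now or datetime.now(timezone.utc)
    threshold = timedelta(**{PERIODS[period]: count})
    return now - max_loaded_at > threshold


# Example: the newest row was loaded 30 hours before the check,
# and error_after is {count: 24, period: hour}.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
newest = now - timedelta(hours=30)
print(is_stale(newest, count=24, period="hour", now=now))  # True
```

Passing now explicitly keeps the example deterministic; a real check would use the current time.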

These settings are hierarchical, meaning a configuration set at the source level will apply to all tables within that source unless overridden.

sources/pizza_shop.yml
version: 2

sources:
  - name: pizza_shop
    database: raw

    freshness: # default freshness
      error_after: {count: 24, period: hour}

    loaded_at_field: _etl_loaded_at

    tables:
      - name: customers # this will use the freshness defined above

      - name: orders # this will use the more specific freshness below
        freshness: # make this a little more strict
          error_after: {count: 12, period: hour}
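
The override rule in this example can be sketched in Python as a simple fallback: a table uses its own freshness block if it has one, otherwise the source-level default. This is an illustration of the hierarchy under that assumption, not Y42's resolution code; the function name effective_freshness and the dictionary layout are hypothetical.

```python
# Source-level default and table-level override, mirroring the YAML example.
source = {
    "freshness": {"error_after": {"count": 24, "period": "hour"}},
    "tables": [
        {"name": "customers"},
        {
            "name": "orders",
            "freshness": {"error_after": {"count": 12, "period": "hour"}},
        },
    ],
}


def effective_freshness(source, table_name):
    """Return the table's own freshness block if set, else the source default."""
    for table in source["tables"]:
        if table["name"] == table_name:
            return table.get("freshness", source.get("freshness"))
    raise KeyError(table_name)


for name in ("customers", "orders"):
    print(name, effective_freshness(source, name)["error_after"])
```

Here customers resolves to the 24-hour source default, while orders resolves to its stricter 12-hour setting.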