dbt
Snapshot Historical Data Changes

dbt Snapshot: Capture Historical Data Changes

Maintaining a historical record of data changes over time is key for trend analysis, auditing, and reporting. dbt Snapshots offer a method for capturing and storing the state of your datasets at specific intervals similarly to the Slowly Changing Dimension Type 2 method (opens in a new tab).

What are dbt Snapshots?

dbt Snapshots are models that allow you to track and store changes to a dataset over time. By capturing the state of a dataset at defined intervals, snapshots enable you to query historical data.

How dbt Snapshots Work

To implement dbt Snapshots, you define snapshot configurations within your dbt project. These configurations determine how dbt captures changes in your data. The primary strategies for detecting changes include:

  • Timestamp-based Snapshots: Track changes based on a timestamp column in your data.
  • Checksum-based Snapshots: Detect changes by comparing checksums of record contents.

Example: Tracking Employee Salaries

Imagine you have an employees table with fields like salary. Employee salaries may change over time due to promotions or adjustments. Snapshots help you retain this historical information for better data analysis.

idnamepositionsalary
1Alice SmithDeveloper60000
2Bob JohnsonManager80000

If Alice gets a promotion and her salary changes to 70000:

idnamepositionsalary
1Alice SmithSenior Developer70000
2Bob JohnsonManager80000

We will lose the information about her previous salary, making it challenging to analyze data and answer questions such as the average salary changes over time or salary trends for specific positions.

Snapshots address this issue by preserving these changes in the table:

idnamepositionsalaryupdated_atvalid_fromvalid_to
1Alice SmithDeveloper600002023-01-012023-01-012023-06-01
2Bob JohnsonManager800002023-01-012023-01-01null
1Alice SmithSenior Developer700002023-06-012023-06-01null

Snapshot Configuration for Tracking Employee Salaries

To capture these changes, you can configure a snapshot in dbt as follows:


_17
{% snapshot employee_salaries_snapshot %}
_17
_17
{% set unique_key = 'id' %}
_17
{% set strategy = 'check' %}
_17
{% set check_cols = ['position', 'salary'] %}
_17
_17
{{ config(
_17
target_schema='snapshots',
_17
unique_key=unique_key,
_17
strategy=strategy,
_17
check_cols=check_cols,
_17
updated_at='updated_at'
_17
) }}
_17
_17
SELECT * FROM {{ ref('employees') }}
_17
_17
{% endsnapshot %}

This configuration captures snapshots of the employees table, tracking changes in the position and salary columns.

Running dbt Snapshots

To execute the snapshot and capture changes, use the dbt snapshot command:


_10
dbt snapshot

This command processes all snapshot configurations in your project, preserving the historical state of the employees table.

Y42 Enhanced Snapshot Capabilities

Y42 (opens in a new tab) enhances the capabilities of dbt Snapshots by offering an intuitive user interface for snapshot configuration, making it accessible to users who prefer a graphical approach (opens in a new tab).

Configure snapshot.

Configure snapshot.

Key Features of Y42 Snapshots

  • UI-based Configuration: Set up and manage snapshots without writing code, using an intuitive interface.
  • Advanced Scheduling Options: Configure and adjust the frequency and conditions for taking snapshots easily.
  • Integrated Monitoring: View and monitor snapshot logs and histories directly from the Y42 platform.
Y42 Lineage Mode

Manage Sources and dbt Models in one place

Build end-to-end pipelines using a single framework.

Get Started