dbt Snapshot: Capture Historical Data Changes
Maintaining a historical record of data changes over time is key for trend analysis, auditing, and reporting. dbt Snapshots offer a method for capturing and storing the state of your datasets at specific intervals similarly to the Slowly Changing Dimension Type 2 method (opens in a new tab).
What are dbt Snapshots?
dbt Snapshots are models that allow you to track and store changes to a dataset over time. By capturing the state of a dataset at defined intervals, snapshots enable you to query historical data.
How dbt Snapshots Work
To implement dbt Snapshots, you define snapshot configurations within your dbt project. These configurations determine how dbt captures changes in your data. The primary strategies for detecting changes include:
- Timestamp-based Snapshots: Track changes based on a timestamp column in your data.
- Checksum-based Snapshots: Detect changes by comparing checksums of record contents.
Example: Tracking Employee Salaries
Imagine you have an employees
table with fields like salary
. Employee salaries may change over time due to promotions or adjustments. Snapshots help you retain this historical information for better data analysis.
id | name | position | salary |
---|---|---|---|
1 | Alice Smith | Developer | 60000 |
2 | Bob Johnson | Manager | 80000 |
If Alice gets a promotion and her salary changes to 70000
:
id | name | position | salary |
---|---|---|---|
1 | Alice Smith | Senior Developer | 70000 |
2 | Bob Johnson | Manager | 80000 |
We will lose the information about her previous salary, making it challenging to analyze data and answer questions such as the average salary changes over time or salary trends for specific positions.
Snapshots address this issue by preserving these changes in the table:
id | name | position | salary | updated_at | valid_from | valid_to |
---|---|---|---|---|---|---|
1 | Alice Smith | Developer | 60000 | 2023-01-01 | 2023-01-01 | 2023-06-01 |
2 | Bob Johnson | Manager | 80000 | 2023-01-01 | 2023-01-01 | null |
1 | Alice Smith | Senior Developer | 70000 | 2023-06-01 | 2023-06-01 | null |
Snapshot Configuration for Tracking Employee Salaries
To capture these changes, you can configure a snapshot in dbt as follows:
_17{% snapshot employee_salaries_snapshot %}_17_17{% set unique_key = 'id' %}_17{% set strategy = 'check' %}_17{% set check_cols = ['position', 'salary'] %}_17_17{{ config(_17 target_schema='snapshots',_17 unique_key=unique_key,_17 strategy=strategy,_17 check_cols=check_cols,_17 updated_at='updated_at'_17) }}_17_17SELECT * FROM {{ ref('employees') }}_17_17{% endsnapshot %}
This configuration captures snapshots of the employees table, tracking changes in the position
and salary
columns.
Running dbt Snapshots
To execute the snapshot and capture changes, use the dbt snapshot command:
_10dbt snapshot
This command processes all snapshot configurations in your project, preserving the historical state of the employees
table.
Y42 Enhanced Snapshot Capabilities
Y42 (opens in a new tab) enhances the capabilities of dbt Snapshots by offering an intuitive user interface for snapshot configuration, making it accessible to users who prefer a graphical approach (opens in a new tab).
Configure snapshot.
Key Features of Y42 Snapshots
- UI-based Configuration: Set up and manage snapshots without writing code, using an intuitive interface.
- Advanced Scheduling Options: Configure and adjust the frequency and conditions for taking snapshots easily.
- Integrated Monitoring: View and monitor snapshot logs and histories directly from the Y42 platform.
Manage Sources and dbt Models in one place
Build end-to-end pipelines using a single framework.
Get Started