Snapshots
Add new snapshot.
Overview
Analysts often need to examine historical data. While many source systems provide this capability, some do not offer ways to review table history. Snapshots enable you to handle Slowly Changing Dimensions type 2 (opens in a new tab) over mutable source tables, allowing users to track records that change over time directly in Y42.
Example: Tracking customer addresses
Imagine you have a customers
table with fields like address
. People move, and when they do, their addresses in the table change. Snapshots help you retain this historical information for better data analysis.
id | name | address | city |
---|---|---|---|
1 | Henry Having | Koenigsplatz 10 | Cologne |
2 | Garry Giving | Prinzenpark 322 | Munich |
If Henry changes his address to 1 Hacker Road
:
id | name | address | city |
---|---|---|---|
1 | Henry Having | 1 Hacker Road | Palo Alto |
2 | Garry Giving | Prinzenpark 322 | Munich |
We will lose the information about his previous address, making it challenging to analyze data and answer questions such as the average shipping time to a specific country or city.
Snapshots address this issue by preserving these changes in the table:
id | name | address | city | updated_at | valid_from | valid_to |
---|---|---|---|---|---|---|
1 | Henry Having | Koenigsplatz 10 | Cologne | 2023-10-01 | 2023-10-01 | 2023-10-27 |
2 | Garry Giving | Prinzenpark 322 | Munich | 2023-10-01 | 2023-10-01 | null |
1 | Henry Having | 1 Hacker Road | Palo Alto | 2023-10-27 | 2023-10-27 | null |
How to create snapshots
List view
- Create a new asset: Click on the
+
sign at the top of the left sidebar to create a new asset. - Select asset type: From the options presented, choose
Snapshot
. - Name your snapshot: Provide a name for the snapshot and select
Create
.
Create Snapshot in List view.
- Configure snapshot settings: Begin by setting up the necessary configurations for snapshots via our user interface:
Configure snapshot.
- Target Database and Target Schema: Mandatory fields that inform Y42 where to store the snapshot.
- Unique_key: Helps Y42 identify unique rows in the source table to track changes.
- Strategy: Two strategies exist:
timestamp
andcheck
. Each runs a slightly different query against the DWH. Learn more about selecting the appropriate strategy. - Updated At / Check Cols: These settings vary based on the strategy selected.
- Invalidate hard deletes: This is an opt-in feature that allows for invalidating hard-deleted records during snapshotting. It is disabled by default.
Code View
A complete snapshot query in Code view looks like this:
Timestamp and Check strategies
Timestamp strategy (Recommended)
The timestamp strategy relies on an updated_at
field to identify row changes. If the updated_at
column for a row has a timestamp more recent than the last snapshot, Y42 will invalidate the old record and store the new one. If the timestamps are the same, dbt will not make any changes.
Required configurations for Timestamp Strategy
Config | Description | Example |
---|---|---|
updated_at | A column that indicates when the source row was last updated | updated_at |
Example usage:
Check strategy
The check strategy is suitable for tables lacking a reliable updated_at column. It compares a specified list of columns between the current and historical records. If any column has changed, dbt will invalidate the old record and capture the new one. If the columns remain the same, no action is taken.
Required configurations for Check Strategy:
Config | Description | Example |
---|---|---|
check_cols | A list of columns to monitor for changes, or 'all' to check all columns | ["name", "email"] or 'ALL' |
Example usage:
Query best practices
For optimal performance, start with the simplest query. If you experience performance or cost issues, consider selecting specific columns.
Running snapshot jobs
Snapshots can only be triggered in the default branches you've configured (e.g., main). This is to prevent snapshot runs in development branches from accidentally overwriting your main snapshot table. Each job writes into the same DWH table, unless you modify the configs.