Data Diff - Public Beta
Data Diff lets you compare two materializations of an asset to identify changes at the row and column level—whether they're updates, deletions, or additions. This detailed comparison helps in debugging by allowing you to see how data changed from snapshot to snapshot or in developing new features by comparing the production version with your modified version. Additionally, Data Diff plays an essential role in code reviews by providing reviewers with code and data diffs side by side.
Prerequisites
Data Diff can be accessed in three areas within the platform:
- Asset Editor (Preview Mode)
- Build History
- Pull Requests
Access Requirements
To access Data Diff, users must have the appropriate permissions:
- Admin rights at the organizational level, or
- Limited access at the organizational level combined with being the ‘Owner’ of the specific space
Supported Asset Types
Data Diff can be used with models and sources. It works for incremental models as long as the models are materialized. Preview mode, however, is available only for SQL models.
How to Use Data Diff
During Development (Preview Mode)
- Make changes to the asset.
- Click on "Preview".
- Configure the data diff settings.
- Data Diff will then compare the data that would be materialized once the asset changes are built against the selected asset build. In Preview mode, the comparison is available only against the latest valid build.
When Exploring History of Changes (Build History)
- Select a successfully completed build.
- Click on "Data".
- Configure the data diff settings.
- Data Diff will compare the materialized data of the two selected builds.
When Reviewing a Pull Request (PR)
- Navigate to the PR.
- Go to "Asset Changes" > "Data".
- Configure the data diff settings.
- Data Diff will compare the data materialized from the asset changes in the PR against the selected asset build.
Configuring a Data Diff
Select Two Materializations of Assets (Builds)
Begin by choosing the two builds of the asset that you wish to compare.
Select the Join Keys
Identify and select the table columns that are present in both builds and have the same column type in each. These columns will serve as the join keys for the comparison.
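As an illustration of which columns qualify as join keys, here is a minimal Python sketch. It assumes each build's schema is available as a mapping from column name to type name; the function name `candidate_join_keys` and the sample schemas are hypothetical and are not part of the platform.

```python
def candidate_join_keys(schema_a, schema_b):
    """Return the columns that exist in both schemas with the same type.

    Hypothetical helper: each schema is a dict of column name -> type name.
    Columns are returned in the order they appear in schema_a.
    """
    return [col for col, typ in schema_a.items() if schema_b.get(col) == typ]


# Sample schemas, invented for illustration only.
prod = {"order_id": "INTEGER", "customer": "TEXT", "amount": "FLOAT"}
dev = {"order_id": "INTEGER", "customer": "TEXT", "amount": "DECIMAL", "region": "TEXT"}

print(candidate_join_keys(prod, dev))  # ['order_id', 'customer']
```

Here `amount` is excluded because its type differs between the two builds, and `region` is excluded because it exists in only one of them.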
Initiate the Comparison
Once two builds and join keys are selected, the system will perform a full outer join on the two underlying tables/views and will identify changes.
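The idea behind a full outer join comparison can be sketched in plain Python. This is a simplified illustration, not the platform's implementation: rows are assumed to be dictionaries, the join keys are assumed to identify each row uniquely within a build, and the function name `diff_rows` is hypothetical.

```python
def diff_rows(old_rows, new_rows, join_keys):
    """Emulate a full outer join on join_keys and classify every key.

    Assumption: join_keys uniquely identify a row within each build.
    """
    def index(rows):
        # Map the tuple of join-key values to the full row.
        return {tuple(row[k] for k in join_keys): row for row in rows}

    old, new = index(old_rows), index(new_rows)
    result = {"added": [], "deleted": [], "changed": []}

    # The union of keys is what a full outer join would produce.
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            result["added"].append(key)      # only in the new build
        elif key not in new:
            result["deleted"].append(key)    # only in the old build
        else:
            # Present in both builds: collect the cells whose values differ.
            cells = {
                col: (old[key].get(col), new[key].get(col))
                for col in old[key].keys() | new[key].keys()
                if old[key].get(col) != new[key].get(col)
            }
            if cells:
                result["changed"].append((key, cells))
    return result


# Sample data, invented for illustration only.
build_a = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
build_b = [{"id": 2, "amount": 25}, {"id": 3, "amount": 30}, {"id": 4, "amount": 40}]

diff = diff_rows(build_a, build_b, ["id"])
print(diff["added"])    # [(4,)]
print(diff["deleted"])  # [(1,)]
print(diff["changed"])  # [((2,), {'amount': (20, 25)})]
```

Keys found on only one side of the join correspond to added or deleted rows, while keys found on both sides are compared cell by cell.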
Reading Data Diff
Types of Changes Detected
Data Diff identifies two main types of changes:
- Column Changes: Detection of added, deleted, or type-changed columns.
- Row Changes: Detection of added or deleted rows, and of changes in cell values or cell types.
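To make the column-level categories concrete, the following Python sketch classifies columns as added, deleted, or type-changed. It assumes each build's schema is a mapping from column name to type name; the function name `diff_columns` and the sample schemas are hypothetical and do not reflect the platform's internals.

```python
def diff_columns(schema_old, schema_new):
    """Classify column-level changes between two schemas.

    Hypothetical helper: each schema is a dict of column name -> type name.
    """
    added = sorted(set(schema_new) - set(schema_old))
    deleted = sorted(set(schema_old) - set(schema_new))
    # Columns present in both builds whose declared type differs.
    type_changed = {
        col: (schema_old[col], schema_new[col])
        for col in schema_old
        if col in schema_new and schema_old[col] != schema_new[col]
    }
    return {"added": added, "deleted": deleted, "type_changed": type_changed}


# Sample schemas, invented for illustration only.
before = {"order_id": "INTEGER", "amount": "FLOAT", "legacy_flag": "BOOLEAN"}
after = {"order_id": "INTEGER", "amount": "DECIMAL", "region": "TEXT"}

changes = diff_columns(before, after)
print(changes["added"])         # ['region']
print(changes["deleted"])       # ['legacy_flag']
print(changes["type_changed"])  # {'amount': ('FLOAT', 'DECIMAL')}
```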
Searching for Data Differences in the UI
Users can navigate and analyze data differences within the Data Diff interface using several search functionalities:
- On Self-Joinable Key: Utilize sorting and filtering options to facilitate quick identification of changes.
- Search by Column Name: Pinpoint specific columns by name to focus on areas of interest or concern.
- On Diff Columns: Apply advanced filters to delve deeper into differences between columns.