Snowflake FLATTEN: How to Flatten JSON Data
The FLATTEN
function in Snowflake expands nested data, such as semi-structured data, into a tabular format that can be more easily manipulated with SQL. It 'explodes' complex data types like VARIANT, OBJECT, or ARRAY, turning one row into multiple rows by expanding nested arrays or objects. This transformation makes semi-structured data more accessible for querying and analysis.
Key Concepts
- Data Transformation:
FLATTEN
converts complex data structures into individual rows, simplifying data analysis. - Flexible Usage: It allows specific targeting of objects, arrays, or both and can be recursive to handle deeply nested structures.
- Query Enhancement: By producing a lateral view,
FLATTEN
enables detailed queries on elements within complex data types.
Common Use Cases
- Log Analysis: Extract specific elements from JSON-formatted log files for error tracking or performance metrics.
- API Data Management: Transform nested API response data into a tabular format for easier querying and visualization.
- Preprocessing for Analytics: Flatten data structures before analysis or machine learning model ingestion.
Syntax
_10FLATTEN(_10 INPUT => <expr>, _10 [PATH => <constant_expr>],_10 [OUTER => TRUE | FALSE],_10 [RECURSIVE => TRUE | FALSE],_10 [MODE => 'OBJECT' | 'ARRAY' | 'BOTH']_10)
- INPUT: Specifies the expression to be flattened, which must be a
VARIANT
,OBJECT
, orARRAY
. - PATH: Optional path to the element within the
VARIANT
data structure to be flattened. - OUTER: If set to
TRUE
, generates a row even if there are no entries to expand. - RECURSIVE: If
TRUE
, expands all sub-elements recursively. - MODE: Specifies whether to flatten objects, arrays, or both.
Examples
Flatten JSON data
Imagine a scenario from “The Hitchhiker’s Guide to the Galaxy” where you store information about interstellar journeys in a table, with details about each journey stored in a JSON column.
Our goal is to transform this semi-structured data into a tabular format for easy querying:
Input
_18CREATE_18OR REPLACE TABLE interstellar_journeys AS_18SELECT_18 column1 AS journey_id,_18 parse_json (column2) AS journey_data_18FROM VALUES_18 ( 1,_18 '{_18 "voyage": "Heart of Gold",_18 "stops": [{"planet": "Earth"}, {"planet": "Vogon Constructor Fleet"}, {"planet": "Magrathea"}]_18 }'_18 ),_18 ( 2,_18 '{_18 "voyage": "Starship Titanic",_18 "stops": [{"planet": "Barnards Star"}, {"planet": "Betelgeuse"}]_18 }'_18 );
Table: interstellar_journeys
journey_id | journey_data |
---|---|
1 | { "stops": [ { "planet": "Earth" }, { "planet": "Vogon Constructor Fleet" }, { "planet": "Magrathea" } ], "voyage": "Heart of Gold" } |
2 | { "stops": [ { "planet": "Barnards Star" }, { "planet": "Betelgeuse" } ], "voyage": "Starship Titanic" } |
Snowflake SQL Query
_10SELECT _10 journey_id,_10 f.value:planet::varchar AS planet_visited_10FROM interstellar_journeys,_10 LATERAL FLATTEN(input => journey_data:stops) f;
Output
journey_id | planet_visited |
---|---|
1 | Earth |
1 | Vogon Constructor Fleet |
1 | Magrathea |
2 | Barnards Star |
2 | Betelgeuse |
Explanation
This query retrieves the journey_id
and the planets visited from the interstellar_journeys
table.
The FLATTEN
function is used to expand the stops
array from the journey_data
JSON column into individual rows, making each planet visit accessible.
For each row in interstellar_journeys
, the FLATTEN function creates a new row for every planet in the stops array, extracting the planet’s name with f.value:planet as planet_visited
. This process allows detailed analysis of each stop on the interstellar journeys.