Snowflake FLATTEN: How to Flatten JSON Data

The FLATTEN function in Snowflake expands nested data, such as semi-structured data, into a tabular format that can be more easily manipulated with SQL. It 'explodes' complex data types like VARIANT, OBJECT, or ARRAY, turning one row into multiple rows by expanding nested arrays or objects. This transformation makes semi-structured data more accessible for querying and analysis.

Key Concepts

Data Transformation: FLATTEN converts complex data structures into individual rows, simplifying data analysis.
Flexible Usage: It allows specific targeting of objects, arrays, or both and can be recursive to handle deeply nested structures.
Query Enhancement: By producing a lateral view, FLATTEN enables detailed queries on elements within complex data types.

Common Use Cases

Log Analysis: Extract specific elements from JSON-formatted log files for error tracking or performance metrics.
API Data Management: Transform nested API response data into a tabular format for easier querying and visualization.
Preprocessing for Analytics: Flatten data structures before analysis or machine learning model ingestion.

Syntax


_10FLATTEN(
_10  INPUT => <expr>, 
_10  [PATH => <constant_expr>],
_10  [OUTER => TRUE | FALSE],
_10  [RECURSIVE => TRUE | FALSE],
_10  [MODE => 'OBJECT' | 'ARRAY' | 'BOTH']
_10)

INPUT: Specifies the expression to be flattened, which must be a VARIANT, OBJECT, or ARRAY.
PATH: Optional path to the element within the VARIANT data structure to be flattened.
OUTER: If set to TRUE, generates a row even if there are no entries to expand.
RECURSIVE: If TRUE, expands all sub-elements recursively.
MODE: Specifies whether to flatten objects, arrays, or both.

Examples

Flatten JSON data

Imagine a scenario from “The Hitchhiker’s Guide to the Galaxy” where you store information about interstellar journeys in a table, with details about each journey stored in a JSON column.

Our goal is to transform this semi-structured data into a tabular format for easy querying:

Input


_18CREATE
_18OR REPLACE TABLE interstellar_journeys AS
_18SELECT
_18    column1 AS journey_id,
_18    parse_json (column2) AS journey_data
_18FROM VALUES
_18    ( 1,
_18        '{
_18        "voyage": "Heart of Gold",
_18        "stops": [{"planet": "Earth"}, {"planet": "Vogon Constructor Fleet"}, {"planet": "Magrathea"}]
_18        }'
_18    ),
_18    ( 2,
_18        '{
_18        "voyage": "Starship Titanic",
_18        "stops": [{"planet": "Barnards Star"}, {"planet": "Betelgeuse"}]
_18        }'
_18    );

Table: interstellar_journeys

journey_id	journey_data
1	`{ "stops": [ { "planet": "Earth" }, { "planet": "Vogon Constructor Fleet" }, { "planet": "Magrathea" } ], "voyage": "Heart of Gold" }`
2	`{ "stops": [ { "planet": "Barnards Star" }, { "planet": "Betelgeuse" } ], "voyage": "Starship Titanic" }`

Snowflake SQL Query


_10SELECT 
_10    journey_id,
_10    f.value:planet::varchar AS planet_visited
_10FROM interstellar_journeys,
_10    LATERAL FLATTEN(input => journey_data:stops) f;

Output

journey_id	planet_visited
1	Earth
1	Vogon Constructor Fleet
1	Magrathea
2	Barnards Star
2	Betelgeuse

Explanation

This query retrieves the journey_id and the planets visited from the interstellar_journeys table.

The FLATTEN function is used to expand the stops array from the journey_data JSON column into individual rows, making each planet visit accessible.

For each row in interstellar_journeys, the FLATTEN function creates a new row for every planet in the stops array, extracting the planet’s name with f.value:planet as planet_visited. This process allows detailed analysis of each stop on the interstellar journeys.

Your AI copilot for Snowflake SQL

Write production-ready SQL with AI-based suggestions.

Get Started