Snowflake SUBSTRING: Divide Strings into Arrays
The SPLIT
function in Snowflake is used to divide a string into an array of substrings based on a specified separator. This can be useful for parsing structured string data or preparing data for further processing steps.
Syntax
The syntax for the SPLIT
function is:
_10SPLIT(<string>, <separator>)
where:
string
: The text to be split into parts.separator
: The delimiter used to split the string. If the separator is an empty string, the entire string is returned as a single element in the array.
The function returns an array of strings split according to the separator.
Usage in Data Processing
Using SPLIT
, you can transform a single string into multiple records or use it in conjunction with other functions like FLATTEN
to process each element individually. This is useful in scenarios where data comes in a single string but needs to be treated as separate values for analysis or further processing.
Common Use Cases
- Data Normalization: Preparing data by splitting and transforming into a more usable format.
- Log Parsing: Breaking down log entries into discrete fields.
- CSV Processing: Turning CSV formatted strings into arrays for easier handling.
SPLIT vs. SPLIT_PART
SPLIT
provides an array of all parts, allowing for multiple uses of the fragments, while SPLIT_PART
focuses on extracting a single, specific part.
SPLIT_PART syntax:
_10SPLIT_PART(<string>, <delimiter>, <partNumber>)
Examples
Split String Into an Array of Strings
Let’s use an example from “The Hitchhiker’s Guide to the Galaxy” where we might have a string of items that Arthur Dent might carry in his satchel, separated by commas.
Input
String: Towel, Guide, Sub-Etha Sens-O-Matic, Babelfish
Snowflake SQL Query
_10SELECT SPLIT('Towel, Guide, Sub-Etha Sens-O-Matic, Babelfish', ', ');
Output
This will produce an array of strings:
_10[_10"Towel",_10"Guide",_10"Sub-Etha Sens-O-Matic",_10"Babelfish"_10]