Snowflake SUBSTRING: How to Extract Parts of a String
What it does
SUBSTRING
helps you extract a specific piece of a string based on the starting position and length you define. It's perfect for pulling out relevant information from a larger text block, such as a name from a full address or a keyword from a sentence.
Syntax
You can use SUBSTR
or SUBSTRING
interchangeably to perform this action:
_10SUBSTR(`<base_expr>`, `<start_expr>` [ , `<length_expr>` ])_10_10SUBSTRING(`<base_expr>`, `<start_expr>` [ , `<length_expr>` ])
where
<base_expr>
: The string you want to extract from.<start_expr>
: The starting position for the extraction (position counts start at 1).<length_expr>
(optional): How many characters to include in the extracted string. If omitted, the substring will extend to the end of the base string.
Common use cases
- Extracting first names: If you have a database column with full names and you need to get just the first name
- Pulling out area codes from phone numbers: If you're dealing with North American phone numbers and need to extract the area code
- Analyzing product codes: To analyze a specific segment of a product code where, for instance, the first two characters represent the product category
Examples
Basic data extraction with SUBSTRING
In this example from "The Hitchhiker's Guide to the Galaxy," we extract fixed-length temperature data directly from structured text entries about different planets.
Input
Table: galactic_observations
observation_id | planet | data |
---|---|---|
1 | Betelgeuse | Temperature: 450K, Atmosphere: 72% nitrogen, 21% oxygen, Gravity: 1.1G |
2 | Frogstar | Temperature: 290K, Atmosphere: 95% carbon dioxide, Gravity: 3.7G |
3 | Magrathea | Temperature: 235K, Atmosphere: 85% argon, Gravity: 0.9G |
Snowflake SQL Query
_10SELECT_10 observation_id,_10 planet,_10 data,_10 SUBSTRING(data, 14, 4) as temperature_10FROM_10 galactic_observations;
Output
observation_id | planet | data | temperature |
---|---|---|---|
1 | Betelgeuse | Temperature: 450K, Atmosphere: 72% nitrogen, 21% oxygen, Gravity: 1.1G | 450K |
2 | Frogstar | Temperature: 290K, Atmosphere: 95% carbon dioxide, Gravity: 3.7G | 290K |
3 | Magrathea | Temperature: 235K, Atmosphere: 85% argon, Gravity: 0.9G | 235K |
Using SUBSTRING and POSITION functions for data extraction
This approach goes a step further by dynamically identifying where the gravity data starts in the text. It adapts to different lengths of preceding text, which is useful when dealing with varying data entries.
Input
Table: galactic_observations
observation_id | planet | data |
---|---|---|
1 | Betelgeuse | Temperature: 450K, Atmosphere: 72% nitrogen, 21% oxygen, Gravity: 1.1G |
2 | Frogstar | Temperature: 290K, Atmosphere: 95% carbon dioxide, Gravity: 3.7G |
3 | Magrathea | Temperature: 235K, Atmosphere: 85% argon, Gravity: 0.9G |
Snowflake SQL Query
_10SELECT_10 observation_id,_10 planet,_10 data,_10 SUBSTRING(data, POSITION('Gravity: ' IN data) + 9, 4) AS gravity_10FROM_10 galactic_observations;
Output
observation_id | planet | data | gravity |
---|---|---|---|
1 | Betelgeuse | Temperature: 450K, Atmosphere: 72% nitrogen, 21% oxygen, Gravity: 1.1G | 1.1G |
2 | Frogstar | Temperature: 290K, Atmosphere: 95% carbon dioxide, Gravity: 3.7G | 3.7G |
3 | Magrathea | Temperature: 235K, Atmosphere: 85% argon, Gravity: 0.9G | 0.9G |
Explanation
- Locate keyword: Uses
POSITION('Gravity: ' IN data)
to find where the gravity information starts within the string. - Adjust start: Adds 9 to the position to skip past 'Gravity: ' and target the start of the numerical gravity data.
- Extract data: Applies
SUBSTRING
from the adjusted position to pull out the next 4 characters, capturing the gravity value like '1.1G'.