Snowflake QUALIFY: How to Filter Window Function Results
The QUALIFY
clause in Snowflake is used to filter the results of window functions, which are functions that perform calculations across a set of table rows that are related to the current row. This is similar to the HAVING
clause that filters aggregate functions (GROUP BY
), but QUALIFY
does this specifically for window functions without needing nested queries.
Syntax
_10SELECT <column_list>_10FROM <data_source>_10[GROUP BY ...]_10[HAVING ...]_10QUALIFY <predicate>_10[ ... ]
Why use the QUALIFY clause?
The QUALIFY
clause simplifies SQL queries by eliminating the need for nested subqueries when filtering results based on window functions. Below are two examples that demonstrate how QUALIFY can simplify the same query that traditionally requires a nested subquery.
Scenario: Identifying top sales days
Suppose you want to identify the top three sales days for each store based on total sales.
Traditional nested query approach
Traditionally, you might use a nested query with a window function and an outer query that filters these results:
_10SELECT *_10FROM (_10 SELECT store_id, sales_date, total_sales,_10 ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY total_sales DESC) AS rank_10 FROM store_sales_10) AS subquery_10WHERE rank <= 3;
Simplified query using QUALIFY
With QUALIFY
, you can incorporate this filtering directly in the main query, which simplifies the overall approach:
_10SELECT store_id, sales_date, total_sales_10FROM store_sales_10QUALIFY ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY total_sales DESC) <= 3;
❄️ In Snowflake, expressions in the SELECT clause, such as window functions, can be referenced by their column aliases in other clauses.
_10SELECT store_id, sales_date, total_sale_10 ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY total_sales DESC) as rank_10FROM store_sales_10QUALIFY rank <= 3;
Examples
Dynamic filtering with the QUALIFY Clause
Let's find the last known location of each character from the "The Hitchhiker's Guide to the Galaxy" before the destruction of Earth, assuming we have a timeline of events for each character leading up to that moment.
Input
Table: galactic_travel_log
id | character | planet | visit_order |
---|---|---|---|
1 | Zaphod Beeblebrox | Betelgeuse Five | 1 |
2 | Zaphod Beeblebrox | Earth | 2 |
3 | Trillian | Earth | 1 |
4 | Arthur Dent | Earth | 1 |
5 | Ford Prefect | Betelgeuse Seven | 1 |
6 | Ford Prefect | Earth | 2 |
Snowflake SQL Query
_12SELECT_12 id,_12 character,_12 planet,_12 visit_order_12FROM_12 galactic_travel_log QUALIFY ROW_NUMBER() OVER (_12 PARTITION BY_12 character_12 ORDER BY_12 visit_order DESC_12 ) = 1;
Output
id | character | planet | visit_order |
---|---|---|---|
2 | Zaphod Beeblebrox | Earth | 2 |
3 | Trillian | Earth | 1 |
4 | Arthur Dent | Earth | 1 |
6 | Ford Prefect | Earth | 2 |