Snowflake ROW_NUMBER: Assign Sequential Numbers to Rows
The ROW_NUMBER()
is a SQL function that assigns a sequential integer to each row within a partition of a result set. It starts numbering from 1 for each partition, making it ideal for tasks like ranking items within categories.
What it does?
ROW_NUMBER()
gives each row in a result set a unique number based on the order specified. This is particularly useful when you need to sort or rank data within categories (partitions).
Syntax
Here’s how you use the ROW_NUMBER() function in Snowflake:
_10ROW_NUMBER() OVER (_10 [ PARTITION BY <expr1> [, <expr2> ...] ]_10 ORDER BY <expr3> [, <expr4> ...] [ASC | DESC]_10)
where:
PARTITION BY
: (Optional) Divides the result set into partitions where the ROW_NUMBER() resets to 1 for each partition.ORDER BY
: Determines the order of rows in each partition; this is where the numbers are assigned.
Important notes
- No Arguments: The ROW_NUMBER() function itself does not take any parameters inside the parentheses.
- Partitioning: Partitioning is not mandatory. If omitted, the entire result set is treated as a single group.
- Ordering: You must specify an order. If you are partitioning, the numbering within each partition is based on this order.
Common Uses of ROW_NUMBER()
- Ranking Items within Categories: In e-commerce platforms,
ROW_NUMBER()
can be used to rank products within each category based on criteria like sales volume, customer reviews, or recent sales trends. - Top-N Queries: When you need to retrieve the top or bottom
N
records within each group, such as the top 3 salespersons in each region or the worst performing students in each class,ROW_NUMBER()
can be used to easily filter these records. - Removing Duplicates: If your data contains duplicates and you want to delete them while keeping one record for each set of duplicates,
ROW_NUMBER()
can help identify and retain the first occurrence of each duplicated entry based on a specific order. - Pagination: In applications where data needs to be displayed in a paginated format,
ROW_NUMBER()
can assign row numbers and then select only a specific range of those numbers to show a subset of results on each page. - Sequential Numbering of Events or Transactions: In logs or transaction records,
ROW_NUMBER()
can be used to sequentially number events as they occur over time, regardless of their grouping or categorization.
Examples
Ranking Data Within Partition
Imagine you have a table that tracks how many times each character from “The Hitchhiker’s Guide to the Galaxy” has visited various planets. Let’s write a SQL query to rank these characters based on the number of visits to each planet.
Input
Table: character_visits
character_id | character_name | planet | visits |
---|---|---|---|
1 | Arthur Dent | Earth | 100 |
1 | Arthur Dent | Magrathea | 5 |
2 | Ford Prefect | Betelgeuse Five | 50 |
2 | Ford Prefect | Earth | 75 |
3 | Zaphod Beeblebrox | Betelgeuse Five | 85 |
3 | Zaphod Beeblebrox | Magrathea | 10 |
Snowflake SQL Query
_10SELECT character_name, planet, visits,_10 ROW_NUMBER() OVER (_10 PARTITION BY planet_10 ORDER BY visits DESC_10 ) AS travel_rank_10FROM character_visits;
Output
character_name | planet | visits | travel_rank |
---|---|---|---|
Zaphod Beeblebrox | Betelgeuse Five | 85 | 1 |
Ford Prefect | Betelgeuse Five | 50 | 2 |
Arthur Dent | Earth | 100 | 1 |
Ford Prefect | Earth | 75 | 2 |
Zaphod Beeblebrox | Magrathea | 10 | 1 |
Arthur Dent | Magrathea | 5 | 2 |
Explanation
- Partition by planet: The data is partitioned by the planet, meaning the ROW_NUMBER() function resets for each planet.
- Order by visits DESC: Within each partition, the data is ordered by the number of visits in descending order. The character with the most visits to a planet will have a travel rank of 1.