Snowflake GROUP BY: Aggregate Data into Groups

The GROUP BY clause in SQL is used to aggregate records into groups that share the same values in specified columns. It’s useful when you want to perform calculations on subsets of data and is often paired with aggregate functions like SUM(), AVG(), COUNT(), etc.

Syntax

The basic syntax for GROUP BY is:


_10SELECT column1, column2, aggregate_function(column3)
_10FROM table
_10GROUP BY column1, column2;

You can also use expressions, column positions, or more complex conditions for grouping.

Extensions

GROUP BY CUBE: Generates a result set that shows aggregates for all combinations of the specified columns. It’s useful for creating subtotals and grand totals.
- Use case: If you need to calculate the total, as well as subtotals for each dimension in a multidimensional dataset, GROUP BY CUBE is the right choice.
- Example: If you’re analyzing sales data grouped by both region and product, GROUP BY CUBE will provide aggregates for each region, each product, combinations of region and product, and a grand total across all regions and products.
GROUP BY ROLLUP: GROUP BY ROLLUP is similar to GROUP BY CUBE but less comprehensive. It creates a hierarchy of subtotals and rolls up data from the lowest level to the grand total.
- Use Case: Use GROUP BY ROLLUP when you need a hierarchical aggregation that moves from the most detailed up to a grand total. It’s more focused than GROUP BY CUBE as it only rolls up the totals in one direction.
- Example: For sales data grouped by year, month, and day, GROUP BY ROLLUP will give totals for each day, each month, each year, and the overall total.
GROUP BY GROUPING SETS: This extension provides the most flexibility by allowing explicit definition of multiple groups of columns. You specify exactly which combinations of columns to use for aggregation.
- Use Case: GROUP BY GROUPING SETS is ideal when you need specific groupings of data and are not interested in all possible combinations. It’s more efficient than using multiple GROUP BY clauses.
- Example: If you only want totals for region, product, and then region combined with product, you can specify this with GROUP BY GROUPING SETS.

Common Use Cases

Calculating Totals: Summing sales by region, counting orders by customer, etc.
Statistical Analysis: Average spending per user, maximum and minimum values.
Data Quality Checks: Counting nulls or the distribution of values.

Examples

GROUP BY vs. GROUP BY ALL

Let’s assume you have a sales table and you want to calculate the total sales by both region and product category.

Input

Table: sales

region	product_category	sales_amount
North	Electronics	1200
North	Electronics	1500
South	Electronics	900
North	Clothing	700
South	Clothing	600
South	Electronics	1200
North	Clothing	300
South	Clothing	800

Snowflake SQL Query: GROUP BY

Using GROUP BY to calculate total sales by both region and product_category to find the total sales in each category for each region:


_10SELECT region, product_category, SUM(sales_amount) AS total_sales
_10FROM sales
_10GROUP BY region, product_category;

region	product_category	sales_amount
North	Electronics	2700
South	Electronics	2100
North	Clothing	1000
South	Clothing	1400

Snowflake SQL Query: GROUP BY ALL

When the SELECT statement includes all non-aggregate fields, GROUP BY ALL automatically includes them in the grouping:


_10SELECT region, product_category, SUM(sales_amount) AS total_sales
_10FROM sales
_10GROUP BY ALL;

region	product_category	sales_amount
North	Electronics	2700
South	Electronics	2100
North	Clothing	1000
South	Clothing	1400

Your AI copilot for Snowflake SQL

Write production-ready SQL with AI-based suggestions.

Get Started