Snowflake
Data Aggregation with GROUP BY

Snowflake GROUP BY: Aggregate Data into Groups

The GROUP BY clause in SQL is used to aggregate records into groups that share the same values in specified columns. It’s useful when you want to perform calculations on subsets of data and is often paired with aggregate functions like SUM(), AVG(), COUNT(), etc.

Syntax

The basic syntax for GROUP BY is:


_10
SELECT column1, column2, aggregate_function(column3)
_10
FROM table
_10
GROUP BY column1, column2;

You can also use expressions, column positions, or more complex conditions for grouping.

Extensions

  • GROUP BY CUBE: Generates a result set that shows aggregates for all combinations of the specified columns. It’s useful for creating subtotals and grand totals.
    • Use case: If you need to calculate the total, as well as subtotals for each dimension in a multidimensional dataset, GROUP BY CUBE is the right choice.
    • Example: If you’re analyzing sales data grouped by both region and product, GROUP BY CUBE will provide aggregates for each region, each product, combinations of region and product, and a grand total across all regions and products.
  • GROUP BY ROLLUP: GROUP BY ROLLUP is similar to GROUP BY CUBE but less comprehensive. It creates a hierarchy of subtotals and rolls up data from the lowest level to the grand total.
    • Use Case: Use GROUP BY ROLLUP when you need a hierarchical aggregation that moves from the most detailed up to a grand total. It’s more focused than GROUP BY CUBE as it only rolls up the totals in one direction.
    • Example: For sales data grouped by year, month, and day, GROUP BY ROLLUP will give totals for each day, each month, each year, and the overall total.
  • GROUP BY GROUPING SETS: This extension provides the most flexibility by allowing explicit definition of multiple groups of columns. You specify exactly which combinations of columns to use for aggregation.
    • Use Case: GROUP BY GROUPING SETS is ideal when you need specific groupings of data and are not interested in all possible combinations. It’s more efficient than using multiple GROUP BY clauses.
    • Example: If you only want totals for region, product, and then region combined with product, you can specify this with GROUP BY GROUPING SETS.

Common Use Cases

  • Calculating Totals: Summing sales by region, counting orders by customer, etc.
  • Statistical Analysis: Average spending per user, maximum and minimum values.
  • Data Quality Checks: Counting nulls or the distribution of values.

Examples

GROUP BY vs. GROUP BY ALL

Let’s assume you have a sales table and you want to calculate the total sales by both region and product category.

Input

Table: sales

regionproduct_categorysales_amount
NorthElectronics1200
NorthElectronics1500
SouthElectronics900
NorthClothing700
SouthClothing600
SouthElectronics1200
NorthClothing300
SouthClothing800

Snowflake SQL Query: GROUP BY

Using GROUP BY to calculate total sales by both region and product_category to find the total sales in each category for each region:


_10
SELECT region, product_category, SUM(sales_amount) AS total_sales
_10
FROM sales
_10
GROUP BY region, product_category;

regionproduct_categorysales_amount
NorthElectronics2700
SouthElectronics2100
NorthClothing1000
SouthClothing1400

Snowflake SQL Query: GROUP BY ALL

When the SELECT statement includes all non-aggregate fields, GROUP BY ALL automatically includes them in the grouping:


_10
SELECT region, product_category, SUM(sales_amount) AS total_sales
_10
FROM sales
_10
GROUP BY ALL;

regionproduct_categorysales_amount
NorthElectronics2700
SouthElectronics2100
NorthClothing1000
SouthClothing1400
Y42 SQL completion user interface

Your AI copilot for Snowflake SQL

Write production-ready SQL with AI-based suggestions.

Get Started