Snowflake EXCEPT: Find Differences Between Data Sets
The EXCEPT operator in Snowflake is a set operator used to combine results from two queries by removing rows in the first query that also appear in the second query. This operator is useful when you need to find differences between two sets of data.
Syntax
_10SELECT column1, column2 FROM table1 _10EXCEPT _10SELECT column3, column4 FROM table2;
Usage Notes
- Column Match: Both queries must select the same number of columns, and the data types of these columns should be compatible.
- Column Order Matters: The order and meaning of the columns should match between the two queries. Misalignment in column order or purpose can lead to incorrect results.
- Duplication Removal:
EXCEPT
inherently removes duplicates from the resulting set, similar toSELECT DISTINCT
. - Precedence:
EXCEPT
has the same precedence level asUNION [ALL]
andMINUS
, but a lower precedence thanINTERSECT
. Use parentheses to control the order of evaluation if your query combines multiple types of set operations. - Output Column Names: The names of the output columns are determined by the first query in the EXCEPT operation.
Common Pitfalls
- Avoid using
SELECT *
with set operations unless you are certain that the tables have exactly the same column structure and order. - Ensure that the data types and the column order in both queries match exactly to avoid unexpected results.
Examples
Use EXCEPT to filter rows
Suppose you have two datasets: one listing all employees and another listing employees who have completed a specific training. To find out which employees have not completed the training, you could use the EXCEPT
operator:
Input
Table: employees
employee_id | first_name | last_name |
---|---|---|
1 | Arthur | Dent |
2 | Ford | Prefect |
3 | Zaphod | Beeblebrox |
4 | Trillian | Astra |
5 | Marvin | Android |
Table: trained_employees
employee_id | first_name | last_name |
---|---|---|
1 | Arthur | Dent |
2 | Ford | Prefect |
4 | Trillian | Astra |
Snowflake SQL Query
_10SELECT employee_id, first_name, last_name FROM employees_10EXCEPT_10SELECT employee_id, first_name, last_name FROM trained_employees;
Output
employee_id | first_name | last_name |
---|---|---|
3 | Zaphod | Beeblebrox |
5 | Marvin | Android |
This query will correctly identify Zaphod Beeblebrox and Marvin Android as the employees without completed training records by subtracting the set of trained employee IDs from the complete list of employee IDs.