[SQL] Difference between HAVING and WHERE Clauses
In this article, I will explain the SQL HAVING clause.
The HAVING clause filters data like the WHERE clause, but the timing and target of application are different.
Let’s look at “how to use WHERE and HAVING”, one of the points where SQL beginners stumble the most, with practical examples.
Preparation
In this site, I will use python libraries sqlite3 and ipython-sql to execute and explain SQL on Jupyter Notebook.
Also, I will use Polars to display dataframes.
github
- The jupyter notebook file on github is here
google colaboratory
- If you want to run it on google colaboratory, click here
Author’s Environment
Here is the author’s environment.
!sw_vers
ProductName: macOS
ProductVersion: 15.5
BuildVersion: 24F74
!python -V
Python 3.12.12
Import necessary libraries.
import sqlite3
import polars as pl
%load_ext sql
%config SqlMagic.feedback = True
%config SqlMagic.style = '_DEPRECATED_DEFAULT'
Data Preparation
As a sample, create an Orders table that stores order data for each customer.
import os
if os.path.exists("data.db"):
os.remove("data.db")
os.system("touch data.db")
os.system("chmod 664 data.db")
# Database connection
%sql sqlite:///data.db
'Connected: @data.db'
%%sql
CREATE TABLE Orders (
id INTEGER PRIMARY KEY,
customer_id INTEGER,
amount INTEGER
);
INSERT INTO Orders (customer_id, amount) VALUES
(1, 100),
(1, 200),
(2, 500),
(3, 100),
(3, 100),
(3, 100),
(4, 1000);
* sqlite:///data.db
Done.
7 rows affected.
[]
%%sql
SELECT * FROM Orders;
* sqlite:///data.db
Done.
| id | customer_id | amount |
|---|---|---|
| 1 | 1 | 100 |
| 2 | 1 | 200 |
| 3 | 2 | 500 |
| 4 | 3 | 100 |
| 5 | 3 | 100 |
| 6 | 3 | 100 |
| 7 | 4 | 1000 |
Task: Extract customers whose total order amount is 400 or more
For each customer (customer_id), sum the amount and display only customers whose result is 400 or more.
Mistake: Trying to filter aggregation results with WHERE clause
The following query will result in an error.
SELECT
customer_id,
SUM(amount) as total_amount
FROM
Orders
WHERE
SUM(amount) >= 400
GROUP BY
customer_id;
Reason: The WHERE clause filters rows before aggregation, so aggregate functions (such as SUM) cannot be specified as conditions.
Correct: Use HAVING clause
When filtering results after aggregation, use the HAVING clause.
query = """
SELECT
customer_id,
SUM(amount) as total_amount
FROM
Orders
GROUP BY
customer_id
HAVING
SUM(amount) >= 400;
"""
with sqlite3.connect("data.db") as conn:
df = pl.read_database(query, connection=conn)
display(df)
shape: (2, 2)
| customer_id | total_amount |
|---|---|
| 2 | 500 |
| 4 | 1000 |
Explanation: Execution Order
Understanding the execution order of SQL makes the difference clear.
- FROM: Select target table
- WHERE: Filter rows (before aggregation)
- GROUP BY: Group
- HAVING: Filter groups (after aggregation)
- SELECT: Select columns to display
WHERE is executed before grouping, so it cannot know the aggregation results.
On the other hand, HAVING is executed after grouping, so filtering based on aggregation results is possible.
Summary
- WHERE: Filters rows before aggregation.
- HAVING: Filters groups after aggregation.
If you remember “use HAVING when you want to specify conditions on aggregation results”, you won’t make mistakes.