[SQL] Difference between HAVING and WHERE Clauses

In this article, I will explain the SQL HAVING clause. The HAVING clause filters data like the WHERE clause, but the timing and target of application are different.

Let’s look at “how to use WHERE and HAVING”, one of the points where SQL beginners stumble the most, with practical examples.

Preparation

In this site, I will use python libraries sqlite3 and ipython-sql to execute and explain SQL on Jupyter Notebook. Also, I will use Polars to display dataframes.

github

  • The jupyter notebook file on github is here

google colaboratory

  • If you want to run it on google colaboratory, click here

Author’s Environment

Here is the author’s environment.

!sw_vers
ProductName:		macOS
ProductVersion:		15.5
BuildVersion:		24F74
!python -V
Python 3.12.12

Import necessary libraries.

import sqlite3
import polars as pl

%load_ext sql
%config SqlMagic.feedback = True

%config SqlMagic.style = '_DEPRECATED_DEFAULT'

Data Preparation

As a sample, create an Orders table that stores order data for each customer.

import os

if os.path.exists("data.db"):
    os.remove("data.db")

os.system("touch data.db")
os.system("chmod 664 data.db")

# Database connection
%sql sqlite:///data.db
'Connected: @data.db'
%%sql
CREATE TABLE Orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    amount INTEGER
);

INSERT INTO Orders (customer_id, amount) VALUES
    (1, 100),
    (1, 200),
    (2, 500),
    (3, 100),
    (3, 100),
    (3, 100),
    (4, 1000);
 * sqlite:///data.db
Done.
7 rows affected.





[]
%%sql
SELECT * FROM Orders;
 * sqlite:///data.db
Done.
idcustomer_idamount
11100
21200
32500
43100
53100
63100
741000

Task: Extract customers whose total order amount is 400 or more

For each customer (customer_id), sum the amount and display only customers whose result is 400 or more.

Mistake: Trying to filter aggregation results with WHERE clause

The following query will result in an error.

SELECT
    customer_id,
    SUM(amount) as total_amount
FROM
    Orders
WHERE
    SUM(amount) >= 400
GROUP BY
    customer_id;

Reason: The WHERE clause filters rows before aggregation, so aggregate functions (such as SUM) cannot be specified as conditions.

Correct: Use HAVING clause

When filtering results after aggregation, use the HAVING clause.

query = """
SELECT
    customer_id,
    SUM(amount) as total_amount
FROM
    Orders
GROUP BY
    customer_id
HAVING
    SUM(amount) >= 400;
"""

with sqlite3.connect("data.db") as conn:
    df = pl.read_database(query, connection=conn)
    display(df)

shape: (2, 2)

customer_idtotal_amount
2500
41000

Explanation: Execution Order

Understanding the execution order of SQL makes the difference clear.

  1. FROM: Select target table
  2. WHERE: Filter rows (before aggregation)
  3. GROUP BY: Group
  4. HAVING: Filter groups (after aggregation)
  5. SELECT: Select columns to display

WHERE is executed before grouping, so it cannot know the aggregation results. On the other hand, HAVING is executed after grouping, so filtering based on aggregation results is possible.

Summary

  • WHERE: Filters rows before aggregation.
  • HAVING: Filters groups after aggregation.

If you remember “use HAVING when you want to specify conditions on aggregation results”, you won’t make mistakes.