[SQL] Common Table Expressions (CTE)
In this article, I will explain SQL Common Table Expressions (CTE).
CTE is a feature that allows you to name a temporary result set using the WITH clause and refer to it in subsequent queries.
By rewriting complex subqueries using CTEs, the readability of the query is greatly improved, and logic reuse becomes easier.
In this article, I will compare the case of using subqueries and the case of using CTEs, and experience the benefits.
Preparation
In this site, I will use python libraries sqlite3 and ipython-sql to execute and explain SQL on Jupyter Notebook.
Also, I will use Polars to display dataframes.
github
- The jupyter notebook file on github is here
google colaboratory
- If you want to run it on google colaboratory, click here
Author’s Environment
Here is the author’s environment.
!sw_vers
ProductName: macOS
ProductVersion: 15.5
BuildVersion: 24F74
!python -V
Python 3.12.12
Import necessary libraries.
import sqlite3
import polars as pl
%load_ext sql
%config SqlMagic.feedback = True
%config SqlMagic.style = '_DEPRECATED_DEFAULT'
Data Preparation
As a sample, create a Sales table that stores sales data.
import os
if os.path.exists("data.db"):
os.remove("data.db")
os.system("touch data.db")
os.system("chmod 664 data.db")
# Database connection
%sql sqlite:///data.db
'Connected: @data.db'
%%sql
CREATE TABLE Sales (
id INTEGER PRIMARY KEY,
sale_date TEXT,
category TEXT,
amount INTEGER
);
INSERT INTO Sales (sale_date, category, amount) VALUES
('2023-01-01', 'Electronics', 1000),
('2023-01-02', 'Clothing', 500),
('2023-01-03', 'Electronics', 1200),
('2023-01-04', 'Food', 300),
('2023-01-05', 'Clothing', 600),
('2023-01-06', 'Electronics', 1100),
('2023-01-07', 'Food', 400);
* sqlite:///data.db
Done.
7 rows affected.
[]
%%sql
SELECT * FROM Sales;
* sqlite:///data.db
Done.
| id | sale_date | category | amount |
|---|---|---|---|
| 1 | 2023-01-01 | Electronics | 1000 |
| 2 | 2023-01-02 | Clothing | 500 |
| 3 | 2023-01-03 | Electronics | 1200 |
| 4 | 2023-01-04 | Food | 300 |
| 5 | 2023-01-05 | Clothing | 600 |
| 6 | 2023-01-06 | Electronics | 1100 |
| 7 | 2023-01-07 | Food | 400 |
Task: Extract categories where the total sales per category is higher than the average total sales of all categories
Although the condition is a bit complex, let’s try to express this in SQL.
- Calculate the total sales for each category.
- Filter those that are higher than the average total sales of all categories.
Method 1: Using Subqueries
First, let’s try writing it using subqueries (inline views) without using CTEs.
query = """
SELECT
category,
total_sales
FROM (
-- Calculate total sales per category
SELECT
category,
SUM(amount) as total_sales
FROM
Sales
GROUP BY
category
) as CategorySales
WHERE
total_sales > (
-- Calculate average total sales of all categories
SELECT
AVG(total_sales)
FROM (
SELECT
category,
SUM(amount) as total_sales
FROM
Sales
GROUP BY
category
)
);
"""
with sqlite3.connect("data.db") as conn:
df = pl.read_database(query, connection=conn)
display(df)
shape: (1, 2)
| category | total_sales |
|---|---|
| “Electronics” | 3300 |
The same aggregation (GROUP BY category) appears twice, and the nesting is deep and hard to read.
Method 2: Using CTE
Next, let’s rewrite it using CTE (WITH clause).
query = """
WITH CategorySales AS (
-- Calculate total sales per category (defined here and reused)
SELECT
category,
SUM(amount) as total_sales
FROM
Sales
GROUP BY
category
)
SELECT
category,
total_sales
FROM
CategorySales
WHERE
total_sales > (SELECT AVG(total_sales) FROM CategorySales);
"""
with sqlite3.connect("data.db") as conn:
df = pl.read_database(query, connection=conn)
display(df)
shape: (1, 2)
| category | total_sales |
|---|---|
| “Electronics” | 3300 |
Explanation
- Readability: The logic of “sales per category” is defined first with the name
CategorySales, so the main query is clean. - Reusability:
CategorySalesis referenced in both theFROMclause and the subquery within theWHEREclause. Code duplication is eliminated, and if modification is needed, it only needs to be done in one place.
The result is the same, but using CTE is overwhelmingly easier to maintain.
Summary
In this article, I introduced Common Table Expressions (CTE).
- You can define a temporary result set using the WITH clause.
- It modularizes complex queries and improves readability and maintainability.
- Especially effective when using the same logic multiple times.
When writing SQL, I want to actively use CTEs and strive for “readable queries”.