[SQL] Common Table Expressions (CTE)

In this article, I will explain SQL Common Table Expressions (CTE). CTE is a feature that allows you to name a temporary result set using the WITH clause and refer to it in subsequent queries.

By rewriting complex subqueries using CTEs, the readability of the query is greatly improved, and logic reuse becomes easier.

In this article, I will compare the case of using subqueries and the case of using CTEs, and experience the benefits.

Preparation

In this site, I will use python libraries sqlite3 and ipython-sql to execute and explain SQL on Jupyter Notebook. Also, I will use Polars to display dataframes.

github

  • The jupyter notebook file on github is here

google colaboratory

  • If you want to run it on google colaboratory, click here

Author’s Environment

Here is the author’s environment.

!sw_vers
ProductName:		macOS
ProductVersion:		15.5
BuildVersion:		24F74
!python -V
Python 3.12.12

Import necessary libraries.

import sqlite3
import polars as pl

%load_ext sql
%config SqlMagic.feedback = True

%config SqlMagic.style = '_DEPRECATED_DEFAULT'

Data Preparation

As a sample, create a Sales table that stores sales data.

import os

if os.path.exists("data.db"):
    os.remove("data.db")

os.system("touch data.db")
os.system("chmod 664 data.db")

# Database connection
%sql sqlite:///data.db
'Connected: @data.db'
%%sql
CREATE TABLE Sales (
    id INTEGER PRIMARY KEY,
    sale_date TEXT,
    category TEXT,
    amount INTEGER
);

INSERT INTO Sales (sale_date, category, amount) VALUES
    ('2023-01-01', 'Electronics', 1000),
    ('2023-01-02', 'Clothing', 500),
    ('2023-01-03', 'Electronics', 1200),
    ('2023-01-04', 'Food', 300),
    ('2023-01-05', 'Clothing', 600),
    ('2023-01-06', 'Electronics', 1100),
    ('2023-01-07', 'Food', 400);
 * sqlite:///data.db
Done.
7 rows affected.





[]
%%sql
SELECT * FROM Sales;
 * sqlite:///data.db
Done.
idsale_datecategoryamount
12023-01-01Electronics1000
22023-01-02Clothing500
32023-01-03Electronics1200
42023-01-04Food300
52023-01-05Clothing600
62023-01-06Electronics1100
72023-01-07Food400

Task: Extract categories where the total sales per category is higher than the average total sales of all categories

Although the condition is a bit complex, let’s try to express this in SQL.

  1. Calculate the total sales for each category.
  2. Filter those that are higher than the average total sales of all categories.

Method 1: Using Subqueries

First, let’s try writing it using subqueries (inline views) without using CTEs.

query = """
SELECT
    category,
    total_sales
FROM (
    -- Calculate total sales per category
    SELECT
        category,
        SUM(amount) as total_sales
    FROM
        Sales
    GROUP BY
        category
) as CategorySales
WHERE
    total_sales > (
        -- Calculate average total sales of all categories
        SELECT
            AVG(total_sales)
        FROM (
            SELECT
                category,
                SUM(amount) as total_sales
            FROM
                Sales
            GROUP BY
                category
        )
    );
"""

with sqlite3.connect("data.db") as conn:
    df = pl.read_database(query, connection=conn)
    display(df)

shape: (1, 2)

categorytotal_sales
“Electronics”3300

The same aggregation (GROUP BY category) appears twice, and the nesting is deep and hard to read.

Method 2: Using CTE

Next, let’s rewrite it using CTE (WITH clause).

query = """
WITH CategorySales AS (
    -- Calculate total sales per category (defined here and reused)
    SELECT
        category,
        SUM(amount) as total_sales
    FROM
        Sales
    GROUP BY
        category
)
SELECT
    category,
    total_sales
FROM
    CategorySales
WHERE
    total_sales > (SELECT AVG(total_sales) FROM CategorySales);
"""

with sqlite3.connect("data.db") as conn:
    df = pl.read_database(query, connection=conn)
    display(df)

shape: (1, 2)

categorytotal_sales
“Electronics”3300

Explanation

  • Readability: The logic of “sales per category” is defined first with the name CategorySales, so the main query is clean.
  • Reusability: CategorySales is referenced in both the FROM clause and the subquery within the WHERE clause. Code duplication is eliminated, and if modification is needed, it only needs to be done in one place.

The result is the same, but using CTE is overwhelmingly easier to maintain.

Summary

In this article, I introduced Common Table Expressions (CTE).

  • You can define a temporary result set using the WITH clause.
  • It modularizes complex queries and improves readability and maintainability.
  • Especially effective when using the same logic multiple times.

When writing SQL, I want to actively use CTEs and strive for “readable queries”.