[SQL] Utilizing CASE Expressions
In this article, I will explain the SQL CASE expression.
The CASE expression is a feature that realizes conditional branching like if-then-else in programming languages within SQL.
It is used in a very wide range of applications, such as data classification and conditional calculation during aggregation.
Preparation
In this site, I will use python libraries sqlite3 and ipython-sql to execute and explain SQL on Jupyter Notebook.
Also, I will use Polars to display dataframes.
github
- The jupyter notebook file on github is here
google colaboratory
- If you want to run it on google colaboratory, click here
Author’s Environment
Here is the author’s environment.
!sw_vers
ProductName: macOS
ProductVersion: 15.5
BuildVersion: 24F74
!python -V
Python 3.12.12
Import necessary libraries.
import sqlite3
import polars as pl
%load_ext sql
%config SqlMagic.feedback = True
%config SqlMagic.style = '_DEPRECATED_DEFAULT'
Data Preparation
As a sample, create a Students table that stores student test scores.
import os
if os.path.exists("data.db"):
os.remove("data.db")
os.system("touch data.db")
os.system("chmod 664 data.db")
# Database connection
%sql sqlite:///data.db
'Connected: @data.db'
%%sql
CREATE TABLE Students (
id INTEGER PRIMARY KEY,
name TEXT,
score INTEGER
);
INSERT INTO Students (name, score) VALUES
('Aさん', 95),
('Bさん', 85),
('Cさん', 75),
('Dさん', 65),
('Eさん', 55),
('Fさん', 45);
* sqlite:///data.db
Done.
6 rows affected.
[]
%%sql
SELECT * FROM Students;
* sqlite:///data.db
Done.
| id | name | score |
|---|---|---|
| 1 | Aさん | 95 |
| 2 | Bさん | 85 |
| 3 | Cさん | 75 |
| 4 | Dさん | 65 |
| 5 | Eさん | 55 |
| 6 | Fさん | 45 |
Example 1: Ranking according to scores
Let’s assign ranks (grades) according to the scores as follows.
- 90 or more: S
- 80 or more: A
- 70 or more: B
- 60 or more: C
- Others: D
query = """
SELECT
name,
score,
CASE
WHEN score >= 90 THEN 'S'
WHEN score >= 80 THEN 'A'
WHEN score >= 70 THEN 'B'
WHEN score >= 60 THEN 'C'
ELSE 'D'
END as grade
FROM
Students;
"""
with sqlite3.connect("data.db") as conn:
df = pl.read_database(query, connection=conn)
display(df)
shape: (6, 3)
| name | score | grade |
|---|---|---|
| “Aさん” | 95 | “S” |
| “Bさん” | 85 | “A” |
| “Cさん” | 75 | “B” |
| “Dさん” | 65 | “C” |
| “Eさん” | 55 | “D” |
| “Fさん” | 45 | “D” |
In this way, by using the CASE expression in the SELECT clause, you can dynamically generate a new column based on the values of existing columns.
Example 2: Cross Tabulation (Conditional Aggregation)
Next, let’s aggregate the number of people for each rank.
Usually, GROUP BY is used, but by combining the CASE expression and aggregate functions (SUM or COUNT), you can create a horizontal summary table (like a pivot table).
query = """
SELECT
SUM(CASE WHEN score >= 80 THEN 1 ELSE 0 END) as Excellent_Count,
SUM(CASE WHEN score >= 60 AND score < 80 THEN 1 ELSE 0 END) as Good_Count,
SUM(CASE WHEN score < 60 THEN 1 ELSE 0 END) as Poor_Count
FROM
Students;
"""
with sqlite3.connect("data.db") as conn:
df = pl.read_database(query, connection=conn)
display(df)
shape: (1, 3)
| Excellent_Count | Good_Count | Poor_Count |
|---|---|---|
| 2 | 2 | 2 |
Explanation
CASE WHEN condition THEN 1 ELSE 0 ENDreturns1if the condition is met, and0otherwise.- By summing this up with
SUM, you can count the number of rows that meet the condition.
This technique is very useful when you want to calculate KPIs for specific conditions in a single query.
Summary
In this article, I introduced the basic usage and application examples of the CASE expression.
- Conditional Branching: You can convert data values according to conditions.
- Cross Tabulation: By combining with aggregate functions, flexible aggregation becomes possible.
By mastering the CASE expression, the range of data processing in SQL will expand significantly.