[SQL] Utilizing CASE Expressions

In this article, I will explain the SQL CASE expression. The CASE expression is a feature that realizes conditional branching like if-then-else in programming languages within SQL.

It is used in a very wide range of applications, such as data classification and conditional calculation during aggregation.

Preparation

In this site, I will use python libraries sqlite3 and ipython-sql to execute and explain SQL on Jupyter Notebook. Also, I will use Polars to display dataframes.

github

  • The jupyter notebook file on github is here

google colaboratory

  • If you want to run it on google colaboratory, click here

Author’s Environment

Here is the author’s environment.

!sw_vers
ProductName:		macOS
ProductVersion:		15.5
BuildVersion:		24F74
!python -V
Python 3.12.12

Import necessary libraries.

import sqlite3
import polars as pl

%load_ext sql
%config SqlMagic.feedback = True

%config SqlMagic.style = '_DEPRECATED_DEFAULT'

Data Preparation

As a sample, create a Students table that stores student test scores.

import os

if os.path.exists("data.db"):
    os.remove("data.db")

os.system("touch data.db")
os.system("chmod 664 data.db")

# Database connection
%sql sqlite:///data.db
'Connected: @data.db'
%%sql
CREATE TABLE Students (
    id INTEGER PRIMARY KEY,
    name TEXT,
    score INTEGER
);

INSERT INTO Students (name, score) VALUES
    ('Aさん', 95),
    ('Bさん', 85),
    ('Cさん', 75),
    ('Dさん', 65),
    ('Eさん', 55),
    ('Fさん', 45);
 * sqlite:///data.db
Done.
6 rows affected.





[]
%%sql
SELECT * FROM Students;
 * sqlite:///data.db
Done.
idnamescore
1Aさん95
2Bさん85
3Cさん75
4Dさん65
5Eさん55
6Fさん45

Example 1: Ranking according to scores

Let’s assign ranks (grades) according to the scores as follows.

  • 90 or more: S
  • 80 or more: A
  • 70 or more: B
  • 60 or more: C
  • Others: D
query = """
SELECT
    name,
    score,
    CASE
        WHEN score >= 90 THEN 'S'
        WHEN score >= 80 THEN 'A'
        WHEN score >= 70 THEN 'B'
        WHEN score >= 60 THEN 'C'
        ELSE 'D'
    END as grade
FROM
    Students;
"""

with sqlite3.connect("data.db") as conn:
    df = pl.read_database(query, connection=conn)
    display(df)

shape: (6, 3)

namescoregrade
“Aさん”95“S”
“Bさん”85“A”
“Cさん”75“B”
“Dさん”65“C”
“Eさん”55“D”
“Fさん”45“D”

In this way, by using the CASE expression in the SELECT clause, you can dynamically generate a new column based on the values of existing columns.

Example 2: Cross Tabulation (Conditional Aggregation)

Next, let’s aggregate the number of people for each rank. Usually, GROUP BY is used, but by combining the CASE expression and aggregate functions (SUM or COUNT), you can create a horizontal summary table (like a pivot table).

query = """
SELECT
    SUM(CASE WHEN score >= 80 THEN 1 ELSE 0 END) as Excellent_Count,
    SUM(CASE WHEN score >= 60 AND score < 80 THEN 1 ELSE 0 END) as Good_Count,
    SUM(CASE WHEN score < 60 THEN 1 ELSE 0 END) as Poor_Count
FROM
    Students;
"""

with sqlite3.connect("data.db") as conn:
    df = pl.read_database(query, connection=conn)
    display(df)

shape: (1, 3)

Excellent_CountGood_CountPoor_Count
222

Explanation

  • CASE WHEN condition THEN 1 ELSE 0 END returns 1 if the condition is met, and 0 otherwise.
  • By summing this up with SUM, you can count the number of rows that meet the condition.

This technique is very useful when you want to calculate KPIs for specific conditions in a single query.

Summary

In this article, I introduced the basic usage and application examples of the CASE expression.

  • Conditional Branching: You can convert data values according to conditions.
  • Cross Tabulation: By combining with aggregate functions, flexible aggregation becomes possible.

By mastering the CASE expression, the range of data processing in SQL will expand significantly.