[SQL] Window Functions
In this article, I will explain SQL Window functions. Unlike aggregate functions (SUM, AVG, etc.), Window functions are a powerful feature that allows you to calculate values for each row without aggregating rows, while referring to values of other rows (such as rank or cumulative sum).
I will look at the differences and usage of the following three functions, which are often used especially when creating rankings.
- RANK(): If there is a tie, the next rank is skipped (e.g., 1, 2, 2, 4…).
- DENSE_RANK(): If there is a tie, the next rank is not skipped (e.g., 1, 2, 2, 3…).
- ROW_NUMBER(): Assigns a sequential number to each row even if there is a tie (e.g., 1, 2, 3, 4…).
Preparation
In this site, I will use python libraries sqlite3 and ipython-sql to execute and explain SQL on Jupyter Notebook.
Also, I will use Polars to display dataframes.
github
- The jupyter notebook file on github is here
google colaboratory
- If you want to run it on google colaboratory, click here
Author’s Environment
Here is the author’s environment.
!sw_vers
ProductName: macOS
ProductVersion: 15.5
BuildVersion: 24F74
!python -V
Python 3.12.12
Import necessary libraries.
import sqlite3
import polars as pl
%load_ext sql
%config SqlMagic.feedback = True
%config SqlMagic.style = '_DEPRECATED_DEFAULT'
Data Preparation
As a sample, create a Scores table that stores the test score data of a certain class.
import os
if os.path.exists("data.db"):
os.remove("data.db")
os.system("touch data.db")
os.system("chmod 664 data.db")
# Database connection
%sql sqlite:///data.db
'Connected: @data.db'
%%sql
CREATE TABLE Scores (
id INTEGER PRIMARY KEY,
student_name TEXT,
score INTEGER
);
INSERT INTO Scores (student_name, score) VALUES
('Aさん', 100),
('Bさん', 90),
('Cさん', 90),
('Dさん', 80),
('Eさん', 70),
('Fさん', 90);
* sqlite:///data.db
Done.
6 rows affected.
[]
%%sql
SELECT * FROM Scores;
* sqlite:///data.db
Done.
| id | student_name | score |
|---|---|---|
| 1 | Aさん | 100 |
| 2 | Bさん | 90 |
| 3 | Cさん | 90 |
| 4 | Dさん | 80 |
| 5 | Eさん | 70 |
| 6 | Fさん | 90 |
Execution of Window Functions
Now, let’s use RANK(), DENSE_RANK(), and ROW_NUMBER() to rank the scores.
Use ORDER BY score DESC to sort by score in descending order and assign ranks.
The results are retrieved and displayed as a Polars DataFrame.
query = """
SELECT
student_name,
score,
RANK() OVER (ORDER BY score DESC) as rank_val,
DENSE_RANK() OVER (ORDER BY score DESC) as dense_rank_val,
ROW_NUMBER() OVER (ORDER BY score DESC) as row_num_val
FROM
Scores;
"""
with sqlite3.connect("data.db") as conn:
df = pl.read_database(query, connection=conn)
display(df)
shape: (6, 5)
| student_name | score | rank_val | dense_rank_val | row_num_val |
|---|---|---|---|---|
| “Aさん” | 100 | 1 | 1 | 1 |
| “Bさん” | 90 | 2 | 2 | 2 |
| “Cさん” | 90 | 2 | 2 | 3 |
| “Fさん” | 90 | 2 | 2 | 4 |
| “Dさん” | 80 | 5 | 3 | 5 |
| “Eさん” | 70 | 6 | 4 | 6 |
Explanation of Results
- rank_val (RANK): There are 3 people with 90 points, so they are all 2nd place, and the next person with 80 points is 5th place (3rd and 4th places are skipped).
- dense_rank_val (DENSE_RANK): All people with 90 points are 2nd place, but the next person with 80 points is 3rd place (ranks are not skipped).
- row_num_val (ROW_NUMBER): Even if the scores are the same, a unique sequential number (2, 3, 4) is assigned (the order may depend on the DB implementation or reading order).
In this way, it is important to use the functions properly depending on the purpose (how to handle ties).
Summary
In this article, I introduced the differences in ranking using SQL Window functions.
- RANK: Skips ranks (Olympic style)
- DENSE_RANK: Does not skip ranks
- ROW_NUMBER: Simple sequential number
By using these properly, the range of analysis will expand.