[sqlite3] Window Functions

[SQL] Window Functions

In this article, I will explain SQL Window functions. Unlike aggregate functions (SUM, AVG, etc.), Window functions are a powerful feature that allows you to calculate values for each row without aggregating rows, while referring to values of other rows (such as rank or cumulative sum).

I will look at the differences and usage of the following three functions, which are often used especially when creating rankings.

RANK(): If there is a tie, the next rank is skipped (e.g., 1, 2, 2, 4…).
DENSE_RANK(): If there is a tie, the next rank is not skipped (e.g., 1, 2, 2, 3…).
ROW_NUMBER(): Assigns a sequential number to each row even if there is a tie (e.g., 1, 2, 3, 4…).

Preparation

In this site, I will use python libraries sqlite3 and ipython-sql to execute and explain SQL on Jupyter Notebook. Also, I will use Polars to display dataframes.

github

The jupyter notebook file on github is here

google colaboratory

If you want to run it on google colaboratory, click here

Author’s Environment

Here is the author’s environment.

!sw_vers

ProductName:		macOS
ProductVersion:		15.5
BuildVersion:		24F74

!python -V

Python 3.12.12

Import necessary libraries.

import sqlite3
import polars as pl

%load_ext sql
%config SqlMagic.feedback = True

%config SqlMagic.style = '_DEPRECATED_DEFAULT'

Data Preparation

As a sample, create a Scores table that stores the test score data of a certain class.

import os

if os.path.exists("data.db"):
    os.remove("data.db")

os.system("touch data.db")
os.system("chmod 664 data.db")

# Database connection
%sql sqlite:///data.db

'Connected: @data.db'

%%sql
CREATE TABLE Scores (
    id INTEGER PRIMARY KEY,
    student_name TEXT,
    score INTEGER
);

INSERT INTO Scores (student_name, score) VALUES
    ('Aさん', 100),
    ('Bさん', 90),
    ('Cさん', 90),
    ('Dさん', 80),
    ('Eさん', 70),
    ('Fさん', 90);

 * sqlite:///data.db
Done.
6 rows affected.





[]

%%sql
SELECT * FROM Scores;

 * sqlite:///data.db
Done.

id	student_name	score
1	Aさん	100
2	Bさん	90
3	Cさん	90
4	Dさん	80
5	Eさん	70
6	Fさん	90

Execution of Window Functions

Now, let’s use RANK(), DENSE_RANK(), and ROW_NUMBER() to rank the scores. Use ORDER BY score DESC to sort by score in descending order and assign ranks.

The results are retrieved and displayed as a Polars DataFrame.

query = """
SELECT
    student_name,
    score,
    RANK() OVER (ORDER BY score DESC) as rank_val,
    DENSE_RANK() OVER (ORDER BY score DESC) as dense_rank_val,
    ROW_NUMBER() OVER (ORDER BY score DESC) as row_num_val
FROM
    Scores;
"""

with sqlite3.connect("data.db") as conn:
    df = pl.read_database(query, connection=conn)
    display(df)

shape: (6, 5)

student_name	score	rank_val	dense_rank_val	row_num_val
“Aさん”	100	1	1	1
“Bさん”	90	2	2	2
“Cさん”	90	2	2	3
“Fさん”	90	2	2	4
“Dさん”	80	5	3	5
“Eさん”	70	6	4	6

Explanation of Results

rank_val (RANK): There are 3 people with 90 points, so they are all 2nd place, and the next person with 80 points is 5th place (3rd and 4th places are skipped).
dense_rank_val (DENSE_RANK): All people with 90 points are 2nd place, but the next person with 80 points is 3rd place (ranks are not skipped).
row_num_val (ROW_NUMBER): Even if the scores are the same, a unique sequential number (2, 3, 4) is assigned (the order may depend on the DB implementation or reading order).

In this way, it is important to use the functions properly depending on the purpose (how to handle ties).

Summary

In this article, I introduced the differences in ranking using SQL Window functions.

RANK: Skips ranks (Olympic style)
DENSE_RANK: Does not skip ranks
ROW_NUMBER: Simple sequential number

By using these properly, the range of analysis will expand.