[SQL] Utilizing Self Join

In this article, I will explain SQL “Self Join”. A self join is, as the name suggests, an operation to “join with itself”.

Physically, it is one table, but by giving it an alias, you can logically treat it as “two different tables” and join them. It is very useful when you want to compare rows within the same table or when dealing with data with hierarchical structure.

Preparation

In this site, I will use python libraries sqlite3 and ipython-sql to execute and explain SQL on Jupyter Notebook. Also, I will use Polars to display dataframes.

github

  • The jupyter notebook file on github is here

google colaboratory

  • If you want to run it on google colaboratory, click here

Author’s Environment

Here is the author’s environment.

!sw_vers
ProductName:		macOS
ProductVersion:		15.5
BuildVersion:		24F74
!python -V
Python 3.12.12

Import necessary libraries.

import sqlite3
import polars as pl

%load_ext sql
%config SqlMagic.feedback = True

%config SqlMagic.style = '_DEPRECATED_DEFAULT'

Data Preparation

As a sample, create an Employees table that represents the relationship between employees and their managers. The manager_id column contains the id of the employee’s manager.

import os

if os.path.exists("data.db"):
    os.remove("data.db")

os.system("touch data.db")
os.system("chmod 664 data.db")

# Database connection
%sql sqlite:///data.db
'Connected: @data.db'
%%sql
CREATE TABLE Employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    manager_id INTEGER
);

INSERT INTO Employees (id, name, manager_id) VALUES
    (1, '佐藤社長', NULL),
    (2, '鈴木部長', 1),
    (3, '高橋課長', 2),
    (4, '田中係長', 2),
    (5, '伊藤社員', 3),
    (6, '渡辺社員', 3);
 * sqlite:///data.db
Done.
6 rows affected.





[]
%%sql
SELECT * FROM Employees;
 * sqlite:///data.db
Done.
idnamemanager_id
1佐藤社長None
2鈴木部長1
3高橋課長2
4田中係長2
5伊藤社員3
6渡辺社員3

Example: Display employee name and manager name

The Employees table only contains the manager’s id, so to know the manager’s name, you need to refer to the Employees table again.

Here, we use a self join. Join the Employees table by treating it as two tables: “subordinate table (E)” and “manager table (M)”.

query = """
SELECT
    E.name as Employee_Name,
    M.name as Manager_Name
FROM
    Employees as E
LEFT JOIN
    Employees as M
ON
    E.manager_id = M.id;
"""

with sqlite3.connect("data.db") as conn:
    df = pl.read_database(query, connection=conn)
    display(df)

shape: (6, 2)

Employee_NameManager_Name
“佐藤社長”null
“鈴木部長”“佐藤社長”
“高橋課長”“鈴木部長”
“田中係長”“鈴木部長”
“伊藤社員”“高橋課長”
“渡辺社員”“高橋課長”

Explanation

  • FROM Employees as E: The first Employees table is designated as E (Employee).
  • LEFT JOIN Employees as M: The second Employees table is joined as M (Manager).
  • ON E.manager_id = M.id: Link the subordinate’s manager_id with the manager’s id.

This allows you to link information within the same table and extract it in a meaningful form (pairs of employee names and manager names). Since President Sato has no manager (NULL), the Manager_Name is also NULL as a result of LEFT JOIN.

Summary

In this article, I introduced Self Join.

  • By using aliases, you can treat one table in multiple roles (such as subordinate and manager).
  • It is an essential technique when dealing with hierarchical structures (organizational charts, category trees, etc.).

Although it may seem difficult at first glance, it is easier to understand if you imagine “joining copied tables together”.