[SQL] Difference between UNION and UNION ALL
In this article, I will explain the difference between SQL UNION and UNION ALL.
Both are operators that combine the results of multiple SELECT statements into one result set, but there are significant differences in behavior and performance.
To conclude, you should use UNION ALL unless you need to eliminate duplicates.
Preparation
In this site, I will use python libraries sqlite3 and ipython-sql to execute and explain SQL on Jupyter Notebook.
Also, I will use Polars to display dataframes.
github
- The jupyter notebook file on github is here
google colaboratory
- If you want to run it on google colaboratory, click here
Author’s Environment
Here is the author’s environment.
!sw_vers
ProductName: macOS
ProductVersion: 15.5
BuildVersion: 24F74
!python -V
Python 3.12.12
Import necessary libraries.
import sqlite3
import polars as pl
%load_ext sql
%config SqlMagic.feedback = True
%config SqlMagic.style = '_DEPRECATED_DEFAULT'
Data Preparation
As a sample, create student rosters for two classes, ClassA and ClassB.
Assume that “Sato-san” exists in both classes.
import os
if os.path.exists("data.db"):
os.remove("data.db")
os.system("touch data.db")
os.system("chmod 664 data.db")
# Database connection
%sql sqlite:///data.db
'Connected: @data.db'
%%sql
CREATE TABLE ClassA (
id INTEGER PRIMARY KEY,
name TEXT
);
CREATE TABLE ClassB (
id INTEGER PRIMARY KEY,
name TEXT
);
INSERT INTO ClassA (name) VALUES
('佐藤さん'),
('鈴木さん'),
('高橋さん');
INSERT INTO ClassB (name) VALUES
('佐藤さん'),
('田中さん'),
('伊藤さん');
* sqlite:///data.db
Done.
Done.
3 rows affected.
3 rows affected.
[]
%%sql
SELECT * FROM ClassA;
* sqlite:///data.db
Done.
| id | name |
|---|---|
| 1 | 佐藤さん |
| 2 | 鈴木さん |
| 3 | 高橋さん |
%%sql
SELECT * FROM ClassB;
* sqlite:///data.db
Done.
| id | name |
|---|---|
| 1 | 佐藤さん |
| 2 | 田中さん |
| 3 | 伊藤さん |
UNION: Combine with duplicate elimination
When using UNION, after combining two result sets, duplicate rows are removed.
query = """
SELECT name FROM ClassA
UNION
SELECT name FROM ClassB;
"""
with sqlite3.connect("data.db") as conn:
df = pl.read_database(query, connection=conn)
display(df)
shape: (5, 1)
| name |
|---|
| “伊藤さん” |
| “佐藤さん” |
| “田中さん” |
| “鈴木さん” |
| “高橋さん” |
“Sato-san”, who is in both classes, is displayed only once. This is because internally, sorting and hashing are performed to remove duplicates.
UNION ALL: Display duplicates as is
When using UNION ALL, duplicates are not removed, and the results are simply concatenated vertically.
query = """
SELECT name FROM ClassA
UNION ALL
SELECT name FROM ClassB;
"""
with sqlite3.connect("data.db") as conn:
df = pl.read_database(query, connection=conn)
display(df)
shape: (6, 1)
| name |
|---|
| “佐藤さん” |
| “鈴木さん” |
| “高橋さん” |
| “佐藤さん” |
| “田中さん” |
| “伊藤さん” |
“Sato-san” is displayed twice.
Since the duplicate removal process does not run, it operates faster than UNION.
Summary
- UNION: Removes duplicates (slow)
- UNION ALL: Does not remove duplicates (fast)
If duplicates are not a problem, or if you know there are no duplicates in the first place, always use UNION ALL for performance reasons.