[SQL] String Concatenation by Group (GROUP_CONCAT)
In this article, I will explain the SQL GROUP_CONCAT function.
This is an aggregate function that concatenates strings from rows grouped by GROUP BY into one string, such as comma-separated.
It is very convenient in cases such as “I want to display multiple tags associated with one article in one line”.
Preparation
In this site, I will use python libraries sqlite3 and ipython-sql to execute and explain SQL on Jupyter Notebook.
Also, I will use Polars to display dataframes.
github
- The jupyter notebook file on github is here
google colaboratory
- If you want to run it on google colaboratory, click here
Author’s Environment
Here is the author’s environment.
!sw_vers
ProductName: macOS
ProductVersion: 15.5
BuildVersion: 24F74
!python -V
Python 3.12.12
Import necessary libraries.
import sqlite3
import polars as pl
%load_ext sql
%config SqlMagic.feedback = True
%config SqlMagic.style = '_DEPRECATED_DEFAULT'
Data Preparation
As a sample, create a post table Posts and a tag table Tags.
import os
if os.path.exists("data.db"):
os.remove("data.db")
os.system("touch data.db")
os.system("chmod 664 data.db")
# Database connection
%sql sqlite:///data.db
'Connected: @data.db'
%%sql
CREATE TABLE Posts (
id INTEGER PRIMARY KEY,
title TEXT
);
CREATE TABLE Tags (
post_id INTEGER,
tag_name TEXT
);
INSERT INTO Posts (id, title) VALUES
(1, 'SQL入門'),
(2, 'Pythonデータ分析');
INSERT INTO Tags (post_id, tag_name) VALUES
(1, 'Database'),
(1, 'Beginner'),
(1, 'SQL'),
(2, 'Python'),
(2, 'DataScience'),
(2, 'Pandas');
* sqlite:///data.db
Done.
Done.
2 rows affected.
6 rows affected.
[]
%%sql
SELECT * FROM Posts;
* sqlite:///data.db
Done.
| id | title |
|---|---|
| 1 | SQL入門 |
| 2 | Pythonデータ分析 |
%%sql
SELECT * FROM Tags;
* sqlite:///data.db
Done.
| post_id | tag_name |
|---|---|
| 1 | Database |
| 1 | Beginner |
| 1 | SQL |
| 2 | Python |
| 2 | DataScience |
| 2 | Pandas |
Task: Display tags separated by commas for each post
If you JOIN normally, the number of rows increases by the number of tags.
I want to consolidate this into one row per post.
query = """
SELECT
p.title,
GROUP_CONCAT(t.tag_name, ', ') as tags
FROM
Posts p
JOIN
Tags t ON p.id = t.post_id
GROUP BY
p.id;
"""
with sqlite3.connect("data.db") as conn:
df = pl.read_database(query, connection=conn)
display(df)
shape: (2, 2)
| title | tags |
|---|---|
| “SQL入門” | “Database, Beginner, SQL” |
| “Pythonデータ分析” | “Python, DataScience, Pandas” |
Explanation
GROUP_CONCAT(column name, delimiter): Concatenates the values of the specified column with the specified delimiter.GROUP BY p.id: Group by post.
This allows you to express multiple values like a list in one row.
Note: Differences by Database
String concatenation functions have different names and syntax depending on the database product.
- MySQL / SQLite:
GROUP_CONCAT - PostgreSQL:
STRING_AGG - Oracle:
LISTAGG
Check the documentation of the database you are using.
Summary
- Using GROUP_CONCAT, you can concatenate values within a group into one string.
- It is very convenient when formatting data for report output or display on the application side.