[SQL] SQL Execution Environment in Python and Basic SELECT
I will note down SQL (Structured Query Language) techniques, which are essential for data analysis and system development, for my own reference with actual examples.
In this article, I will explain how to build an environment to execute and verify SQL on Python, which will be used in this series, and the most basic SELECT statement.
In this series, I will use SQLite, which is lightweight and easy to handle, and Polars, a fast dataframe library, in combination.
Preparation
In this site, I will use python libraries sqlite3 and ipython-sql to execute and explain SQL on Jupyter Notebook.
Also, I will use Polars to display dataframes.
github
- The jupyter notebook file on github is here
google colaboratory
- If you want to run it on google colaboratory, click here
Author’s Environment
Here is the author’s environment.
!sw_vers
ProductName: macOS
ProductVersion: 15.5
BuildVersion: 24F74
!python -V
Python 3.12.12
Import necessary libraries.
import sqlite3
import polars as pl
%load_ext sql
%config SqlMagic.feedback = True
%config SqlMagic.style = '_DEPRECATED_DEFAULT'
Data Preparation
First, create a database file (SQLite) for executing SQL, and create a Users table as sample data.
import os
if os.path.exists("data.db"):
os.remove("data.db")
os.system("touch data.db")
os.system("chmod 664 data.db")
# Database connection
%sql sqlite:///data.db
'Connected: @data.db'
%%sql
CREATE TABLE Users (
id INTEGER PRIMARY KEY,
name TEXT,
age INTEGER,
country TEXT
);
INSERT INTO Users (name, age, country) VALUES
('Alice', 25, 'USA'),
('Bob', 30, 'UK'),
('Charlie', 35, 'USA'),
('Dave', 20, 'Japan'),
('Eve', 28, 'Japan');
* sqlite:///data.db
Done.
5 rows affected.
[]
SQL Execution and Result Confirmation
In this series, I adopt a method to receive SQL execution results as a Polars DataFrame and display them beautifully.
By connecting with sqlite3 and using pl.read_database, you can directly convert SQL results into a DataFrame.
Basic SELECT
First, let’s retrieve all the contents of the table.
query = """
SELECT * FROM Users;
"""
with sqlite3.connect("data.db") as conn:
df = pl.read_database(query, connection=conn)
display(df)
shape: (5, 4)
| id | name | age | country |
|---|---|---|---|
| 1 | “Alice” | 25 | “USA” |
| 2 | “Bob” | 30 | “UK” |
| 3 | “Charlie” | 35 | “USA” |
| 4 | “Dave” | 20 | “Japan” |
| 5 | “Eve” | 28 | “Japan” |
Condition Specification (WHERE) and Sorting (ORDER BY)
Next, let’s specify conditions to filter data and sort it.
Example: Retrieve users whose country is ‘Japan’, in descending order of age (largest first).
query = """
SELECT
name,
age,
country
FROM
Users
WHERE
country = 'Japan'
ORDER BY
age DESC;
"""
with sqlite3.connect("data.db") as conn:
df = pl.read_database(query, connection=conn)
display(df)
shape: (2, 3)
| name | age | country |
|---|---|---|
| “Eve” | 28 | “Japan” |
| “Dave” | 20 | “Japan” |
Summary
In this article, I introduced the setup of the SQL execution environment used in this series and how to execute basic queries.
- SQLite: A file-based database that is easy to use.
- Polars: A fast and easy-to-use dataframe library. Also useful for displaying SQL results.
From the next time onwards, I will explain more practical SQL techniques (Window functions, CTEs, joins, etc.) using this environment.