[SQL] How to Use EXISTS and IN

In this article, I will explain the difference between SQL EXISTS and IN. Both are often used to extract “rows that meet certain conditions” using subqueries, but there are differences in behavior and performance.

In particular, let’s look at how to use them from the perspectives of NULL handling and performance.

Preparation

In this site, I will use python libraries sqlite3 and ipython-sql to execute and explain SQL on Jupyter Notebook. Also, I will use Polars to display dataframes.

github

  • The jupyter notebook file on github is here

google colaboratory

  • If you want to run it on google colaboratory, click here

Author’s Environment

Here is the author’s environment.

!sw_vers
ProductName:		macOS
ProductVersion:		15.5
BuildVersion:		24F74
!python -V
Python 3.12.12

Import necessary libraries.

import sqlite3
import polars as pl

%load_ext sql
%config SqlMagic.feedback = True

%config SqlMagic.style = '_DEPRECATED_DEFAULT'

Data Preparation

As a sample, create a customer table Customers and an order table Orders.

import os

if os.path.exists("data.db"):
    os.remove("data.db")

os.system("touch data.db")
os.system("chmod 664 data.db")

# Database connection
%sql sqlite:///data.db
'Connected: @data.db'
%%sql
CREATE TABLE Customers (
    id INTEGER PRIMARY KEY,
    name TEXT
);

CREATE TABLE Orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    amount INTEGER
);

INSERT INTO Customers (name) VALUES
    ('Aさん'),
    ('Bさん'),
    ('Cさん'),
    ('Dさん');

INSERT INTO Orders (customer_id, amount) VALUES
    (1, 100),
    (1, 200),
    (3, 500);
 * sqlite:///data.db
Done.
Done.
4 rows affected.
3 rows affected.





[]
%%sql
SELECT * FROM Customers;
 * sqlite:///data.db
Done.
idname
1Aさん
2Bさん
3Cさん
4Dさん
%%sql
SELECT * FROM Orders;
 * sqlite:///data.db
Done.
idcustomer_idamount
11100
21200
33500

Task: Extract customers with order history

A-san and C-san have order history, but B-san and D-san do not. Let’s extract only customers with order history.

Method 1: Using IN

Use the IN clause to extract customers included in the list of customer_id in the Orders table.

query = """
SELECT
    *
FROM
    Customers
WHERE
    id IN (SELECT customer_id FROM Orders);
"""

with sqlite3.connect("data.db") as conn:
    df = pl.read_database(query, connection=conn)
    display(df)

shape: (2, 2)

idname
1“Aさん”
3“Cさん”

Method 2: Using EXISTS

Use the EXISTS clause to check whether there are related rows in the Orders table.

query = """
SELECT
    *
FROM
    Customers c
WHERE
    EXISTS (
        SELECT
            1
        FROM
            Orders o
        WHERE
            c.id = o.customer_id
    );
"""

with sqlite3.connect("data.db") as conn:
    df = pl.read_database(query, connection=conn)
    display(df)

shape: (2, 2)

idname
1“Aさん”
3“Cさん”

Explanation and Differences

The results are the same, but there are the following differences.

  1. NULL handling

    • IN can behave unintuitive when NULL is included in the list, such as when the result of NOT IN becomes all unknown (empty).
    • EXISTS only looks at the existence of rows, so it is less affected by NULL.
  2. Performance

    • IN often retrieves all subquery results before comparing.
    • EXISTS stops searching as soon as it finds even one row that meets the condition (returns True), so it tends to be faster especially when there are a large number of subquery results.

Summary

  • Basically, it is safer to use EXISTS.
  • Especially when using NOT IN, it is strongly recommended to rewrite it to NOT EXISTS to avoid unexpected behavior due to NULL.