How to Use Surprise
Overview
Surprise is a Python library designed to streamline the development of recommender systems. In this post, I explain what Surprise is and what it offers, and give concrete examples of its use with equations and Python code. This is mainly a memo for myself, so please refer to the official documentation for details.
Source Code
GitHub
- The Jupyter notebook file can be found here.
Google Colaboratory
- To run on Google Colaboratory, click here.
Execution Environment
The OS is macOS. Note that command options may differ from those on Linux and other Unix systems.
!sw_vers
ProductName: macOS
ProductVersion: 13.5.1
BuildVersion: 22G90
!python -V
Python 3.9.17
First, import the basic libraries and use watermark to check their versions. Additionally, set the seed for random numbers.
import random
import scipy
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
seed = 123
random_state = 123
random.seed(seed)
np.random.seed(seed)
from watermark import watermark
print(watermark(python=True, watermark=True, iversions=True, globals_=globals()))
Python implementation: CPython
Python version : 3.9.17
IPython version : 8.17.2
numpy : 1.25.2
scipy : 1.11.2
matplotlib: 3.8.1
Watermark: 2.4.3
Definition and Properties of Surprise
What is Surprise?
Surprise is a Python library (distributed on PyPI as scikit-surprise) for building and evaluating recommender systems that work with explicit rating data. It is easy to customize, and it ships with several prediction algorithms, such as matrix factorization and k-nearest neighbors, so different approaches can be compared on the same data with little code.
Features of Surprise
- Diverse Algorithms: Supports a variety of recommender algorithms including matrix factorization, k-nearest neighbors, and baseline estimation.
- Flexible Datasets: Users can load their own datasets in addition to the built-in ones.
- Easy Evaluation: Provides tools such as cross-validation and grid search for evaluating and tuning algorithms.
Basics of Recommender Systems
Recommender systems model the relationship between users and items in order to predict which items a user will like. They fall mainly into two types: collaborative filtering and content-based filtering.
Collaborative Filtering
Collaborative filtering is a method of making recommendations based on user behavior data. There are mainly two approaches:
- User-Based Collaborative Filtering: Finds users whose past ratings resemble the target user's and recommends items that those users rated highly.
- Item-Based Collaborative Filtering: Finds items that tend to be rated similarly to the items the target user liked and recommends those items.
$$ \mathbf{R} = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{m1} & r_{m2} & \cdots & r_{mn} \\ \end{bmatrix} $$
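Here, $\mathbf{R}$ is the $m \times n$ user-item rating matrix for $m$ users and $n$ items, and $r_{ij}$ is the rating given by user $i$ to item $j$. In practice most entries are unobserved, and collaborative filtering tries to predict them from the observed ones.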
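To make the rating matrix concrete, here is a minimal NumPy sketch that builds $\mathbf{R}$ from (user, item, rating) triples and computes a user-user cosine similarity over co-rated items. The data and the helper name `cosine_on_common` are hypothetical, and this is not Surprise's own code.

```python
import numpy as np

# Hypothetical (user, item, rating) triples with 0-based indices.
triples = [(0, 0, 5), (0, 1, 3), (0, 3, 1), (1, 0, 4), (1, 3, 1),
           (2, 0, 1), (2, 1, 1), (2, 3, 5), (3, 1, 1), (3, 2, 5), (3, 3, 4)]
n_users, n_items = 4, 4

# Build the rating matrix; NaN marks an unobserved rating.
R = np.full((n_users, n_items), np.nan)
for u, i, r in triples:
    R[u, i] = r

def cosine_on_common(a, b):
    """Cosine similarity restricted to the items both users have rated."""
    common = ~np.isnan(a) & ~np.isnan(b)
    if not common.any():
        return 0.0
    x, y = a[common], b[common]
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

user_sim = np.array([[cosine_on_common(R[u], R[v]) for v in range(n_users)]
                     for u in range(n_users)])
print(np.round(user_sim, 3))
```

User 0 comes out closer to user 1 than to user 2, which matches the ratings: users 0 and 1 both like item 0 and dislike item 3, while user 2 rates them the other way round. Note that a pair with a single co-rated item always gets similarity 1, which is one reason practical systems require a minimum amount of overlap.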
Content-Based Filtering
Content-based filtering is a method of making recommendations based on item attribute information. For example, it uses information such as movie genres and actors to recommend movies that match the user’s preferences.
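The idea can be sketched in a few lines of NumPy. The genre vectors, the ratings, and the choice of a rating-weighted average as the user profile are all assumptions made for this illustration; Surprise itself does not provide content-based models.

```python
import numpy as np

# Hypothetical item attributes: one row per movie, one column per genre
# (action, comedy, romance). 1 means the movie belongs to the genre.
item_features = np.array([
    [1, 0, 0],  # movie 0: action
    [1, 1, 0],  # movie 1: action comedy
    [0, 1, 1],  # movie 2: romantic comedy
    [0, 0, 1],  # movie 3: romance
], dtype=float)

# Ratings one user gave to movies 0 and 2 (movies 1 and 3 are unseen).
rated = {0: 5.0, 2: 1.0}

# User profile: rating-weighted average of the rated movies' feature vectors.
weights = np.array(list(rated.values()))
profile = weights @ item_features[list(rated.keys())] / weights.sum()

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Score every unseen movie by how well it matches the profile.
unseen = [i for i in range(len(item_features)) if i not in rated]
scores = {i: cosine(profile, item_features[i]) for i in unseen}
best = max(scores, key=scores.get)
print(best, scores)
```

Because the user rated the action movie highly and the romantic comedy poorly, the action comedy (movie 1) scores higher than the romance (movie 3).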
Applications of Surprise
Recommender Systems Using Matrix Factorization
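Matrix factorization is a technique that approximates the rating matrix $\mathbf{R}$ by the product of a user feature matrix $\mathbf{P}$ ($m \times k$) and an item feature matrix $\mathbf{Q}$ ($n \times k$), where the number of latent factors $k$ is much smaller than $m$ and $n$. The factors are fitted to the observed ratings, and the resulting low-rank approximation supplies predictions for the unobserved entries.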
$$ \mathbf{R} \approx \mathbf{P} \mathbf{Q}^T $$
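In Surprise, matrix factorization is available through algorithms such as SVD and NMF (Non-Negative Matrix Factorization). Note that Surprise's SVD is not an exact singular value decomposition: it is the matrix factorization model popularized by Simon Funk during the Netflix Prize, which adds user and item bias terms and is fitted by stochastic gradient descent on the observed ratings only.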
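To show what fitting $\mathbf{P}$ and $\mathbf{Q}$ involves, here is a minimal NumPy sketch that learns both by stochastic gradient descent on the observed ratings. It is a simplified illustration, not Surprise's implementation: it omits bias terms, and the data, learning rate, regularization strength, and number of epochs are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed ratings as (user, item, rating) triples.
ratings = [(0, 0, 5), (0, 1, 3), (0, 3, 1), (1, 0, 4), (1, 3, 1),
           (2, 0, 1), (2, 1, 1), (2, 3, 5), (3, 1, 1), (3, 2, 5), (3, 3, 4)]
n_users, n_items, k = 4, 4, 2

P = rng.normal(scale=0.1, size=(n_users, k))  # user factors
Q = rng.normal(scale=0.1, size=(n_items, k))  # item factors

def rmse(P, Q):
    """Root mean squared error on the observed ratings."""
    errors = [r - P[u] @ Q[i] for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errors))))

rmse_before = rmse(P, Q)

lr, reg = 0.01, 0.02  # learning rate and L2 regularization (assumed values)
for epoch in range(500):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]
        p_u = P[u].copy()  # keep the old user factors for the item update
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * p_u - reg * Q[i])

rmse_after = rmse(P, Q)
print(f"training RMSE: {rmse_before:.3f} -> {rmse_after:.3f}")

# A prediction for an unobserved entry, e.g. user 0 on item 2.
print(f"predicted r_02: {P[0] @ Q[2]:.2f}")
```

The error printed here is measured on the training ratings, so it says nothing about generalization; that is what the held-out test set in the Surprise example is for.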
Implementation Example in Python
Below is a specific example using SVD.
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split
# Use the ml-100k dataset
data = Dataset.load_builtin("ml-100k")
# Split the data into training and testing sets
train_data, test_data = train_test_split(data, test_size=0.1)
from surprise import SVD
# Apply the SVD algorithm
algo = SVD()
algo.fit(train_data)
predictions = algo.test(test_data)
# Evaluate the accuracy
print(f"RMSE: {accuracy.rmse(predictions, verbose=False):.3f}")
RMSE: 0.927
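For reference, the RMSE reported above is the square root of the mean squared difference between true ratings $r_{ui}$ and estimated ratings $\hat{r}_{ui}$ over the test set $T$:

$$ \mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(u, i) \in T} (r_{ui} - \hat{r}_{ui})^2} $$

A minimal sketch of the same computation in plain Python, with made-up rating pairs:

```python
import math

# Hypothetical (true rating, estimated rating) pairs.
pairs = [(4.0, 3.5), (2.0, 2.5), (5.0, 4.0), (3.0, 3.0)]

def rmse(pairs):
    """Root mean squared error over (true, estimated) rating pairs."""
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

value = rmse(pairs)
print(f"RMSE: {value:.3f}")
```

Lower is better, and the value is in the same units as the ratings, so an RMSE around 0.93 on a 1 to 5 scale means predictions are typically off by a little under one star.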
Recommender Systems Using k-Nearest Neighbors
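k-Nearest Neighbors (k-NN) predicts a rating from the ratings attached to the $k$ most similar users or items, weighting each neighbor by its similarity. In Surprise, the `user_based` key of `sim_options` selects the variant: `True` (the default) gives user-based k-NN, and `False` gives item-based k-NN.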
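For the item-based case, the basic prediction is a similarity-weighted mean of the user's own ratings on the $k$ items most similar to the target item. The sketch below shows that calculation with NumPy. The similarity matrix, the ratings, and the fallback to the user's mean rating are assumptions for illustration; Surprise's handling of edge cases differs.

```python
import numpy as np

# Hypothetical item-item similarity matrix (symmetric, values in [0, 1]).
item_sim = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
])

# Ratings of one user; NaN marks items the user has not rated.
user_ratings = np.array([np.nan, 5.0, 2.0, 1.0])

def predict_item_based(target, user_ratings, item_sim, k=2):
    """Similarity-weighted mean of the user's ratings on the k items
    most similar to `target`."""
    rated = np.flatnonzero(~np.isnan(user_ratings))
    rated = rated[rated != target]
    # Keep the k rated items with the highest similarity to the target.
    neighbors = rated[np.argsort(item_sim[target, rated])[::-1][:k]]
    sims = item_sim[target, neighbors]
    if sims.sum() == 0:
        # Assumed fallback when no neighbor carries any weight.
        return float(np.nanmean(user_ratings))
    return float(sims @ user_ratings[neighbors] / sims.sum())

pred = predict_item_based(0, user_ratings, item_sim, k=2)
print(f"predicted rating for item 0: {pred:.2f}")
```

Item 1 is very similar to item 0 and was rated 5, so it dominates the weighted mean and pulls the prediction well above the user's average rating.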
Implementation Example in Python
Below is a specific example using item-based k-NN.
from surprise import KNNBasic
# Apply the item-based k-NN algorithm
algo = KNNBasic(k=30, sim_options={"user_based": False}, verbose=False)
algo.fit(train_data)
predictions = algo.test(test_data, verbose=False)
# Evaluate the accuracy
print(f"RMSE: {accuracy.rmse(predictions, verbose=False):.3f}")
RMSE: 0.973
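The objects returned by `algo.test` carry the fields `uid`, `iid`, `r_ui`, `est`, and `details`, so they can be turned into a ranked list per user. The sketch below does this in plain Python; the `Prediction` tuple defined here is a stand-in with the same field names so that the example runs without the library, and the values and the helper name `top_n` are made up.

```python
from collections import defaultdict, namedtuple

# Stand-in with the same fields as Surprise's prediction objects.
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])

# Hypothetical predictions: (user, item, true rating, estimate, details).
predictions = [
    Prediction("u1", "i1", 4.0, 3.8, {}),
    Prediction("u1", "i2", 2.0, 2.1, {}),
    Prediction("u1", "i3", 5.0, 4.6, {}),
    Prediction("u2", "i1", 3.0, 3.2, {}),
    Prediction("u2", "i4", 4.0, 4.4, {}),
]

def top_n(predictions, n=2):
    """Return the n items with the highest estimated rating for each user."""
    per_user = defaultdict(list)
    for p in predictions:
        per_user[p.uid].append((p.iid, p.est))
    return {
        uid: sorted(items, key=lambda pair: pair[1], reverse=True)[:n]
        for uid, items in per_user.items()
    }

recommendations = top_n(predictions, n=2)
print(recommendations)
```

In a real recommender the candidates would be items the user has not rated yet rather than the held-out test ratings; Surprise's trainset offers `build_anti_testset()` for generating such user-item pairs.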
Conclusion
In this article, I briefly explained the Python library Surprise and showed how to use it. Surprise simplifies the development of recommender systems and makes it easy to try various algorithms on the same data. As examples, I applied matrix factorization (SVD) and item-based k-nearest neighbors to the ml-100k dataset and compared them by RMSE.
References
- Surprise Documentation: https://surprise.readthedocs.io/en/stable/