# How to Use Surprise

## Overview

Surprise is a Python library designed to streamline the development of recommender systems. In this post, I explain what Surprise is and what it offers, and walk through concrete examples of its use with equations and Python code. This is mainly a memo for myself, so please refer to the official documentation for details.

## Source Code

### GitHub

- The Jupyter notebook file can be found here.

### Google Colaboratory

- To run on Google Colaboratory, click here.

## Execution Environment

The OS is macOS. Note that command options may differ on Linux and other Unix systems.

```
!sw_vers
```

```
ProductName: macOS
ProductVersion: 13.5.1
BuildVersion: 22G90
```

```
!python -V
```

```
Python 3.9.17
```

First, import the basic libraries and use watermark to check their versions. Additionally, set the seed for random numbers.

```
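# Standard-library and third-party imports
import random

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import scipy
from watermark import watermark

# Fix the random seeds for reproducibility
seed = 123
random_state = 123
random.seed(seed)
np.random.seed(seed)

# Print the versions of Python and of the imported libraries
print(watermark(python=True, watermark=True, iversions=True, globals_=globals()))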
```

```
Python implementation: CPython
Python version : 3.9.17
IPython version : 8.17.2
numpy : 1.25.2
scipy : 1.11.2
matplotlib: 3.8.1
Watermark: 2.4.3
```

## Definition and Properties of Surprise

### What is Surprise?

Surprise is a Python library for building and evaluating recommender systems from explicit rating data. It is easy to customize and ships with ready-to-use algorithms such as matrix factorization and k-nearest neighbors, so different approaches can be tried and compared with little code.

### Features of Surprise

- **Diverse Algorithms**: Supports a variety of recommender algorithms, including matrix factorization, k-nearest neighbors, and baseline estimation.
- **Flexible Datasets**: Users can load and use their own datasets in addition to the built-in datasets.
- **Easy Evaluation**: Provides evaluation tools such as cross-validation and grid search.

## Basics of Recommender Systems

A recommender system models the relationship between users and items in order to suggest items that a user is likely to prefer. There are two main types: collaborative filtering and content-based filtering.

### Collaborative Filtering

Collaborative filtering makes recommendations based on user behavior data such as ratings. There are two main approaches:

- **User-Based Collaborative Filtering**: Finds users similar to the target user and recommends items that those users rated highly.
- **Item-Based Collaborative Filtering**: Finds items similar to those the target user has already rated highly and recommends them.

$$ \mathbf{R} = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{m1} & r_{m2} & \cdots & r_{mn} \\ \end{bmatrix} $$

Here, $\mathbf{R}$ is the user-item rating matrix, and $r_{ij}$ is the rating given by user $i$ to item $j$.
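To make the rating matrix and the user-based approach concrete, here is a minimal pure-Python sketch. The users, items, and ratings are hypothetical, and cosine similarity over co-rated items is just one possible similarity measure, chosen here for simplicity.

```python
import math

# Hypothetical toy ratings as (user, item, rating) triplets.
ratings = [
    ("alice", "A", 5), ("alice", "B", 3), ("alice", "C", 4),
    ("bob", "A", 5), ("bob", "B", 3), ("bob", "D", 5),
    ("carol", "A", 1), ("carol", "B", 5), ("carol", "D", 2),
]

users = sorted({u for u, _, _ in ratings})
items = sorted({i for _, i, _ in ratings})

# R[u][i] holds the rating of user u for item i; None marks an unobserved entry.
R = {u: {i: None for i in items} for u in users}
for u, i, r in ratings:
    R[u][i] = r


def cosine_sim(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = [i for i in items if R[u][i] is not None and R[v][i] is not None]
    if not common:
        return 0.0
    dot = sum(R[u][i] * R[v][i] for i in common)
    norm_u = math.sqrt(sum(R[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(R[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


target = "alice"
# The most similar other user ...
neighbor = max((v for v in users if v != target), key=lambda v: cosine_sim(target, v))
# ... and the items that user rated which the target has not rated yet.
candidates = [i for i in items if R[target][i] is None and R[neighbor][i] is not None]
print(neighbor, candidates)
```

In practice most entries of $\mathbf{R}$ are unobserved, which is why the similarity is computed only over co-rated items.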

### Content-Based Filtering

Content-based filtering is a method of making recommendations based on item attribute information. For example, it uses information such as movie genres and actors to recommend movies that match the user’s preferences.
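A minimal sketch of this idea, assuming hypothetical movies described by one-hot genre vectors: the user profile is built as a rating-weighted sum of the feature vectors of the movies already rated, and unseen movies are ranked by cosine similarity to that profile. This is a deliberate simplification (for example, ratings are not mean-centered), and content-based filtering is outside the scope of Surprise itself.

```python
import math

# Hypothetical item attributes: one-hot genre vectors (action, comedy, romance).
item_features = {
    "Movie A": [1, 0, 0],
    "Movie B": [1, 1, 0],
    "Movie C": [0, 1, 1],
    "Movie D": [0, 0, 1],
}

# Ratings a hypothetical user has already given.
user_ratings = {"Movie A": 5, "Movie C": 1}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# User profile: rating-weighted sum of the feature vectors of rated items.
profile = [0.0, 0.0, 0.0]
for title, rating in user_ratings.items():
    for k, value in enumerate(item_features[title]):
        profile[k] += rating * value

# Rank the movies the user has not rated by similarity to the profile.
unseen = [t for t in item_features if t not in user_ratings]
ranked = sorted(unseen, key=lambda t: cosine(profile, item_features[t]), reverse=True)
print(ranked)
```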

## Applications of Surprise

### Recommender Systems Using Matrix Factorization

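Matrix factorization decomposes the $m \times n$ rating matrix $\mathbf{R}$ into a user feature matrix $\mathbf{P}$ ($m \times k$) and an item feature matrix $\mathbf{Q}$ ($n \times k$), where the number of latent factors $k$ is much smaller than $m$ and $n$. Their product is a low-rank approximation of $\mathbf{R}$, and its entries serve as predicted ratings for user-item pairs that have not been observed.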

$$ \mathbf{R} \approx \mathbf{P} \mathbf{Q}^T $$

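For matrix factorization, Surprise provides algorithms such as `SVD` and `NMF` (Non-Negative Matrix Factorization). Note that, despite its name, Surprise's `SVD` is not a classical Singular Value Decomposition: it learns the factors, together with user and item bias terms, by stochastic gradient descent on the observed ratings only.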
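The idea behind $\mathbf{R} \approx \mathbf{P} \mathbf{Q}^T$ can be sketched in pure Python with stochastic gradient descent on the observed ratings. This is a simplified illustration and not Surprise's implementation: it omits bias terms, and the toy ratings and the hyperparameters (`lr`, `reg`, `n_epochs`, `n_factors`) are arbitrary assumptions.

```python
import random

random.seed(0)

# Hypothetical observed ratings as (user index, item index, rating).
ratings = [
    (0, 0, 5), (0, 1, 3), (0, 2, 4),
    (1, 0, 4), (1, 2, 5), (1, 3, 1),
    (2, 1, 2), (2, 2, 4), (2, 3, 2),
    (3, 0, 5), (3, 1, 3), (3, 3, 1),
]
n_users, n_items, n_factors = 4, 4, 2
lr, reg, n_epochs = 0.05, 0.02, 200  # arbitrary illustrative values

# Small random initial factors: P is n_users x n_factors, Q is n_items x n_factors.
P = [[random.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
Q = [[random.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]


def predict(u, i):
    """Predicted rating: dot product of row u of P and row i of Q."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def train_rmse():
    """RMSE over the observed (training) ratings."""
    return (sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5


rmse_before = train_rmse()
for _ in range(n_epochs):
    for u, i, r in ratings:
        err = r - predict(u, i)
        for f in range(n_factors):
            p_uf, q_if = P[u][f], Q[i][f]
            # Gradient step on the regularized squared error.
            P[u][f] += lr * (err * q_if - reg * p_uf)
            Q[i][f] += lr * (err * p_uf - reg * q_if)
rmse_after = train_rmse()

print(f"Training RMSE before: {rmse_before:.3f}, after: {rmse_after:.3f}")
# Predicted rating for a pair that was never observed (user 0, item 3).
print(f"Prediction for (0, 3): {predict(0, 3):.3f}")
```

Only the observed entries contribute to the loss, so the unobserved entries of $\mathbf{P} \mathbf{Q}^T$ are free to act as predictions.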

#### Implementation Example in Python

Below is a specific example using SVD.

```
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split
# Use the ml-100k dataset
data = Dataset.load_builtin("ml-100k")
# Split the data into training and testing sets
train_data, test_data = train_test_split(data, test_size=0.1)
```

```
from surprise import SVD
# Apply the SVD algorithm
algo = SVD()
algo.fit(train_data)
predictions = algo.test(test_data)
# Evaluate the accuracy
print(f"RMSE: {accuracy.rmse(predictions, verbose=False):.3f}")
```

```
RMSE: 0.927
```
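For reference, the RMSE reported above is the root-mean-square error between the true ratings $r_{ij}$ and the estimated ratings $\hat{r}_{ij}$ over the test set. The following pure-Python sketch shows the computation on made-up values; the `(true, estimated)` pairs are hypothetical and not taken from ml-100k.

```python
import math


def rmse(pairs):
    """Root-mean-square error over (true rating, estimated rating) pairs."""
    return math.sqrt(sum((r - est) ** 2 for r, est in pairs) / len(pairs))


# Hypothetical (true, estimated) rating pairs.
pairs = [(4.0, 3.5), (3.0, 3.5), (5.0, 4.0), (2.0, 3.0)]
print(f"RMSE: {rmse(pairs):.3f}")
```

A lower RMSE means the estimates are closer to the true ratings, and large errors are penalized more heavily than small ones because the differences are squared.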

### Recommender Systems Using k-Nearest Neighbors

k-Nearest Neighbors (k-NN) predicts a rating from the ratings of the $k$ most similar users or items. In Surprise, the `user_based` key of `sim_options` switches between user-based and item-based k-NN.
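As I understand the Surprise documentation, the basic item-based k-NN prediction is a similarity-weighted average of the ratings that the user has given to neighboring items. In the notation used above (user $i$, item $j$), it can be written as:

$$ \hat{r}_{ij} = \frac{\sum_{l \in N_i^k(j)} \mathrm{sim}(j, l) \, r_{il}}{\sum_{l \in N_i^k(j)} \mathrm{sim}(j, l)} $$

Here, $N_i^k(j)$ is the set of at most $k$ items most similar to item $j$ among those rated by user $i$, and $\mathrm{sim}(j, l)$ is the similarity between items $j$ and $l$. The symbols $N_i^k(j)$ and $\mathrm{sim}$ are introduced here for illustration. The user-based variant is analogous, with neighboring users taking the place of neighboring items.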

#### Implementation Example in Python

Below is a specific example using item-based k-NN.

```
from surprise import KNNBasic
# Apply the item-based k-NN algorithm
algo = KNNBasic(k=30, sim_options={"user_based": False}, verbose=False)
algo.fit(train_data)
predictions = algo.test(test_data, verbose=False)
# Evaluate the accuracy
print(f"RMSE: {accuracy.rmse(predictions, verbose=False):.3f}")
```

```
RMSE: 0.973
```

## Conclusion

In this article, I briefly explained the Python library Surprise and showed how to use it. Surprise simplifies the development of recommender systems and makes it easy to try and compare different algorithms. In the examples above, matrix factorization (SVD) reached an RMSE of 0.927 and item-based k-NN an RMSE of 0.973 on the same ml-100k test split.

## References

- Surprise Documentation: https://surprise.readthedocs.io/en/stable/