On Matrix Factorization in Recommender Systems
Overview
In this article, I will explain the method of matrix factorization in recommender systems.
I will illustrate the definition, properties, and applications of matrix factorization using concrete formulas and Python code.
I will also discuss the advantages and disadvantages of matrix factorization, and introduce an implementation example using the “movielens-100k” dataset.
Source Code
github
- The Jupyter notebook file is available here
google colaboratory
- To run on Google Colaboratory, click here
Execution Environment
The OS is macOS. Note that the options differ from Linux or Unix commands.
!sw_vers
ProductName: macOS
ProductVersion: 13.5.1
BuildVersion: 22G90
!python -V
Python 3.9.17
I will import the basic libraries and use watermark to check their versions. Additionally, I will set the seed for random numbers.
import random
import pandas as pd
import numpy as np
seed = 123
random_state = 123
random.seed(seed)
np.random.seed(seed)
from watermark import watermark
print(watermark(python=True, watermark=True, iversions=True, globals_=globals()))
Python implementation: CPython
Python version : 3.9.17
IPython version : 8.17.2
numpy : 1.25.2
pandas: 2.0.3
Watermark: 2.4.3
Definition and Properties of Matrix Factorization
Matrix factorization is a method that decomposes a given matrix into two low-rank matrices. In recommender systems, it extracts latent factors by decomposing the user-item matrix to make recommendations.
Mathematical Representation
The rating matrix of user and item is decomposed as follows:
Here, is the user matrix, and is the item matrix. The dimensions of each matrix are as follows:
where is the number of users, is the number of items, and is the dimension of the latent factors. The goal is to minimize the sum of squared differences between the rating matrix and the predicted matrix .
Optimization Problem
The optimization problem is formulated as follows:
Here, is the set of user-item pairs with ratings, and is the regularization parameter. This regularization term prevents overfitting.
Applications and Implementation
Matrix factorization is applied in various recommender systems. Below is an implementation example using the “Movielens-100k” dataset.
Preparing the Dataset
First, load the “Movielens-100k” dataset and prepare the rating matrix. Assume the dataset is stored in the ml-100k directory.
from scipy.sparse.linalg import svds
# Load the dataset
ratings = pd.read_csv("./ml-100k/u.data", sep="\t", header=None, names=["user_id", "movie_id", "rating", "timestamp"])
ratings = ratings.pivot(index="user_id", columns="movie_id", values="rating").fillna(0)
# Create the rating matrix
R = ratings.values
user_ratings_mean = np.mean(R, axis=1)
R_demeaned = R - user_ratings_mean.reshape(-1, 1)
Implementation of Matrix Factorization
Next, decompose the rating matrix using SVD (Singular Value Decomposition).
# Singular Value Decomposition
U, sigma, Vt = svds(R_demeaned, k=50)
sigma = np.diag(sigma)
# Create the predicted rating matrix
all_user_predicted_ratings = np.dot(np.dot(U, sigma), Vt) + user_ratings_mean.reshape(-1, 1)
predicted_ratings = pd.DataFrame(all_user_predicted_ratings, columns=ratings.columns)
# Recommend movies for user 1
user_id = 1
user_row_number = user_id - 1 # Row number starts from 0
sorted_user_predictions = predicted_ratings.iloc[user_row_number].sort_values(ascending=False)
# Display the original and predicted ratings
user_data = ratings.loc[user_id]
user_full = pd.concat([user_data, sorted_user_predictions], axis=1)
user_full.columns = ["Original Rating", "Predicted Rating"]
display(user_full.head(5))
Original Rating | Predicted Rating | |
---|---|---|
movie_id | ||
1 | 5.0 | 6.488436 |
2 | 3.0 | 2.959503 |
3 | 4.0 | 1.634987 |
4 | 3.0 | 3.024467 |
5 | 3.0 | 1.656526 |
Conclusion
In this article, I detailed matrix factorization (Singular Value Decomposition) in recommender systems.
I presented specific definitions, formulas, and examples using Python code, and discussed the advantages and disadvantages.
This method is applied in many recommender systems, achieving high-accuracy recommendations.
References
- Wikipedia: Matrix Factorization