Memory-based Collaborative Filtering
Overview
Memory-based collaborative filtering is a technique that makes recommendations directly from stored user-item interaction data, such as past ratings, rather than from a separately trained model.
There are mainly two methods: user-based and item-based, both of which are widely used in recommender systems.
This article details the definition, characteristics, and specific application examples of memory-based collaborative filtering using mathematical formulas and Python code.
This is a personal memorandum.
Source Code
GitHub
- Jupyter notebook files can be found here
Google Colaboratory
- To run on Google Colaboratory, click here
Execution Environment
The OS is macOS. Note that some command options differ from those on Linux and other Unix systems.
!sw_vers
ProductName: macOS
ProductVersion: 13.5.1
BuildVersion: 22G90
!python -V
Python 3.9.17
We will import basic libraries and use watermark to check their versions. We will also set the seed for random numbers.
import random
import numpy as np
import pandas as pd  # needed later for pd.DataFrame
from pprint import pprint
seed = 123
random_state = 123
random.seed(seed)
np.random.seed(seed)
from watermark import watermark
print(watermark(python=True, watermark=True, iversions=True, globals_=globals()))
Python implementation: CPython
Python version : 3.9.17
IPython version : 8.17.2
numpy : 1.25.2
matplotlib: 3.8.1
scipy : 1.11.2
pandas : 2.0.3
Watermark: 2.4.3
Definition of Memory-based Collaborative Filtering
Memory-based collaborative filtering is a technique that predicts future user behavior based on past user behavior data. There are mainly two methods:
- User-based Collaborative Filtering: A method that makes recommendations based on the behavior of similar users.
- Item-based Collaborative Filtering: A method that recommends items similar to those that the user has rated in the past.
User-based Collaborative Filtering
In user-based collaborative filtering, we first calculate the similarity between users. Then, using the ratings of the users most similar to the target user, we make recommendations to that user. Two common similarity measures are cosine similarity and the Pearson correlation coefficient.
Cosine Similarity
If the rating vectors of users $u$ and $v$ are $\mathbf{r}_u$ and $\mathbf{r}_v$, respectively, cosine similarity is defined as follows:
$$ \text{cosine}(u, v) = \frac{\mathbf{r}_u \cdot \mathbf{r}_v}{\|\mathbf{r}_u\| \|\mathbf{r}_v\|} $$
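As a minimal sketch of the formula above, the following computes the cosine similarity of two rating vectors with NumPy. The function name `cosine_sim`, the example vectors, and the convention of returning 0.0 for an all-zero vector are assumptions made for illustration.

```python
import numpy as np

def cosine_sim(r_u, r_v):
    """Cosine similarity between two rating vectors (0 marks an unrated item)."""
    r_u = np.asarray(r_u, dtype=float)
    r_v = np.asarray(r_v, dtype=float)
    denom = np.linalg.norm(r_u) * np.linalg.norm(r_v)
    # Assumed convention: similarity is 0.0 when either vector is all zeros
    return float(r_u @ r_v / denom) if denom > 0 else 0.0

# Illustrative rating vectors over three items
r_u = [2, 5, 1]
r_v = [0, 3, 1]
print(round(cosine_sim(r_u, r_v), 2))  # 0.92
```

Treating unrated items as 0 is a simplification: it lets the dot product run over all items, but it also makes a missing rating look like a very low one.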
Pearson Correlation Coefficient
The Pearson correlation coefficient is defined as follows:
$$ \text{pearson}(u, v) = \frac{\sum_{i \in I_{uv}} (r_{ui} - \overline{r}_u)(r_{vi} - \overline{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{ui} - \overline{r}_u)^2} \sqrt{\sum_{i \in I_{uv}} (r_{vi} - \overline{r}_v)^2}} $$
Here, $I_{uv}$ is the set of items rated by both users $u$ and $v$, and $\overline{r}_u$ and $\overline{r}_v$ are the average ratings of users $u$ and $v$, respectively.
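The definition above can be sketched in NumPy as follows. This is one possible reading, not a reference implementation: unrated items are marked with `np.nan`, each user's mean is taken over all items that user rated, and the function name `pearson_sim` and the fallback value 0.0 are assumptions.

```python
import numpy as np

def pearson_sim(r_u, r_v):
    """Pearson correlation over co-rated items; np.nan marks an unrated item."""
    r_u = np.asarray(r_u, dtype=float)
    r_v = np.asarray(r_v, dtype=float)
    both = ~np.isnan(r_u) & ~np.isnan(r_v)  # the co-rated item set I_uv
    if not both.any():
        return 0.0  # assumed convention: no overlap means no similarity
    # Deviations from each user's mean rating, restricted to co-rated items
    d_u = r_u[both] - np.nanmean(r_u)
    d_v = r_v[both] - np.nanmean(r_v)
    denom = np.sqrt((d_u ** 2).sum()) * np.sqrt((d_v ** 2).sum())
    # Assumed convention: 0.0 when a user shows no variation on the co-rated items
    return float((d_u * d_v).sum() / denom) if denom > 0 else 0.0

# Illustrative vectors: the second user has not rated the first item
r_u = [2, 5, 1]
r_v = [np.nan, 3, 1]
print(round(pearson_sim(r_u, r_v), 2))  # 0.99
```

Because each user's mean is subtracted first, Pearson correlation compensates for users who rate consistently high or low, which plain cosine similarity does not.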
Item-based Collaborative Filtering
In item-based collaborative filtering, we first calculate the similarity between items. Then, we recommend items similar to those the user has rated in the past. As in the user-based case, cosine similarity and the Pearson correlation coefficient are common choices for item similarity.
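The two steps described above can be sketched with NumPy as follows. The matrix `R`, the helper names `item_cosine_similarities` and `score_unrated_items`, and the use of 0 for an unrated item are all assumptions made for this illustration.

```python
import numpy as np

# Hypothetical user-item matrix: rows are users, columns are items, 0 = unrated
R = np.array([
    [2.0, 5.0, 1.0],
    [0.0, 3.0, 1.0],
    [5.0, 0.0, 4.0],
    [3.0, 0.0, 0.0],
])

def item_cosine_similarities(R):
    """Pairwise cosine similarity between the columns (items) of R."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for items nobody rated
    X = R / norms            # scale every item column to unit length
    return X.T @ X

def score_unrated_items(R, item_sim, user_idx):
    """Score each item the user has not rated by its similarity to the rated ones."""
    ratings = R[user_idx]
    scores = item_sim @ ratings  # similarity-weighted sum of the user's own ratings
    return {int(i): float(scores[i]) for i in np.flatnonzero(ratings == 0)}

item_sim = item_cosine_similarities(R)
print(item_sim.round(2))
print(score_unrated_items(R, item_sim, user_idx=1))
```

The only structural difference from the user-based approach is that similarities are computed between columns instead of rows, so the same similarity function can be reused on the transposed matrix.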
Implementation Example of Memory-based Collaborative Filtering
Below is an example implementation of user-based collaborative filtering using Python. Here, we calculate user similarity using cosine similarity and make recommendations.
Preparing the Dataset
First, we prepare a small sample dataset. Here, we assume movie rating data similar to ml-100k (MovieLens 100K), with one row per user, movie, and rating.
# Create sample data
ratings_dict = {
    "user_id": [1, 1, 1, 2, 2, 3, 3, 4],
    "movie_id": [1, 2, 3, 2, 3, 1, 3, 1],
    "rating": [2, 5, 1, 3, 1, 5, 4, 3],
}
ratings_df = pd.DataFrame(ratings_dict)
display(ratings_df)
 | user_id | movie_id | rating |
---|---|---|---|
0 | 1 | 1 | 2 |
1 | 1 | 2 | 5 |
2 | 1 | 3 | 1 |
3 | 2 | 2 | 3 |
4 | 2 | 3 | 1 |
5 | 3 | 1 | 5 |
6 | 3 | 3 | 4 |
7 | 4 | 1 | 3 |
Calculating Similarity Between Users
Next, we calculate the cosine similarity between users.
scikit-learn's cosine_similarity function makes this straightforward. It expects a matrix, so we first build one with users as rows and movies as columns using the pivot method of the DataFrame. Not every user has rated every movie, so the missing entries are filled with 0.
from sklearn.metrics.pairwise import cosine_similarity
# Create user rating matrix
user_movie_ratings = ratings_df.pivot(index="user_id", columns="movie_id", values="rating").fillna(0)
display(user_movie_ratings)
user_id / movie_id | 1 | 2 | 3 |
---|---|---|---|
1 | 2.0 | 5.0 | 1.0 |
2 | 0.0 | 3.0 | 1.0 |
3 | 5.0 | 0.0 | 4.0 |
4 | 3.0 | 0.0 | 0.0 |
# Calculate cosine similarity
user_similarities = cosine_similarity(user_movie_ratings)
# Convert to DataFrame
user_similarities_df = pd.DataFrame(user_similarities, index=user_movie_ratings.index, columns=user_movie_ratings.index)
display(user_similarities_df.round(2))
user_id | 1 | 2 | 3 | 4 |
---|---|---|---|---|
1 | 1.00 | 0.92 | 0.40 | 0.37 |
2 | 0.92 | 1.00 | 0.20 | 0.00 |
3 | 0.40 | 0.20 | 1.00 | 0.78 |
4 | 0.37 | 0.00 | 0.78 | 1.00 |
Making Recommendations
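We will make recommendations to the target user based on the ratings of similar users. Each movie is scored by the similarity-weighted sum of the other users' ratings, the target user's own ratings are subtracted so that movies they already rated highly rank lower, and the movie IDs with the highest scores are returned.
def recommend_movies(user_id, user_similarities_df, user_movie_ratings, num_recommendations=5):
    # All other users, ordered from most to least similar to the target user
    similar_user_list = user_similarities_df[user_id].drop(user_id).sort_values(ascending=False).index
    user_ratings = user_movie_ratings.loc[user_id]
    # Similarity-weighted sum of the other users' ratings, one score per movie
    weighted_ratings = pd.Series(0.0, index=user_movie_ratings.columns)
    for similar_user in similar_user_list:
        similar_user_ratings = user_movie_ratings.loc[similar_user]
        weight = user_similarities_df.loc[user_id, similar_user]
        weighted_ratings += weight * similar_user_ratings
    # Subtract the user's own ratings, then rank movies by the remaining score
    scores = (weighted_ratings - user_ratings).sort_values(ascending=False)
    # Return movie IDs (not positional indices), best first
    return scores.index[:num_recommendations].tolist()
user_id = 1
recommendations = recommend_movies(user_id, user_similarities_df, user_movie_ratings)
print(f"Recommended movies for user {user_id}: {recommendations}")
Recommended movies for user 1: [3, 1, 2]
Because user 1 has already rated all three movies in this small dataset, the result is a ranking of those movies rather than a list of unseen ones.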
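The weighted sum used above is not on the original rating scale. A common variant divides by the sum of the similarities, which gives a predicted rating for a movie the user has not seen. The following is a minimal sketch under stated assumptions: it copies the values of the sample user-movie matrix into a NumPy array, and the helper names `cosine_matrix` and `predict_rating` are hypothetical.

```python
import numpy as np

# Values of the sample user-movie matrix (rows: users 1-4, columns: movies 1-3, 0 = unrated)
R = np.array([
    [2.0, 5.0, 1.0],
    [0.0, 3.0, 1.0],
    [5.0, 0.0, 4.0],
    [3.0, 0.0, 0.0],
])

def cosine_matrix(R):
    """Pairwise cosine similarity between the rows (users) of R."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for users with no ratings
    X = R / norms
    return X @ X.T

def predict_rating(R, sim, user_idx, item_idx):
    """Similarity-weighted average of the ratings other users gave to one item."""
    rated = R[:, item_idx] > 0  # users who actually rated the item
    rated[user_idx] = False     # never use the target user's own rating
    weights = sim[user_idx, rated]
    if weights.sum() == 0:
        return None  # no similar user has rated this item
    return float(weights @ R[rated, item_idx] / weights.sum())

sim = cosine_matrix(R)
# Predicted rating of user 2 (row 1) for movie 1 (column 0), roughly 2.53
print(predict_rating(R, sim, user_idx=1, item_idx=0))
```

Restricting the average to users who actually rated the movie keeps the filled-in zeros from dragging the prediction down.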
Application Examples
Memory-based collaborative filtering is applied in the following scenarios:
- Movie and Music Recommendations: Suggesting titles on services such as Netflix and Spotify.
- E-commerce Site Recommendations: Recommending products to users on sites such as Amazon and Rakuten.
- Friend Recommendations in Social Networks: Suggesting connections on services such as Facebook and LinkedIn.
Conclusion
In this article, we briefly described the definition, characteristics, and specific application examples of memory-based collaborative filtering using mathematical formulas and Python code.
Memory-based collaborative filtering is a simple yet powerful method for making recommendations based on past user data and is applied in many fields.
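However, it also has weaknesses: similarities become unreliable when the rating data is sparse, and the cold start problem arises when new users or items have too few ratings to compare. To address these challenges, more advanced or hybrid methods need to be considered.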