Memory-based Collaborative Filtering

Overview

Memory-based collaborative filtering is a technique that makes recommendations directly from stored records of past user-item interactions, such as ratings, without first training a separate model.

There are two main approaches, user-based and item-based, both of which are widely used in recommender systems.

This article details the definition, characteristics, and specific application examples of memory-based collaborative filtering using mathematical formulas and Python code.

This is a personal memorandum.

Source Code

GitHub

  • Jupyter notebook files can be found here

Google Colaboratory

  • To run on Google Colaboratory, click here

Execution Environment

The OS is macOS. Note that some command options differ from those on Linux and other Unix systems.

!sw_vers
ProductName:		macOS
ProductVersion:		13.5.1
BuildVersion:		22G90
!python -V
Python 3.9.17

We will import basic libraries and use watermark to check their versions. We will also set the seed for random numbers.

import random
import numpy as np
import pandas as pd

from pprint import pprint

seed = 123
random_state = 123

random.seed(seed)
np.random.seed(seed)


from watermark import watermark

print(watermark(python=True, watermark=True, iversions=True, globals_=globals()))
Python implementation: CPython
Python version       : 3.9.17
IPython version      : 8.17.2

numpy     : 1.25.2
matplotlib: 3.8.1
scipy     : 1.11.2
pandas    : 2.0.3

Watermark: 2.4.3

Definition of Memory-based Collaborative Filtering

Memory-based collaborative filtering predicts a user's future behavior directly from stored records of past behavior, such as ratings. There are two main approaches:

  1. User-based Collaborative Filtering: A method that makes recommendations based on the behavior of similar users.
  2. Item-based Collaborative Filtering: A method that recommends items similar to those that the user has rated in the past.

User-based Collaborative Filtering

In user-based collaborative filtering, we first calculate the similarity between users. Then, using the ratings of users with high similarity, we make recommendations to the target user. Cosine similarity and Pearson correlation coefficient are used to calculate user similarity.

Cosine Similarity

If the rating vectors of users $u$ and $v$ are $\mathbf{r}_u$ and $\mathbf{r}_v$, respectively, cosine similarity is defined as follows:

$$ \text{cosine}(u, v) = \frac{\mathbf{r}_u \cdot \mathbf{r}_v}{\|\mathbf{r}_u\| \|\mathbf{r}_v\|} $$
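As a quick check of the formula, the following minimal sketch computes cosine similarity with NumPy. The helper name `cosine_sim` and the two rating vectors are illustrative assumptions, and unrated items are assumed to be encoded as 0.

```python
import numpy as np


def cosine_sim(r_u, r_v):
    """Cosine similarity of two rating vectors (hypothetical helper)."""
    r_u = np.asarray(r_u, dtype=float)
    r_v = np.asarray(r_v, dtype=float)
    denom = np.linalg.norm(r_u) * np.linalg.norm(r_v)
    # Assumption: return 0.0 when either vector has zero norm,
    # because the ratio is undefined in that case
    if denom == 0.0:
        return 0.0
    return float(r_u @ r_v / denom)


# Illustrative rating vectors over three items (0 = not rated)
r_u = [2, 5, 1]
r_v = [0, 3, 1]
sim_uv = cosine_sim(r_u, r_v)
print(round(sim_uv, 2))  # 0.92
```

Because ratings are non-negative, the value always falls between 0 and 1 here, and 1 means the two vectors point in exactly the same direction.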

Pearson Correlation Coefficient

The Pearson correlation coefficient is defined as follows:

$$ \text{pearson}(u, v) = \frac{\sum_{i \in I_{uv}} (r_{ui} - \overline{r}_u)(r_{vi} - \overline{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{ui} - \overline{r}_u)^2} \sqrt{\sum_{i \in I_{uv}} (r_{vi} - \overline{r}_v)^2}} $$

Here, $r_{ui}$ is the rating that user $u$ gave to item $i$, $I_{uv}$ is the set of items rated by both users $u$ and $v$, and $\overline{r}_u$ and $\overline{r}_v$ are the average ratings of users $u$ and $v$, respectively.
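The definition can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the helper name `pearson_sim` is hypothetical, unrated items are marked with `np.nan`, each user's mean is taken over all items that user rated, and undefined cases (no co-rated items or zero variance) are mapped to 0.

```python
import numpy as np


def pearson_sim(r_u, r_v):
    """Pearson correlation over co-rated items; np.nan marks an unrated item."""
    r_u = np.asarray(r_u, dtype=float)
    r_v = np.asarray(r_v, dtype=float)
    co_rated = ~np.isnan(r_u) & ~np.isnan(r_v)  # the set I_uv
    if not co_rated.any():
        return 0.0
    # Assumption: each mean is taken over all items that user rated
    d_u = r_u[co_rated] - np.nanmean(r_u)
    d_v = r_v[co_rated] - np.nanmean(r_v)
    denom = np.sqrt((d_u ** 2).sum()) * np.sqrt((d_v ** 2).sum())
    # Assumption: treat the undefined 0/0 case as "no correlation"
    if denom == 0.0:
        return 0.0
    return float((d_u * d_v).sum() / denom)


# Illustrative ratings over four items; only the middle two are co-rated
r_u = [2, 5, 1, np.nan]
r_v = [np.nan, 3, 1, 4]
sim_uv = pearson_sim(r_u, r_v)
print(sim_uv)
```

Unlike cosine similarity on raw ratings, subtracting each user's mean compensates for users who rate everything high or everything low.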

Item-based Collaborative Filtering

In item-based collaborative filtering, we first calculate the similarity between items. Then, we recommend items similar to those the user has rated in the past. Cosine similarity and Pearson correlation coefficient are also used to calculate item similarity.
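The two steps can be sketched with NumPy as follows. This is a minimal illustration rather than a reference implementation: the rating matrix `R`, the chosen user, and the scoring rule (a similarity-weighted average of the user's own ratings) are assumptions made for the example, and unrated entries are encoded as 0.

```python
import numpy as np

# Illustrative user-item rating matrix (rows: users, columns: items, 0 = not rated)
R = np.array([
    [2, 5, 1],
    [0, 3, 1],
    [5, 0, 4],
    [3, 0, 0],
], dtype=float)

# Step 1: item-item cosine similarity, comparing the columns of R
norms = np.linalg.norm(R, axis=0)
item_sim = (R.T @ R) / np.outer(norms, norms)

# Step 2: score every item for one user as a similarity-weighted
# average of the ratings that user has already given
user = 1  # row index of the target user (hypothetical choice)
rated = R[user] > 0
scores = item_sim[:, rated] @ R[user, rated] / item_sim[:, rated].sum(axis=1)

# Keep only items the user has not rated yet, best first
candidates = np.flatnonzero(~rated)
ranked = candidates[np.argsort(scores[candidates])[::-1]]
print(ranked)  # [0]
```

In this toy matrix every item shares at least one rater with the others, so no division by zero occurs; real data would need a guard for items with no overlap.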

Implementation Example of Memory-based Collaborative Filtering

Below is an example implementation of user-based collaborative filtering using Python. Here, we calculate user similarity using cosine similarity and make recommendations.

Preparing the Dataset

First, we prepare a small sample dataset. Here, we assume movie rating data similar to MovieLens 100K (ml-100k).

# Create sample data
ratings_dict = {
    "user_id": [1, 1, 1, 2, 2, 3, 3, 4],
    "movie_id": [1, 2, 3, 2, 3, 1, 3, 1],
    "rating": [2, 5, 1, 3, 1, 5, 4, 3],
}

ratings_df = pd.DataFrame(ratings_dict)

display(ratings_df)
   user_id  movie_id  rating
0        1         1       2
1        1         2       5
2        1         3       1
3        2         2       3
4        2         3       1
5        3         1       5
6        3         3       4
7        4         1       3

Calculating Similarity Between Users

Next, we calculate the cosine similarity between users.

The cosine_similarity function of scikit-learn makes this straightforward. It expects a matrix, so we first build one with users as rows and movies as columns using pivot. Not every user has rated every movie, so the missing entries are filled with 0.

from sklearn.metrics.pairwise import cosine_similarity

# Create user rating matrix
user_movie_ratings = ratings_df.pivot(index="user_id", columns="movie_id", values="rating").fillna(0)

display(user_movie_ratings)
movie_id    1    2    3
user_id
1         2.0  5.0  1.0
2         0.0  3.0  1.0
3         5.0  0.0  4.0
4         3.0  0.0  0.0
# Calculate cosine similarity
user_similarities = cosine_similarity(user_movie_ratings)

# Convert to DataFrame
user_similarities_df = pd.DataFrame(user_similarities, index=user_movie_ratings.index, columns=user_movie_ratings.index)

display(user_similarities_df.round(2))
user_id     1     2     3     4
user_id
1        1.00  0.92  0.40  0.37
2        0.92  1.00  0.20  0.00
3        0.40  0.20  1.00  0.78
4        0.37  0.00  0.78  1.00

Making Recommendations

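We make recommendations to the target user based on the ratings of similar users. Each movie is scored by the sum of the other users' ratings, weighted by their similarity to the target user, minus the target user's own rating. Movies are then ranked from the highest score to the lowest.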

def recommend_movies(user_id, user_similarities_df, user_movie_ratings, num_recommendations=5):
    # Other users, ordered from most to least similar (the target user is dropped)
    similar_user_list = user_similarities_df[user_id].drop(user_id).sort_values(ascending=False).index
    # Work with NumPy arrays so that the accumulator stays a plain array
    user_ratings = user_movie_ratings.loc[user_id].to_numpy()
    weighted_ratings = np.zeros(user_movie_ratings.shape[1])

    for similar_user in similar_user_list:
        similar_user_ratings = user_movie_ratings.loc[similar_user].to_numpy()
        weight = user_similarities_df.loc[user_id, similar_user]
        weighted_ratings += weight * similar_user_ratings

    # Rank movies by (weighted ratings - own rating), highest first,
    # and map column positions back to movie IDs
    order = np.argsort(weighted_ratings - user_ratings)[::-1]
    return user_movie_ratings.columns[order][:num_recommendations].tolist()


user_id = 1

recommendations = recommend_movies(user_id, user_similarities_df, user_movie_ratings)

print(f"Recommended movies for user {user_id}: {recommendations}")
Recommended movies for user 1: [3, 1, 2]
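The function above ranks every movie, including those the target user has already rated. A common variant, sketched below under stated assumptions, predicts a rating only for unrated movies and normalizes by the sum of the similarities of the users who actually rated each movie. The array `R` restates the sample ratings, and the helper name `predict_unrated` is hypothetical, not part of the original notebook.

```python
import numpy as np

# Sample ratings restated as an array
# (rows: users 1-4, columns: movies 1-3, 0 = not rated)
R = np.array([
    [2, 5, 1],
    [0, 3, 1],
    [5, 0, 4],
    [3, 0, 0],
], dtype=float)


def predict_unrated(R, user):
    """Predict ratings for the movies that `user` (a row index) has not rated."""
    norms = np.linalg.norm(R, axis=1)
    sims = (R @ R[user]) / (norms * norms[user])  # cosine similarity to every user
    sims[user] = 0.0  # ignore the target user itself
    predictions = {}
    for movie in np.flatnonzero(R[user] == 0):
        raters = R[:, movie] > 0  # users who actually rated this movie
        weight_sum = np.abs(sims[raters]).sum()
        # Skip movies for which no similar user provides a rating
        if weight_sum > 0:
            predictions[int(movie)] = float(sims[raters] @ R[raters, movie] / weight_sum)
    return predictions


# Row index 1 is user 2, who has not rated movie 1 (column index 0)
preds = predict_unrated(R, user=1)
print(preds)
```

The keys of the returned dictionary are zero-based column positions, so key 0 corresponds to movie 1. Because of the normalization, each prediction stays on the same scale as the original ratings.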

Application Examples

Memory-based collaborative filtering is applied in the following scenarios:

  • Movie and Music Recommendations: Used in services like Netflix and Spotify.
  • E-commerce Site Recommendations: Used in Amazon and Rakuten for recommending products to users.
  • Friend Recommendations in Social Networks: Used in Facebook and LinkedIn for recommending friends.

Conclusion

In this article, we briefly described the definition, characteristics, and specific application examples of memory-based collaborative filtering using mathematical formulas and Python code.

Memory-based collaborative filtering is a simple yet powerful method for making recommendations based on past user data and is applied in many fields.

However, challenges such as sparse data and the cold start problem also exist. To solve these challenges, more advanced or hybrid methods need to be considered.