Classical Random Walk and Adjacency Matrix (PageRank and Personalized PageRank)

Overview

In this article, I will explain classical random walks and adjacency matrices, focusing on PageRank and Personalized PageRank. I will demonstrate the definitions and properties of these algorithms, along with concrete application examples using mathematical formulas and Python code, and explain how engineers in companies can utilize them.

Source Code

github

  • For the Jupyter notebook file, see here

google colaboratory

  • To run on Google Colaboratory, see here

Execution Environment

The OS is macOS. Note that the options may differ from Linux or Unix commands.

!sw_vers
ProductName:        macOS
ProductVersion:     13.5.1
BuildVersion:       22G90
!python -V
Python 3.9.17

Setting CSS for HTML tables to make pandas tables more readable.

Importing basic libraries and using watermark to check their versions. Also, setting the seed for random numbers.

%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import random

import scipy
import numpy as np

import matplotlib
import matplotlib.pyplot as plt

seed = 123
random_state = 123

random.seed(seed)
np.random.seed(seed)

from watermark import watermark

print(watermark(python=True, watermark=True, iversions=True, globals_=globals()))
Python implementation: CPython
Python version       : 3.9.17
IPython version      : 8.17.2

matplotlib: 3.8.1
numpy     : 1.25.2
scipy     : 1.11.2

Watermark: 2.4.3

Classical Random Walk

Definition

A random walk is a process of randomly moving to the next node on a graph. The graph $\mathbf{G}$ consists of a set of nodes $\mathbf{V}$ and a set of edges $\mathbf{E}$. The set of edges is represented by the adjacency matrix $\mathbf{A}$. The element $\mathbf{A}_{ij}$ of the adjacency matrix $\mathbf{A}$ is 1 if there is an edge from node $i$ to node $j$, and 0 otherwise.

$$ \mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \\ \end{bmatrix} $$

Transition Matrix of Random Walk

The transition matrix $\mathbf{P}$ of a random walk is defined based on the adjacency matrix $\mathbf{A}$. The element $\mathbf{P}_{ij}$ of the transition matrix $\mathbf{P}$ represents the transition probability from node $i$ to node $j$.

$$ \mathbf{P}_{ij} = \frac{a_{ij}}{\sum_{k} a_{ik}} $$

Properties of Random Walk

One important property of a random walk is that it converges to a steady state. After a sufficiently long time, the visit probabilities of the nodes converge to a constant value.

For the convergence of PageRank, refer to Google Matrix and PageRank .

PageRank

Mathematical Representation

PageRank is defined as follows. $ \mathbf{PR}(i) $ represents the PageRank of node $ i $.

$$ \mathbf{PR}(i) = \frac{1-d}{N} + d \sum_{j \in \mathbf{M}(i)} \frac{\mathbf{PR}(j)}{\mathbf{L}(j)} $$

Here, $\mathbf{M}(i)$ is the set of nodes linking to node $i$, $\mathbf{L}(j)$ is the number of links from node $j$, $d$ is the damping factor (usually 0.85), and $N$ is the total number of nodes.

Python Implementation Example

Below is an example of implementing PageRank using Python.

The steady state is derived using the iterative method.

import numpy as np

# Damping factor
d = 0.85

# Adjacency matrix
A = np.array([[0, 1, 1, 0], [0, 0, 1, 1], [1, 0, 0, 1], [1, 0, 1, 0]])

# Number of nodes
N = A.shape[0]

# Initial PageRank vector
PR = np.ones(N) / N

# Calculating the transition matrix
P = A / A.sum(axis=0)

# Iterative calculation of PageRank
# Finding the steady state using the iterative method
for _ in range(100):
    PR = (1 - d) / N + d * P.dot(PR)

print(np.round(PR, 2))
[0.29 0.21 0.26 0.24]

Personalized PageRank

Definition of Personalized PageRank

Personalized PageRank is an algorithm that calculates PageRank with an emphasis on specific nodes. It is useful for evaluating the importance of nodes related to specific users or items.

Mathematical Representation

Personalized PageRank is an extension of the basic PageRank formula. For a specific set of nodes $\mathbf{S}$, it is defined as follows.

$$ \mathbf{PPR}(i) = \frac{1-d}{N} + d \sum_{j \in \mathbf{M}(i)} \frac{\mathbf{PPR}(j)}{\mathbf{L}(j)} + \alpha \mathbf{e}_S(i) $$

Here, $\alpha$ is a parameter adjusting the degree of personalization, and $\mathbf{e}_S(i)$ is an indicator function showing whether node $i$ is included in the personalized set $\mathbf{S}$.

Python Implementation Example

Below is an example of implementing Personalized PageRank using Python.

The steady state is derived using the iterative method.

# Degree of personalization
alpha = 0.1

# Personalized set of nodes
S = {0}

# Initial PageRank vector
PPR = np.ones(N) / N

# Iterative calculation of Personalized PageRank
# Finding the steady state using the iterative method
for _ in range(100):
    PPR = (1 - d) / N + d * P.dot(PPR) + alpha * np.isin(np.arange(N), list(S))

print(np.round(PPR, 2))
[0.53 0.32 0.43 0.38]

Advantages and Disadvantages

Advantages

  1. Efficient Importance Evaluation: Random walk-based algorithms can efficiently evaluate the importance of nodes.
  2. Wide Applicability: PageRank and Personalized PageRank can be widely applied from web page ranking to recommender systems.

Disadvantages

  1. Computational Cost: The computational cost can be high for large-scale graphs.
  2. Parameter Adjustment: It is necessary to adjust the damping factor and degree of personalization.

Conclusion

In this article, I explained PageRank and Personalized PageRank using the concepts of classical random walks and adjacency matrices. These algorithms are very useful in web page ranking and recommender

systems. Through concrete implementation examples using Python, I demonstrated how they can be utilized by companies and researchers.

References