Classical Random Walk (DeepWalk and Node2Vec)

Overview

In this article, I will explain the definitions, properties, and applications of “Classical Random Walk (DeepWalk and Node2Vec).” By exploring the mathematical background, detailed algorithms, and concrete Python implementation examples of DeepWalk and Node2Vec, which are graph embedding techniques using random walks, I will explain their usage in recommender systems and other applications, along with their advantages and disadvantages.

Source Code

github

  • For the Jupyter notebook file, see here

google colaboratory

  • To run on Google Colaboratory, see here

Execution Environment

The OS is macOS. Note that the options may differ from Linux or Unix commands.

!sw_vers
ProductName:        macOS
ProductVersion:     13.5.1
BuildVersion:       22G90
!python -V
Python 3.9.17

Setting CSS for HTML tables to make pandas tables more readable.

Importing basic libraries and using watermark to check their versions. Also, setting the seed for random numbers.

%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import random

import scipy
import numpy as np

import matplotlib
import matplotlib.pyplot as plt

seed = 123
random_state = 123

random.seed(seed)
np.random.seed(seed)

from watermark import watermark

print(watermark(python=True, watermark=True, iversions=True, globals_=globals()))
Python implementation: CPython
Python version       : 3.9.17
IPython version      : 8.17.2

numpy     : 1.25.2
matplotlib: 3.8.1
scipy     : 1.11.2

Watermark: 2.4.3

What is Classical Random Walk?

Definition

Classical random walk is a type of random walk in graph theory. Starting from a node on the graph, the process repeatedly moves to a neighboring node at random. This process is probabilistic, and the move to the next node is decided randomly.

The transition matrix $\mathbf{P}$ of a random walk is defined as follows:

$$ \mathbf{P}_{ij} = \begin{cases} \frac{1}{d_i} & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases} $$

Here, $d_i$ is the degree of node $i$.

Properties

  • Ergodicity: A sufficiently long random walk covers the entire graph.
  • Steady-State Distribution: Over a long period, the random walk converges to a steady-state distribution.
  • Markov Property: The next state depends only on the current state and not on the previous states.

DeepWalk

Definition

DeepWalk is a technique for learning graph node embeddings based on random walks. Specifically, it performs random walks on the graph and uses the resulting node sequences to learn node embeddings using the Skip-gram model.

Algorithm

  1. Generating Random Walks: Start from each node and generate multiple walks by randomly moving to neighboring nodes.
  2. Learning with Skip-gram Model: Use the generated node sequences as input to the Skip-gram model to learn node embeddings.

$$ J = - \sum_{w \in V} \sum_{u \in N(w)} \log P(u|w) $$

Python Implementation Example

Below is a basic implementation example of DeepWalk:

Using the karate_club_graph provided in networkx as data.

import numpy as np
import networkx as nx

from pprint import pprint

from gensim.models import Word2Vec

def random_walk(graph, start_node, walk_length):
    walk_list = [start_node]
    for _ in range(walk_length - 1):
        neighbor_list = list(graph.neighbors(walk_list[-1]))
        if neighbor_list:
            walk_list.append(np.random.choice(neighbor_list))
        else:
            break
    return walk_list

def generate_walk_list(graph, num_walk_list, walk_length):
    walk_list = []
    node_list = list(graph.nodes())
    for _ in range(num_walk_list):
        np.random.shuffle(node_list)
        for node in node_list:
            walk_list.append(random_walk(graph, node, walk_length))
    return walk_list

# Creating the graph
G = nx.karate_club_graph()

# Generating random walks
num_walks = 10
walk_length = 5
walks = generate_walk_list(G, num_walks, walk_length)

# Learning with the Skip-gram model
model = Word2Vec(walks, vector_size=64, window=5, min_count=0, sg=1)

# Displaying the node embeddings
embeddings = {node: model.wv[node] for node in G.nodes()}

# Displaying only the learning results of the first node
pprint(embeddings[0].round(4))
array([ 0.0003,  0.0038,  0.0068,  0.0115, -0.0065, -0.0014,  0.0095,
        0.0189, -0.002 , -0.0164,  0.0105, -0.0006,  0.0142, -0.0015,
       -0.0071, -0.0128, -0.0034,  0.0057,  0.0067,  0.0124, -0.0037,
        0.0052,  0.0179, -0.0091, -0.0047,  0.0161, -0.0006,  0.0006,
        0.0049, -0.0007,  0.0136,  0.0106, -0.0222, -0.0166,  0.0024,
        0.0117, -0.0107,  0.011 ,  0.0167,  0.0078,  0.0132, -0.0059,
        0.0016,  0.0041, -0.004 , -0.0126, -0.0022, -0.0154, -0.0094,
        0.0114,  0.0081,  0.0115,  0.0103,  0.018 ,  0.0006,  0.0024,
        0.0083, -0.0101,  0.0099, -0.0185,  0.012 ,  0.0013, -0.0087,
        0.008 ], dtype=float32)

Node2Vec

Definition

Node2Vec is an extension of DeepWalk that balances depth-first search (DFS) and breadth-first search (BFS) during random walks to learn node embeddings.

Algorithm

  1. Generating Random Walks: Introduce parameters $p$ and $q$ to balance DFS and BFS, and generate random walks.
  2. Learning with Skip-gram Model: Like DeepWalk, use the generated node sequences as input to the Skip-gram model to learn node embeddings.

$$ J = - \sum_{w \in V} \sum_{u \in N(w)} \log P(u|w) $$

Python Implementation Example

Below is a basic implementation example of Node2Vec:

Using the karate_club_graph provided in networkx as data.

import numpy as np
import networkx as nx

from pprint import pprint

from gensim.models import Word2Vec

class Node2Vec:
    def __init__(self, graph, p, q, num_walk_list, walk_length):
        self.graph = graph
        self.p = p
        self.q = q
        self.num_walk_list = num_walk_list
        self.walk_length = walk_length

    def node2vec_walk(self, start_node):
        walk = [start_node]
        while len(walk) < self.walk_length:
            cur = walk[-1]
            cur_nbrs = list(self.graph.neighbors(cur))
            if len(cur_nbrs) > 0:
                if len(walk) == 1:
                    walk.append(np.random.choice(cur_nbrs))
                else:
                    prev = walk[-2]
                    probs = []
                    for nbr in cur_nbrs:
                        if nbr == prev:
                            probs.append(1 / self.p)
                        elif self.graph.has_edge(prev, nbr):
                            probs.append(1)
                        else:
                            probs.append

(1 / self.q)
                    probs = np.array(probs)
                    probs = probs / probs.sum()
                    walk.append(np.random.choice(cur_nbrs, p=probs))
            else:
                break
        return walk

    def generate_walk_list(self):
        walk_list = []
        node_list = list(self.graph.nodes())
        for _ in range(self.num_walk_list):
            np.random.shuffle(node_list)
            for node in node_list:
                walk_list.append(self.node2vec_walk(node))
        return walk_list

# Creating the graph
G = nx.karate_club_graph()

# Setting Node2Vec parameters
p = 1
q = 1
num_walks = 10
walk_length = 5

# Creating an instance of Node2Vec and generating walks
node2vec = Node2Vec(G, p, q, num_walks, walk_length)
walks = node2vec.generate_walk_list()

# Learning with the Skip-gram model
model = Word2Vec(walks, vector_size=64, window=5, min_count=0, sg=1)

# Displaying the node embeddings
embedding_list = {node: model.wv[node] for node in G.nodes()}

# Displaying only the results of the first node
pprint(embedding_list[0].round(4))
array([-0.0029,  0.0041,  0.0041,  0.0096, -0.005 , -0.0021,  0.0089,
        0.017 , -0.0034, -0.0142,  0.0112,  0.0022,  0.0132, -0.0026,
       -0.0064, -0.0133, -0.0026,  0.0073,  0.0091,  0.0139, -0.0057,
        0.0032,  0.0166, -0.0085, -0.004 ,  0.0147, -0.0019,  0.0005,
        0.0063, -0.0011,  0.0124,  0.0105, -0.021 , -0.0161,  0.0038,
        0.0123, -0.0114,  0.01  ,  0.0152,  0.0083,  0.0139, -0.0052,
       -0.0002,  0.0031, -0.0029, -0.0106, -0.0021, -0.0151, -0.01  ,
        0.0139,  0.0078,  0.0118,  0.012 ,  0.0159, -0.0003,  0.0037,
        0.0061, -0.0099,  0.0123, -0.0165,  0.0114,  0.002 , -0.0098,
        0.0085], dtype=float32)

Applications

Use in Recommender Systems

  • Modeling User-Item Relationships: Treat users and items as nodes, perform random walks based on user behavior history, and learn user preferences as embedding vectors.
  • Social Network Analysis: Represent user relationships as a graph and use random walks to model the strength of relationships.

Advantages

  • Scalability: Efficiently computable even for large-scale graphs.
  • Flexibility: Can balance the depth and breadth of exploration by adjusting parameters $p$ and $q$.

Disadvantages

  • Computational Cost: The computational cost may increase for very large graphs.
  • Hyperparameter Tuning: Proper selection of hyperparameters significantly affects the results.

Conclusion

In this article, I explained the definitions, properties, and applications of Classical Random Walk, DeepWalk, and Node2Vec. It was shown that graph embedding techniques using random walks are effective in recommender systems and social network analysis. Through concrete Python code examples, the implementation process should have been understood as well.

References

  • Perozzi, A., Al-Rfou, R., & Skiena, S. (2014). DeepWalk: Online Learning of Social Representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ‘14).
  • Grover, A., & Leskovec, J. (2016). node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ‘16).