Theory and Implementation of GNN (Graph Neural Network) and GCN (Graph Convolutional Network)

GNN (Graph Neural Network) is a framework for processing graph-structured data. GCN (Graph Convolutional Network) is a specific type of GNN that uses graph convolution operations. This article summarizes the basic theories and implementations of GNN and GCN as personal notes.

Source Code

GitHub

  • The Jupyter notebook file can be found here.

Google Colaboratory

  • To run on Google Colaboratory, click here.

Execution Environment

The operating system used is macOS. Note that command options may differ from those on Linux or other Unix systems.

!sw_vers
ProductName:		macOS
ProductVersion:		15.2
BuildVersion:		24C101
!python -V
Python 3.9.17

Key libraries are imported, and watermark is used to record their versions. Random seeds are also set.

%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import random

import scipy
import numpy as np

import matplotlib
import matplotlib.pyplot as plt

seed = 123
random_state = 123

random.seed(seed)
np.random.seed(seed)

from watermark import watermark

print(watermark(python=True, watermark=True, iversions=True, globals_=globals()))
Python implementation: CPython
Python version       : 3.9.17
IPython version      : 8.17.2

scipy     : 1.11.2
matplotlib: 3.8.1
numpy     : 1.25.2

Watermark: 2.4.3

Overview

GNN (Graph Neural Network) is a neural network designed to handle graph-structured data. GCN (Graph Convolutional Network), a type of GNN, uses graph convolution operations. By learning node connections and features, GNNs can be applied to tasks like node classification, link prediction, and graph classification. These methods are used in recommender systems, social network analysis, and other domains involving large-scale relational data.

This article covers GNN basics, GCN mechanisms, and implementation examples as a personal reference. Errors may exist, so please bear with me.


1. Basics of GNN

1-1. What is GNN?

GNN refers to neural networks that process graph-structured data. A graph is a discrete structure consisting of vertices (nodes) and edges. Unlike images or text, which have a regular grid or sequence structure, graphs represent connections between nodes explicitly, and a feature vector is often assigned to each node.

In tasks like link prediction, relationships between nodes are critical. GNNs aggregate information from neighboring nodes to learn node features. A representative approach is the Message Passing mechanism, where each node receives and updates information from adjacent nodes. By stacking layers, information from more distant nodes propagates through the network.

GNNs are applied to tasks such as node classification, graph classification, and link prediction.

1-2. General Structure of GNN

The general framework of GNN is as follows:

  1. Initialize node features as $ \mathbf{h}_v^{(0)} $.
  2. Aggregate messages from neighboring nodes at each layer.
  3. Update node embeddings using the aggregated messages.
  4. Use the final layer’s embeddings for specific tasks.

Node information exchange is the key to GNNs. Given a node $ v $ and its neighbors $ \mathcal{N}(v) $, the update formula is typically expressed as:

$$ \mathbf{h}_v^{(l+1)} = \phi\bigl(\mathbf{h}_v^{(l)}, \text{Aggregate}(\mathbf{h}_u^{(l)} : u \in \mathcal{N}(v))\bigr) $$

The Aggregate function can vary, for example a sum, mean, or weighted average, and this choice changes the model's characteristics. Parameters are optimized via backpropagation.
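
To make the update rule concrete, here is a minimal per-node sketch in NumPy; the mean aggregator, the single weight matrix W, and ReLU as $ \phi $ are illustrative choices, not fixed by the formula:

import numpy as np

# Toy graph as neighbor lists, with 2-dimensional node features
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
H = np.array([[1.0, 0.5], [0.3, 0.8], [0.9, 0.1], [0.2, 0.6]])
rng = np.random.default_rng(123)
W = rng.normal(size=(2, 2))  # illustrative weight matrix

H_next = np.zeros_like(H)
for v, nbrs in neighbors.items():
    m_v = H[nbrs].mean(axis=0)                     # Aggregate over N(v)
    H_next[v] = np.maximum(0.0, (H[v] + m_v) @ W)  # phi: linear + ReLU
print(H_next)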

1-3. Applications of GNN

  • Social Network Analysis: Analyze user connections, detect communities, or identify influencers.
  • Recommender Systems: Represent users and items as a bipartite graph, improving recommendation accuracy.
  • Molecular Analysis: Model molecules as graphs of atoms and bonds, aiding drug discovery.
  • Logistics Optimization: Solve routing problems in supply chain networks.

Learning node centrality or community structures enhances analytical precision. GNNs also help visualize organizational communication in businesses.

1-4. Advantages and Challenges of GNN

Advantages:

  • Operate directly on graph structures, capturing rich relationships between nodes.
  • Sparse graphs can be processed efficiently using sparse matrix operations.

Challenges:

  • Computationally intensive for large and dense graphs, requiring significant memory for adjacency matrices.
  • Over-smoothing: In deep GNNs, node embeddings may become indistinguishable, degrading performance.
  • Hyperparameter tuning and optimization require careful attention.

2. Basics of GCN

2-1. What is GCN?

GCN extends convolutional neural networks (CNNs) to graph data. Given a feature matrix $ \mathbf{X} $ and adjacency matrix $ \mathbf{A} $, GCN uses a normalized adjacency matrix for convolution. A typical GCN layer updates embeddings as:

$$ \mathbf{H}^{(l+1)} = \sigma\bigl(\tilde{\mathbf{D}}^{-\frac{1}{2}}\tilde{\mathbf{A}}\tilde{\mathbf{D}}^{-\frac{1}{2}} \mathbf{H}^{(l)} \mathbf{W}^{(l)}\bigr) $$

Here:

  • $ \tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I} $ (adjacency matrix with self-loops),
  • $ \tilde{\mathbf{D}} $ is the diagonal degree matrix of $ \tilde{\mathbf{A}} $,
  • $ \mathbf{W}^{(l)} $ is the weight matrix for layer $ l $, and
  • $ \sigma $ is an activation function (e.g., ReLU).

GCN gathers neighboring node information to update embeddings, handling local structures in the graph.

2-2. Advantages of GCN

  • Naturally handles graph data and has proven applications in node and graph classification.
  • Simple convolution operations make GCN relatively easy to implement.
  • Scalable for moderately sized graphs with proper optimizations.
  • Provides a framework for representing discrete graph structures as continuous vectors.

2-3. Challenges of GCN

  • Computational cost increases with graph size, requiring significant memory for full-batch adjacency matrix processing.
  • Over-smoothing risks in deeper architectures.
  • Limited flexibility when node features are heterogeneous or the graph contains many weakly connected components.

Alternative methods like GraphSAGE and GAT (Graph Attention Network) address these limitations by introducing sampling and attention mechanisms.
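
As a rough sketch of the sampling idea (not the full GraphSAGE algorithm), a layer can aggregate a fixed-size random sample of each node's neighbors instead of the whole neighborhood, which bounds the cost per node; the layer and parameter names here are illustrative:

import random

import torch
import torch.nn as nn
import torch.nn.functional as F

class SampledMeanLayer(nn.Module):
    """GraphSAGE-style layer: mean over a sampled subset of neighbors."""

    def __init__(self, in_dim: int, out_dim: int, num_samples: int = 2):
        super().__init__()
        self.num_samples = num_samples
        # Transforms the concatenation of self features and aggregated neighbor features
        self.W_mat = nn.Linear(2 * in_dim, out_dim)

    def forward(self, X_tensor, neighbors):
        agg_rows = []
        for v in range(X_tensor.size(0)):
            nbrs = neighbors[v]
            if nbrs:
                sampled = random.sample(nbrs, min(self.num_samples, len(nbrs)))
                agg_rows.append(X_tensor[sampled].mean(dim=0))
            else:
                # Isolated node: nothing to aggregate
                agg_rows.append(torch.zeros(X_tensor.size(1)))
        agg_tensor = torch.stack(agg_rows)
        return F.relu(self.W_mat(torch.cat([X_tensor, agg_tensor], dim=1)))

X_tensor = torch.tensor([[1.0, 0.5], [0.3, 0.8], [0.9, 0.1], [0.2, 0.6]])
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
layer = SampledMeanLayer(in_dim=2, out_dim=2, num_samples=2)
print(layer(X_tensor, neighbors))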


3. Implementation Example of GCN (Python)

Below is a simplified GCN implementation for clarity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W_mat = nn.Parameter(torch.FloatTensor(in_dim, out_dim))
        nn.init.xavier_uniform_(self.W_mat.data)

    def forward(self, X_tensor, adj_mat):
        # Add self-loops: A_tilde = A + I
        I_tensor = torch.eye(adj_mat.size(0))
        A_hat = adj_mat + I_tensor
        # Degree matrix of A_tilde and its inverse square root
        # (D_hat is diagonal, so inverting and taking sqrt elementwise is valid)
        D_hat = torch.diag(torch.sum(A_hat, dim=1))
        D_hat_inv_sqrt = torch.sqrt(torch.inverse(D_hat))
        # Symmetric normalization: D_tilde^{-1/2} A_tilde D_tilde^{-1/2}
        A_norm = D_hat_inv_sqrt @ A_hat @ D_hat_inv_sqrt
        # Propagate features and apply the weight matrix: A_norm X W
        H_tensor = A_norm @ X_tensor @ self.W_mat
        return H_tensor

class GCN(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.layer1 = GCNLayer(in_dim, hidden_dim)
        self.layer2 = GCNLayer(hidden_dim, out_dim)

    def forward(self, X_tensor, adj_mat):
        H_tensor = self.layer1(X_tensor, adj_mat)
        H_tensor = F.relu(H_tensor)
        H_tensor = self.layer2(H_tensor, adj_mat)
        return H_tensor

# Example feature matrix (X_tensor) and adjacency matrix (adj_mat)
X_list = [[1.0, 0.5], [0.3, 0.8], [0.9, 0.1], [0.2, 0.6]]
adj_list = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
X_tensor = torch.tensor(X_list, dtype=torch.float)
adj_mat = torch.tensor(adj_list, dtype=torch.float)

model = GCN(in_dim=2, hidden_dim=4, out_dim=2)
output_tensor = model(X_tensor, adj_mat)
print(output_tensor)

Output:

tensor([[ 0.3374, -0.1291],
        [ 0.3729, -0.1495],
        [ 0.3290, -0.1271],
        [ 0.2402, -0.0990]], grad_fn=<MmBackward0>)

This example demonstrates a simple implementation. Real-world applications require additional considerations such as mini-batches, GPU compatibility, and loss functions. Libraries like PyTorch Geometric or DGL are commonly used for such tasks.
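
For reference, here is a minimal sketch of the same two-layer model written with PyTorch Geometric, assuming torch_geometric is installed. GCNConv adds self-loops and applies symmetric normalization internally, and edges are given as an edge_index tensor (one column per directed edge) instead of a dense adjacency matrix:

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCNPyG(torch.nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, out_dim)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

# The same 4-node path graph, with each undirected edge listed in both directions
x = torch.tensor([[1.0, 0.5], [0.3, 0.8], [0.9, 0.1], [0.2, 0.6]])
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]], dtype=torch.long)

model = GCNPyG(in_dim=2, hidden_dim=4, out_dim=2)
print(model(x, edge_index))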


4. Simple GNN Implementation (Python)

Below is a simplified implementation of a GNN using the concept of message passing. The process involves aggregating information from neighboring nodes and updating node embeddings.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGNNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W_mat = nn.Linear(in_dim, out_dim)

    def forward(self, X_tensor, adj_mat):
        # Aggregate features from neighbors
        agg_tensor = torch.matmul(adj_mat, X_tensor)
        # Apply weighted linear transformation
        updated_tensor = self.W_mat(agg_tensor)
        return updated_tensor

class SimpleGNN(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.layer1 = SimpleGNNLayer(in_dim, hidden_dim)
        self.layer2 = SimpleGNNLayer(hidden_dim, out_dim)

    def forward(self, X_tensor, adj_mat):
        H_tensor = self.layer1(X_tensor, adj_mat)
        H_tensor = F.relu(H_tensor)
        H_tensor = self.layer2(H_tensor, adj_mat)
        return H_tensor

# Example feature matrix (X_tensor) and adjacency matrix (adj_mat)
X_list = [[1.0, 0.5], [0.3, 0.8], [0.9, 0.1], [0.2, 0.6]]
adj_list = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
X_tensor = torch.tensor(X_list, dtype=torch.float)
adj_mat = torch.tensor(adj_list, dtype=torch.float)

model_gnn = SimpleGNN(in_dim=2, hidden_dim=4, out_dim=2)
output_gnn_tensor = model_gnn(X_tensor, adj_mat)
print(output_gnn_tensor)

Output:

tensor([[-0.0374,  0.0476],
        [-0.4478,  0.1145],
        [-0.1853, -0.0557],
        [-0.2854,  0.0934]], grad_fn=<AddmmBackward0>)

This implementation differs from GCN in that the adjacency matrix is used without self-loops or normalization: each node's new embedding is simply the sum of its neighbors' features, so the node's own features are discarded and activation magnitudes grow with node degree. GCN's normalized convolution addresses exactly these issues, which is why summation-based aggregation can hurt learning stability and performance in practice.
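
To illustrate the point about summation, a common variant normalizes the aggregated features by each node's degree, i.e. takes the mean over neighbors so that activation magnitudes no longer scale with degree; a minimal sketch (the name MeanGNNLayer is ours):

import torch
import torch.nn as nn

class MeanGNNLayer(nn.Module):
    """Like SimpleGNNLayer, but averages neighbor features by degree."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W_mat = nn.Linear(in_dim, out_dim)

    def forward(self, X_tensor, adj_mat):
        # Degree of each node; clamped to 1 so isolated nodes do not divide by zero
        deg = adj_mat.sum(dim=1, keepdim=True).clamp(min=1.0)
        agg_tensor = (adj_mat @ X_tensor) / deg  # mean over neighbors
        return self.W_mat(agg_tensor)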


5. Mathematical Example and Domains

Consider a graph $ G = (V, E) $, where $ V $ is the set of nodes and $ E $ is the set of edges. Each node $ v \in V $ has a feature vector $ \mathbf{x}_v \in \mathbb{R}^d $. Mapping $ v \mapsto \mathbf{x}_v $ represents node features.

In message passing, features from neighbors $ \mathcal{N}(v) $ are aggregated, typically defined as:

$$ \mathbf{m}_v^{(l)} = \text{AGG}\bigl(\{\mathbf{h}_u^{(l)} : u \in \mathcal{N}(v)\}\bigr) $$

The node embedding is then updated as:

$$ \mathbf{h}_v^{(l+1)} = \text{UPDATE}\bigl(\mathbf{h}_v^{(l)}, \mathbf{m}_v^{(l)}\bigr) $$

The domain of these functions is:

  • $ \text{AGG}: (\mathbb{R}^d)^{|\mathcal{N}(v)|} \rightarrow \mathbb{R}^d $
  • $ \text{UPDATE}: \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}^d $

In GCN, aggregation is typically matrix-based, and updates involve linear transformations and nonlinear activations. A clear understanding of these operations deepens comprehension of GNN behavior.
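
Written out as plain functions with these domains, where the permutation-invariant sum and the averaged ReLU update are illustrative choices:

import numpy as np

def agg(neighbor_feats: np.ndarray) -> np.ndarray:
    # AGG: (R^d)^{|N(v)|} -> R^d, here a permutation-invariant sum
    return neighbor_feats.sum(axis=0)

def update(h_v: np.ndarray, m_v: np.ndarray) -> np.ndarray:
    # UPDATE: R^d x R^d -> R^d, here an elementwise average followed by ReLU
    return np.maximum(0.0, 0.5 * (h_v + m_v))

h_v = np.array([1.0, 0.5])                 # current embedding of node v
msgs = np.array([[0.3, 0.8], [0.9, 0.1]])  # embeddings of two neighbors
print(update(h_v, agg(msgs)))              # next embedding of node v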


6. Real-World Applications

6-1. Recommender Systems

In recommender systems, user and item nodes are represented as a bipartite graph, where edges indicate interactions (e.g., purchases or ratings). Using GNNs improves recommendation accuracy by learning user preferences. For instance:

  • Nodes: Users and items.
  • Edges: Interactions like purchases or reviews.

Attributes such as price range (for items) or demographics (for users) are embedded into node features. GNNs generate node embeddings, and items near the user embedding in the feature space are recommended.
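
As a toy illustration with made-up embedding values, items can be ranked by similarity to the user embedding:

import torch
import torch.nn.functional as F

# Hypothetical embeddings produced by a GNN: one user and four items, dim 2
user_emb = torch.tensor([0.8, 0.2])
item_embs = torch.tensor([[0.9, 0.1], [0.1, 0.9], [0.7, 0.3], [0.2, 0.2]])

# Rank items by cosine similarity to the user embedding
scores = F.cosine_similarity(item_embs, user_emb.unsqueeze(0), dim=1)
print(scores.argsort(descending=True))  # item indices, best match first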

6-2. Social Network Analysis

GNNs analyze user connections to identify communities or influencers. Features include:

  • Node features: User profiles or behavior.
  • Edge features: Interactions like likes or comments.

Techniques like GAT (Graph Attention Network) learn to weight the most relevant neighbors, and GNNs can be used to detect fraudulent accounts or predict trends.

6-3. Molecular Analysis

In molecular analysis, atoms are nodes, and chemical bonds are edges. GNNs predict properties or reactivity of molecules. Attributes include:

  • Node features: Atom type.
  • Edge features: Bond type.

This approach aids drug discovery by reducing simulation costs and accelerating the identification of promising compounds.

6-4. Other Applications

  • Supply Chain Management: Analyze transaction and logistics networks.
  • Fraud Detection: Identify risks in financial transaction graphs.
  • IoT Networks: Optimize sensor placement and fault detection.

The versatility of GNNs makes them applicable across diverse industries.


7. Computational Example: Small Graph with GCN

Consider a small graph with 3 nodes, a feature dimension of 2, and the following adjacency and feature matrices:

$$ \mathbf{X} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix}, \quad \mathbf{A} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} $$

Step 1: Compute $\tilde{\mathbf{A}}$

Add self-loops to the adjacency matrix: $$ \tilde{\mathbf{A}} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix} $$

Step 2: Compute $\tilde{\mathbf{D}}$

Compute the degree matrix: $$ \tilde{\mathbf{D}} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{pmatrix} $$

Step 3: Normalize Adjacency Matrix

Compute $ \tilde{\mathbf{D}}^{-\frac{1}{2}} $: $$ \tilde{\mathbf{D}}^{-\frac{1}{2}} = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & 0 \\ 0 & \frac{1}{\sqrt{3}} & 0 \\ 0 & 0 & \frac{1}{\sqrt{2}} \end{pmatrix} $$

Normalized adjacency matrix: $$ \hat{\mathbf{A}} = \tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}} = \begin{pmatrix} \frac{1}{2} & \frac{1}{\sqrt{6}} & 0 \\ \frac{1}{\sqrt{6}} & \frac{1}{3} & \frac{1}{\sqrt{6}} \\ 0 & \frac{1}{\sqrt{6}} & \frac{1}{2} \end{pmatrix} $$

Step 4: Apply GCN Layer

Given weights $ \mathbf{W}^{(0)} $: $$ \mathbf{W}^{(0)} = \begin{pmatrix} 1 & -1 \\ 2 & 1 \end{pmatrix} $$

The output of the first layer is: $$ \mathbf{H}^{(1)} = \sigma(\hat{\mathbf{A}} \mathbf{X} \mathbf{W}^{(0)}) $$

This demonstrates how a GCN processes a graph mathematically. For deeper understanding, it helps to work through small examples like this by hand.
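
The steps above can also be checked with a few lines of NumPy, taking $ \sigma $ to be ReLU:

import numpy as np

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
W0 = np.array([[1.0, -1.0], [2.0, 1.0]])

A_tilde = A + np.eye(3)                            # Step 1: self-loops
D_inv_sqrt = np.diag(A_tilde.sum(axis=1) ** -0.5)  # Steps 2-3: D_tilde^{-1/2}
A_norm = D_inv_sqrt @ A_tilde @ D_inv_sqrt         # D^{-1/2} A_tilde D^{-1/2}

H1 = np.maximum(A_norm @ X @ W0, 0.0)              # Step 4 with ReLU
print(H1)
# approximately [[1.32, 0.00],
#                [2.30, 0.00],
#                [2.32, 0.41]]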


8. Conclusion

This article covered the basics of GNNs and GCNs, their implementation, and real-world applications. GNNs are powerful frameworks for processing graph data, and GCNs are among the most prominent methods. They are widely used in recommender systems, social network analysis, molecular analysis, and beyond.

While challenges like computational costs and over-smoothing exist, research continues to improve their capabilities. Starting with GCN as a baseline and exploring advanced methods is recommended for those new to GNNs.