Properties of Doubly Stochastic Matrices

Overview

This article summarizes the key properties of doubly stochastic matrices.

A doubly stochastic matrix is a nonnegative matrix whose rows and columns are each normalized to sum to 1.

In machine learning and recommender systems, this kind of probabilistic normalization is widely used for tasks such as optimal assignment and distribution estimation.

In this article, I present the definition, properties, applications, calculation examples, and a simple Python implementation of doubly stochastic matrices.

Source Code

github

  • The Jupyter Notebook file is available here.

google colaboratory

  • If you want to run it on Google Colaboratory, please see here.

Execution Environment

The operating system used here is macOS. Note that the command-line options may differ from those of Linux or Unix.

!sw_vers
ProductName:		macOS
ProductVersion:		13.5.1
BuildVersion:		22G90
!python -V
Python 3.9.17

We import some basic libraries, use watermark to check the versions, and set the random seed:

%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import random

import scipy
import numpy as np

import matplotlib
import matplotlib.pyplot as plt

seed = 123
random_state = 123

random.seed(seed)
np.random.seed(seed)


from watermark import watermark

print(watermark(python=True, watermark=True, iversions=True, globals_=globals()))
Python implementation: CPython
Python version       : 3.9.17
IPython version      : 8.17.2

matplotlib: 3.8.1
numpy     : 1.25.2
scipy     : 1.11.2

Watermark: 2.4.3

Definition of a Doubly Stochastic Matrix

A doubly stochastic matrix is a nonnegative square matrix whose row sums and column sums each equal 1. Consider an $n \times n$ matrix $\mathbf{M}$, where each element satisfies $\mathbf{M}_{ij} \geq 0$. For every row $i$,

$$ \sum_{j=1}^{n} \mathbf{M}_{ij} = 1 $$

and for every column $j$,

$$ \sum_{i=1}^{n} \mathbf{M}_{ij} = 1. $$

Such an $\mathbf{M}$ is called a doubly stochastic matrix. Since its entries can be interpreted as probabilities, every row and every column is itself a probability distribution, so the matrix is probabilistically consistent in both directions (rows and columns).

In addition, a doubly stochastic matrix can define a mapping $f: \mathbb{R}^n \to \mathbb{R}^n,\ f(\mathbf{x}) = \mathbf{M}\mathbf{x}.$ Here, $\mathbf{x}$ is an $n$-dimensional vector, and by using the doubly stochastic matrix $\mathbf{M}$, we can map $\mathbf{x}$ to another vector $\mathbf{y} = \mathbf{M}\mathbf{x}$, taking probabilistic weighting into account.

This mapping can be regarded as a transformation on a probability space.


Basic Properties of a Doubly Stochastic Matrix

Nonnegative Matrix and Probabilistic Properties

Because a doubly stochastic matrix is nonnegative and each of its row sums and column sums equals 1, it can be viewed as a matrix that transforms probability distributions. This guarantees that if you input a probability vector, you will get a probability vector as the output.

Birkhoff–von Neumann Theorem

According to the Birkhoff–von Neumann theorem, any doubly stochastic matrix can be expressed as a convex combination of a finite number of permutation matrices. A permutation matrix $\mathbf{P}$ has exactly one entry of 1 in each row and in each column, with all other entries being 0. A doubly stochastic matrix $\mathbf{M}$ can be written as

$$ \mathbf{M} = \sum_{k} \lambda_k \mathbf{P}_k,\quad \sum_{k} \lambda_k = 1,\ \lambda_k \ge 0. $$

Thus, $\mathbf{M}$ lies in the convex hull of the permutation matrices, known as the Birkhoff polytope; the permutation matrices themselves are its extreme points.
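
As a quick numerical illustration of the theorem (a minimal sketch with arbitrarily chosen permutations and weights, not taken from the article's example), mixing a few permutation matrices with convex weights always produces a doubly stochastic matrix:

import numpy as np

# Three permutation matrices of size 3 (rows of the identity reordered)
P1 = np.eye(3)                 # identity permutation
P2 = np.eye(3)[[1, 0, 2]]      # swap the first two elements
P3 = np.eye(3)[[2, 0, 1]]      # a cyclic shift

# Convex weights: nonnegative and summing to 1 (chosen arbitrarily here)
lambdas = [0.5, 0.3, 0.2]

M = lambdas[0] * P1 + lambdas[1] * P2 + lambdas[2] * P3

print(M)
print("row sums   :", M.sum(axis=1))   # -> [1. 1. 1.]
print("column sums:", M.sum(axis=0))   # -> [1. 1. 1.]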

Permutations and Permutation Matrices

Permutation

  • A permutation $\sigma$ is a bijection that rearranges the set $\{1,2,\dots,n\}$. For example, $\sigma(1)=2,\ \sigma(2)=1,\ \sigma(3)=3$.

Permutation Matrices

A permutation matrix is a square matrix in which each row and column contains exactly one entry of 1, with all others being 0.

  • A permutation matrix $P$ is defined as follows:

$$ p_{ij} = \begin{cases} 1 & \text{if } j = \sigma(i)\\ 0 & \text{otherwise} \end{cases} $$

  • For instance,

$$ P = \begin{pmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{pmatrix}. $$

In this matrix:

  • Row 1 has a 1 in column 2 (i.e., $\sigma(1)=2$)
  • Row 2 has a 1 in column 1 (i.e., $\sigma(2)=1$)
  • Row 3 has a 1 in column 3 (i.e., $\sigma(3)=3$)
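
As a small check (a sketch that uses 0-based indexing, so the permutation above becomes the array [1, 0, 2]), this matrix can be constructed directly from $\sigma$ and applied to a vector:

import numpy as np

# sigma(1)=2, sigma(2)=1, sigma(3)=3; with 0-based indices: sigma = [1, 0, 2]
sigma = [1, 0, 2]
n = len(sigma)

# p_ij = 1 if j = sigma(i), 0 otherwise
P = np.zeros((n, n))
for i, j in enumerate(sigma):
    P[i, j] = 1.0

print(P)

# Applying P to a vector picks out the entries x_{sigma(i)}
x = np.array([10.0, 20.0, 30.0])
print(P @ x)   # -> [20. 10. 30.]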

Preservation of the L1 Norm

Let $\mathbf{M}$ be a doubly stochastic matrix. For any probability vector $\mathbf{x}$, the vector $\mathbf{y} = \mathbf{M}\mathbf{x}$ is also a probability vector.

Hence, $\sum_i y_i = \sum_i x_i = 1.$

This means that $\mathbf{M}$ maps the probability simplex into itself.

In particular, $\|\mathbf{M}\mathbf{x}\|_1 = \|\mathbf{x}\|_1$ for every nonnegative vector $\mathbf{x}$. Therefore, $\mathbf{M}$ is a linear transformation that preserves the $L_1$ norm on nonnegative vectors; for vectors with mixed signs, cancellation can make $\|\mathbf{M}\mathbf{x}\|_1$ smaller than $\|\mathbf{x}\|_1$.
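
The following sketch checks this numerically, reusing the example matrix from the calculation example below; it also shows that a vector with mixed signs can lose $L_1$ mass through cancellation:

import numpy as np

M = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.4, 0.4],
              [0.3, 0.3, 0.4]])

# Nonnegative (probability) vector: the L1 norm is preserved
x = np.array([0.2, 0.5, 0.3])
print(np.abs(M @ x).sum(), np.abs(x).sum())   # -> 1.0 1.0 (up to floating-point error)

# Vector with mixed signs: cancellation can shrink the L1 norm
z = np.array([1.0, -1.0, 0.0])
print(np.abs(M @ z).sum(), np.abs(z).sum())   # -> about 0.4 versus 2.0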

Considerations on the Frobenius Norm

A doubly stochastic matrix has nonnegative entries, with row and column sums equal to 1. One can measure its size with the Frobenius norm $\|\mathbf{M}\|_F = \sqrt{\sum_{i,j} \mathbf{M}_{ij}^2}$; the exact value depends on the entries, which can vary subject to these constraints.

Because $0 \le \mathbf{M}_{ij} \le 1$, we have $\mathbf{M}_{ij}^2 \le \mathbf{M}_{ij}$, so $\|\mathbf{M}\|_F^2 \le \sum_{i,j} \mathbf{M}_{ij} = n$. Conversely, by the Cauchy–Schwarz inequality each row satisfies $\sum_j \mathbf{M}_{ij}^2 \ge 1/n$, so $\|\mathbf{M}\|_F^2 \ge 1$.

Permutation matrices each have $n$ entries of 1 and 0 elsewhere, so $\|\mathbf{P}\|_F = \sqrt{n}$, which attains the upper bound. The lower bound $\|\mathbf{M}\|_F = 1$ is attained by the uniform matrix with every entry equal to $1/n$. Convex combinations of permutation matrices smooth out the entries, pulling the norm away from $\sqrt{n}$ toward this lower bound.
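
A short numerical sketch for $n=3$ (the example matrix is the one used in the calculation example below) illustrates both extremes:

import numpy as np

n = 3

P = np.eye(n)                    # a permutation matrix (identity)
U = np.full((n, n), 1.0 / n)     # uniform doubly stochastic matrix
M = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.4, 0.4],
              [0.3, 0.3, 0.4]])  # example doubly stochastic matrix

for name, A in [("permutation", P), ("uniform", U), ("example", M)]:
    print(name, np.linalg.norm(A, "fro"))
# permutation -> sqrt(3) ~ 1.732, uniform -> 1.0, example -> about 1.039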


Concrete Calculation Example

Here is a simple example. Consider a doubly stochastic matrix $\mathbf{M}$ of size $n=3$. For example,

$$ \mathbf{M}=\left(\begin{array}{ccc} 0.5 & 0.3 & 0.2 \\ 0.2 & 0.4 & 0.4 \\ 0.3 & 0.3 & 0.4 \end{array}\right) $$

Check that each row and column sums to 1:

  • Row 1 sum: $0.5 + 0.3 + 0.2 = 1.0$

  • Row 2 sum: $0.2 + 0.4 + 0.4 = 1.0$

  • Row 3 sum: $0.3 + 0.3 + 0.4 = 1.0$

  • Column 1 sum: $0.5 + 0.2 + 0.3 = 1.0$

  • Column 2 sum: $0.3 + 0.4 + 0.3 = 1.0$

  • Column 3 sum: $0.2 + 0.4 + 0.4 = 1.0$

Therefore, $\mathbf{M}$ is a doubly stochastic matrix. When multiplying by a probability vector $\mathbf{x} = (0.2, 0.5, 0.3)^T$,

$$ \mathbf{y}=\mathbf{M x}=\left(\begin{array}{ccc} 0.5 & 0.3 & 0.2 \\ 0.2 & 0.4 & 0.4 \\ 0.3 & 0.3 & 0.4 \end{array}\right)\left(\begin{array}{c} 0.2 \\ 0.5 \\ 0.3 \end{array}\right)=\left(\begin{array}{c} 0.5 \cdot 0.2+0.3 \cdot 0.5+0.2 \cdot 0.3 \\ 0.2 \cdot 0.2+0.4 \cdot 0.5+0.4 \cdot 0.3 \\ 0.3 \cdot 0.2+0.3 \cdot 0.5+0.4 \cdot 0.3 \end{array}\right)\\ =\left(\begin{array}{c} 0.1+0.15+0.06 \\ 0.04+0.2+0.12 \\ 0.06+0.15+0.12 \end{array}\right)=\left(\begin{array}{c} 0.31 \\ 0.36 \\ 0.33 \end{array}\right) $$

The sum of the elements of $\mathbf{y}$ is $0.31 + 0.36 + 0.33 = 1.0.$

Thus, if the input is a probability vector, the output is also a probability vector.


Simple Python Implementation Example

Below is a Python example that checks whether a matrix is doubly stochastic and computes its product with an arbitrary vector.

import numpy as np


# Determine whether a matrix is doubly stochastic
def is_double_stochastic(M_np):

    # Check for nonnegative entries
    if np.any(M_np < 0):
        return False

    # Check row sums
    row_sums_np = np.sum(M_np, axis=1)

    # Check column sums
    col_sums_np = np.sum(M_np, axis=0)

    # If all sums are 1, it is a doubly stochastic matrix
    if np.allclose(row_sums_np, 1.0) and np.allclose(col_sums_np, 1.0):
        return True

    return False


M_list = [[0.5, 0.3, 0.2], [0.2, 0.4, 0.4], [0.3, 0.3, 0.4]]
M_np = np.array(M_list)

# Check
print("Is double stochastic?", is_double_stochastic(M_np))

# Probability vector
x_list = [0.2, 0.5, 0.3]
x_np = np.array(x_list)

# Product
y_np = M_np.dot(x_np)
print("y        : ", y_np)
print("Sum of y : ", np.sum(y_np))
Is double stochastic? True
y        :  [0.31 0.36 0.33]
Sum of y :  1.0

Running this code confirms that the matrix is doubly stochastic and that, given a probability vector as input, the output is also a probability vector: the output shows Is double stochastic? True, and the elements of the output vector y sum to 1.0.

Algorithms for Obtaining a Doubly Stochastic Matrix

In general, the Sinkhorn-Knopp algorithm is used to turn a nonnegative matrix $\mathbf{A}$ (with suitable support, e.g., strictly positive entries) into a doubly stochastic matrix. It alternately normalizes the rows and the columns:

  • Normalize each row: $ \mathbf{A}'_{ij} = \displaystyle \frac{\mathbf{A}_{ij}}{\sum_j \mathbf{A}_{ij}}$
  • Normalize each column: $ \mathbf{A}''_{ij} = \displaystyle \frac{\mathbf{A}'_{ij}}{\sum_i \mathbf{A}'_{ij}}$

Iterating these two steps drives the row and column sums toward 1, and on convergence one obtains an arbitrarily good approximation to a doubly stochastic matrix, as shown in the sketch below.
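
Below is a minimal sketch of these alternating normalizations (a naive implementation assuming a strictly positive input matrix and a fixed iteration budget; the function name and tolerance are my own choices, not from a library):

import numpy as np

def sinkhorn_knopp(A, n_iter=1000, tol=1e-9):
    """Alternately normalize rows and columns of a positive matrix."""
    A = np.array(A, dtype=float)
    for _ in range(n_iter):
        A = A / A.sum(axis=1, keepdims=True)   # normalize each row
        A = A / A.sum(axis=0, keepdims=True)   # normalize each column
        if np.allclose(A.sum(axis=1), 1.0, atol=tol) and np.allclose(A.sum(axis=0), 1.0, atol=tol):
            break
    return A

np.random.seed(123)
A_np = np.random.rand(3, 3) + 0.1   # strictly positive matrix
S_np = sinkhorn_knopp(A_np)
print(S_np)
print("row sums   :", S_np.sum(axis=1))
print("column sums:", S_np.sum(axis=0))
# Reusing the helper defined above
print("Is double stochastic?", is_double_stochastic(S_np))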

Matrix Decomposition and Spectral Properties

The eigenvalues of a doubly stochastic matrix $\mathbf{M}$ are constrained in a characteristic way. Its largest eigenvalue is 1, with eigenvector $(1,1,\dots,1)^T$ (on both the right and the left, since rows and columns each sum to 1). All eigenvalues lie in the closed unit disk, and for matrices with strictly positive entries the remaining eigenvalues have modulus strictly less than 1, a property often used in discussions of stability and convergence.
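
A quick check with np.linalg.eig (a sketch using the example matrix from above) confirms the eigenvalue of 1 and that the remaining eigenvalues lie inside the unit disk:

import numpy as np

M = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.4, 0.4],
              [0.3, 0.3, 0.4]])

eigvals, eigvecs = np.linalg.eig(M)
print("eigenvalues      :", eigvals)
print("eigenvalue moduli:", np.abs(eigvals))

# The all-ones vector is an eigenvector for eigenvalue 1 (row sums are 1)
print(M @ np.ones(3))   # -> [1. 1. 1.]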

Conclusion

In this article, I surveyed the fundamental properties and practical examples of doubly stochastic matrices. Such matrices are nonnegative, with row and column sums each equal to 1, and by the Birkhoff–von Neumann theorem, they can be expressed as convex combinations of permutation matrices.

Practical methods such as the Sinkhorn-Knopp algorithm can be used to turn a suitable nonnegative matrix into a doubly stochastic one, which makes these matrices quite useful in a variety of applications.