Symbolic Probability and Statistics with SymPy

Overview

In this article, I will show how to reproduce the derivations of major theorems and estimation methods in probability and statistics using SymPy, Python’s symbolic mathematics library.

Specifically, I will cover the following two topics:

  1. Convergence from Binomial Distribution to Poisson Distribution: I will demonstrate the process of deriving the probability mass function of the Poisson distribution by performing limit operations on a discrete distribution using symbolic computation.
  2. Maximum Likelihood Estimation and Fisher Information of Normal Distribution: For maximum likelihood estimation on a continuous distribution, I will perform partial differentiation of the log-likelihood function, extremum testing using the Hessian matrix, and derivation of the Fisher information matrix.

Author’s Environment

The code in this article has been confirmed to work in the following environment.

!sw_vers
!python -V
ProductName:		macOS
ProductVersion:		15.5
BuildVersion:		24F74
Python 3.12.12

Importing Libraries and Settings

Import the necessary libraries. To make SymPy’s mathematical output easier to check, enable display in LaTeX format.

import sympy
from sympy import (
    symbols, Symbol, S, oo, Sum, limit, exp, log, sqrt, pi,
    simplify, binomial, factorial, diff, solve, Eq, Matrix,
    hessian, Function, init_printing, latex, gamma
)
from sympy.stats import Normal, Binomial, density, E, variance
import matplotlib.pyplot as plt
import numpy as np
import math
from IPython.display import display # For math display

# Settings to display math formulas in LaTeX format
init_printing()

# Graph style settings
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['font.family'] = 'DejaVu Sans'

print("sympy version :", sympy.__version__)
sympy version : 1.14.0

1. Limit from Binomial Distribution to Poisson Distribution

The Poisson distribution arises as the limit of a binomial distribution in which the number of trials $n$ goes to infinity while the success probability $p$ shrinks so that the mean $np$ stays constant. Here, I will use SymPy to execute this limit operation and derive the probability mass function (PMF) of the Poisson distribution.

1.1 Definition

Let random variable $X$ follow a binomial distribution with number of trials $n$ and success probability $p$. $$ X \sim B(n, p) $$

At this time, the PMF is given by: $$ P(X=k) = \binom{n}{k} p^k (1-p)^{n-k} $$

Here, we take the limit under the following conditions:

  1. $n \to \infty$
  2. $p \to 0$
  3. $np = \lambda$ (constant)

That is, let $p = \frac{\lambda}{n}$.

# Definition of symbols
n = symbols('n', integer=True, positive=True)
k = symbols('k', integer=True, nonnegative=True)
p = symbols('p', real=True, positive=True)
lam = symbols('lambda', real=True, positive=True)

# PMF of Binomial Distribution
pmf_binom = binomial(n, k) * p**k * (1 - p)**(n - k)

print("Probability Mass Function (PMF) of Binomial Distribution:")
display(pmf_binom)
Probability Mass Function (PMF) of Binomial Distribution:

$\displaystyle p^{k} \left(1 - p\right)^{- k + n} {\binom{n}{k}}$

1.2 Decomposition of Terms

Substitute probability $p$ with $\lambda / n$ and find the limit as $n \to \infty$. If the limit function is applied directly to the full expression, SymPy’s internal algorithm may fail to determine the properties of the symbol $k$, and the calculation may not complete (raising NotImplementedError).
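The direct attempt can be wrapped in a try/except block so the notebook keeps running either way (whether it actually raises depends on the SymPy version, so treat this as illustrative):

# Naive attempt: limit of the full PMF with a symbolic k
try:
    res = limit(pmf_binom.subs(p, lam / n), n, oo)
    print("limit() returned:", res)
except NotImplementedError:
    print("limit() raised NotImplementedError for symbolic k")

Therefore, I take the approach of decomposing the formula into terms with different convergence behavior and analyzing them individually.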

$$ P(X=k) = \underbrace{\frac{n(n-1)\cdots(n-k+1)}{n^k}}_{(A)} \cdot \underbrace{\frac{\lambda^k}{k!}}_{(B)} \cdot \underbrace{\left(1 - \frac{\lambda}{n}\right)^n}_{(C)} \cdot \underbrace{\left(1 - \frac{\lambda}{n}\right)^{-k}}_{(D)} $$

  • (A): The product of $k$ linear factors in $n$ divided by $n^k$. Its limit is 1 (see the expansion after this list).
  • (B): A constant term independent of $n$.
  • (C): A term matching the defining limit of the constant $e$. Its limit is $e^{-\lambda}$.
  • (D): Its limit is 1, since the base tends to 1 while the exponent $-k$ stays fixed.
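For intuition on why term (A) converges to 1, divide each of its $k$ factors by $n$: $$ \frac{n(n-1)\cdots(n-k+1)}{n^k} = \prod_{i=0}^{k-1}\left(1 - \frac{i}{n}\right) \to 1 \quad (n \to \infty) $$ since each of the finitely many factors tends to 1 for any fixed $k$.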
# Substitution p -> lambda/n
pmf_sub = pmf_binom.subs(p, lam / n)

# Decomposition of terms
# Term A: nCk * k! / n^k
# binomial(n, k) = gamma(n+1) / (gamma(k+1) * gamma(n-k+1))
term_A = (gamma(n + 1) / gamma(n - k + 1)) / n**k

# Term B: lambda^k / k!
term_B = lam**k / factorial(k)

# Term C: (1 - lambda/n)^n
term_C = (1 - lam / n)**n

# Term D: (1 - lambda/n)^(-k)
term_D = (1 - lam / n)**(-k)

print("Decomposed Terms:")
print("Term A:")
display(term_A)
print("Term C:")
display(term_C)
Decomposed Terms:
Term A:

$\displaystyle \frac{n^{- k} \Gamma\left(n + 1\right)}{\Gamma\left(- k + n + 1\right)}$

Term C:

$\displaystyle \left(- \frac{\lambda}{n} + 1\right)^{n}$
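Before taking limits, it is worth confirming that the product of the four terms reproduces the substituted PMF (a consistency check I have added; the difference should simplify to zero):

# The decomposition A*B*C*D must equal pmf_binom with p = lambda/n
recombined = term_A * term_B * term_C * term_D
print("Difference:", simplify(recombined - pmf_sub))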

1.3 Limit Calculation

Find the limit of each term. About Term (A): Since calculating the limit symbolically for a general $k$ may result in an error, here I substitute the finite integers $k=0, 1, 2, 5$ and confirm that the limit is 1 in each case (a spot check on concrete values rather than a general proof; the product expansion above shows why it holds for any fixed $k$).

# Limit confirmation of Term A (confirm with concrete values for k)
print("Limit value confirmation of Term A:")
for k_val in [0, 1, 2, 5]:
    term_A_concrete = term_A.subs(k, k_val)
    lim_val = limit(term_A_concrete, n, oo)
    print(f"k={k_val}: limit = {lim_val}")

# Therefore, assume Term A -> 1 for general finite integer k
lim_A = 1
Limit value confirmation of Term A:
k=0: limit = 1
k=1: limit = 1
k=2: limit = 1
k=5: limit = 1

About Terms (C) and (D): These are found as standard limit calculations.

# Limit of Term C, D
lim_C = limit(term_C, n, oo)
lim_D = limit(term_D, n, oo)

print("Limit of Term C:", lim_C)
print("Limit of Term D:", lim_D)
Limit of Term C: exp(-lambda)
Limit of Term D: 1

1.4 Combining Results

Combine the above results and confirm that the formula for the Poisson distribution is reconstructed.

$$ 1 \cdot \frac{\lambda^k}{k!} \cdot e^{-\lambda} \cdot 1 = \frac{\lambda^k e^{-\lambda}}{k!} $$

# Overall calculation result
poisson_pmf_derived = lim_A * term_B * lim_C * lim_D

print("Derived Poisson Distribution:")
display(poisson_pmf_derived)

# Comparison with theoretical formula
expected_poisson_pmf = (lam**k * exp(-lam)) / factorial(k)
print("Mathematical match confirmation:", simplify(poisson_pmf_derived - expected_poisson_pmf) == 0)
Derived Poisson Distribution:

$\displaystyle \frac{\lambda^{k} e^{- \lambda}}{k!}$

Mathematical match confirmation: True
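As one more check (my own addition), the derived PMF should be properly normalized; SymPy can evaluate the infinite sum over $k$ symbolically:

# Normalization check: the derived PMF should sum to 1 over k = 0, 1, 2, ...
total = Sum(poisson_pmf_derived, (k, 0, oo)).doit()
print("Sum over k:", simplify(total))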

1.5 Visualization of Convergence

Visualize the change in distribution shape as the value of $n$ increases. The parameter is set to $\lambda=5$.

def plot_convergence(lam_val=5, n_list=[10, 20, 50, 100]):
    k_vals = np.arange(0, 20)

    plt.figure(figsize=(10, 6))

    # Poisson Distribution (Theoretical Value)
    # np.math is deprecated, so use math
    poisson_probs = [np.exp(-lam_val) * (lam_val**k_i) / math.factorial(k_i) for k_i in k_vals]
    plt.plot(k_vals, poisson_probs, 'k--', linewidth=2, label=rf'Poisson($\lambda={lam_val}$)')

    # Binomial Distribution
    markers = ['o', 's', '^', 'D']
    for i, n_val in enumerate(n_list):
        p_val = lam_val / n_val
        binom_probs = []
        for k_i in k_vals:
            if k_i > n_val:
                prob = 0
            else:
                coef = math.comb(n_val, k_i)
                prob = coef * (p_val**k_i) * ((1 - p_val)**(n_val - k_i))
            binom_probs.append(prob)

        plt.plot(k_vals, binom_probs, marker=markers[i % len(markers)], linestyle='-', alpha=0.7, label=f'Binomial($n={n_val}, p={p_val:.3g}$)')

    plt.title(rf'Convergence of Binomial to Poisson ($\lambda={lam_val}$)', fontsize=16)
    plt.xlabel('k', fontsize=12)
    plt.ylabel('P(X=k)', fontsize=12)
    plt.legend(fontsize=12)
    plt.grid(True, alpha=0.3)
    plt.show()

plot_convergence(lam_val=5)

The plot confirms that as $n$ increases, the binomial distribution approaches the Poisson distribution.
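The convergence can also be quantified numerically (a supplementary check; the pointwise error is expected to shrink roughly like $1/n$, consistent with the $\lambda^2/n$ bound from Le Cam’s inequality):

# Maximum pointwise PMF difference for increasing n (lambda = 5)
lam_val = 5
for n_val in [10, 100, 1000, 10000]:
    p_val = lam_val / n_val
    max_diff = max(
        abs(math.comb(n_val, k_i) * p_val**k_i * (1 - p_val)**(n_val - k_i)
            - math.exp(-lam_val) * lam_val**k_i / math.factorial(k_i))
        for k_i in range(21)
    )
    print(f"n = {n_val:5d}: max |Binomial - Poisson| = {max_diff:.2e}")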


2. Maximum Likelihood Estimation and Fisher Information of Normal Distribution

Next, I will perform maximum likelihood estimation for the normal distribution. I will derive the estimators (MLE) from the first-order conditions on the log-likelihood function, verify that they give a maximum using the Hessian matrix, and then calculate the Fisher information matrix.

2.1 Setting the Likelihood Function

Assume that $n$ independent data points $\mathcal{D} = \{x_1, \dots, x_n\}$ follow a normal distribution $\mathcal{N}(\mu, \sigma^2)$. The probability density function (PDF) is as follows:

$$ f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) $$

# Definition of symbols
mu = symbols('mu', real=True)
sigma = symbols('sigma', real=True, positive=True)
x = symbols('x', real=True)
n_samples = symbols('n', integer=True, positive=True)

# PDF of Normal Distribution
pdf_normal = 1 / sqrt(2 * pi * sigma**2) * exp(-(x - mu)**2 / (2 * sigma**2))

print("PDF of Normal Distribution:")
display(pdf_normal)
PDF of Normal Distribution:

$\displaystyle \frac{\sqrt{2} e^{- \frac{\left(- \mu + x\right)^{2}}{2 \sigma^{2}}}}{2 \sqrt{\pi} \sigma}$

2.2 Defining the Log-Likelihood Function

Define the log-likelihood function $\ell(\mu, \sigma) = \sum \ln f(x_i)$. For convenience of calculation, express the summation terms using $s_1 = \sum x_i$ and $s_2 = \sum x_i^2$, noting that $\sum_i (x_i - \mu)^2 = s_2 - 2\mu s_1 + n\mu^2$:

$$ \ell(\mu, \sigma) = -\frac{n}{2}\ln(2\pi) - n\ln(\sigma) - \frac{1}{2\sigma^2} (s_2 - 2\mu s_1 + n\mu^2) $$

# Constant term and logarithmic term
term1 = -n_samples / 2 * log(2 * pi)
term2 = -n_samples * log(sigma)

# Expansion term of sum of squares
s_1 = symbols('s_1', real=True) # sum(x_i)
s_2 = symbols('s_2', real=True) # sum(x_i^2)
sum_sq_expanded = s_2 - 2*mu*s_1 + n_samples * mu**2

# Log-likelihood function
log_L_calc = term1 + term2 - 1/(2*sigma**2) * sum_sq_expanded

print("Log-likelihood function:")
display(log_L_calc)
Log-likelihood function:

$\displaystyle - n \log{\left(\sigma \right)} - \frac{n \log{\left(2 \pi \right)}}{2} - \frac{\mu^{2} n - 2 \mu s_{1} + s_{2}}{2 \sigma^{2}}$

2.3 Derivation of Maximum Likelihood Estimators

I find the MLE by partially differentiating the log-likelihood with respect to the parameters $\mu$ and $\sigma$ and setting each derivative equal to 0.

Estimation of $\mu$

$$ \frac{\partial \ell}{\partial \mu} = 0 $$

# Partial differentiation with respect to mu
d_logL_d_mu = diff(log_L_calc, mu)

print("Score function (mu):")
display(simplify(d_logL_d_mu))

# Solution of the equation
mu_solution = solve(Eq(d_logL_d_mu, 0), mu)

print("Maximum likelihood estimator of mu:")
display(mu_solution[0])
Score function (mu):

$\displaystyle \frac{- \mu n + s_{1}}{\sigma^{2}}$

Maximum likelihood estimator of mu:

$\displaystyle \frac{s_{1}}{n}$

The result is the sample mean $\bar{x} = s_1/n$.

Estimation of $\sigma$

$$ \frac{\partial \ell}{\partial \sigma} = 0 $$

# Partial differentiation with respect to sigma
d_logL_d_sigma = diff(log_L_calc, sigma)

print("Score function (sigma):")
display(simplify(d_logL_d_sigma))

# Substitute the solution for mu and solve
d_logL_d_sigma_subs = d_logL_d_sigma.subs(mu, mu_solution[0])
sigma_solutions = solve(Eq(d_logL_d_sigma_subs, 0), sigma)

# Select positive solution
sigma_mle = [s for s in sigma_solutions if not s.is_negative][0]

print("Maximum likelihood estimator of sigma:")
display(sigma_mle)
Score function (sigma):

$\displaystyle \frac{\mu^{2} n - 2 \mu s_{1} - n \sigma^{2} + s_{2}}{\sigma^{3}}$

Maximum likelihood estimator of sigma:

$\displaystyle \frac{\sqrt{n s_{2} - s_{1}^{2}}}{n}$

The result is $\hat{\sigma} = \sqrt{\frac{1}{n}\left(s_2 - \frac{s_1^2}{n}\right)} = \sqrt{\frac{1}{n}\sum_i (x_i - \bar{x})^2}$, i.e., the (biased) sample standard deviation with divisor $n$ rather than $n-1$.
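This identity can be confirmed symbolically (a quick check I have added):

# sigma_mle^2 should equal (1/n) * (s_2 - s_1^2/n), the biased sample variance
print(simplify(sigma_mle**2 - (s_2 - s_1**2 / n_samples) / n_samples) == 0)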

2.4 Verification with Hessian Matrix

Calculate the Hessian matrix $H$ and confirm its negative definiteness. $$ H = \begin{pmatrix} \frac{\partial^2 \ell}{\partial \mu^2} & \frac{\partial^2 \ell}{\partial \mu \partial \sigma} \\ \frac{\partial^2 \ell}{\partial \sigma \partial \mu} & \frac{\partial^2 \ell}{\partial \sigma^2} \end{pmatrix} $$

# Calculation of Hessian Matrix
vars_list = [mu, sigma]
hessian_mat = hessian(log_L_calc, vars_list)

print("Hessian Matrix:")
display(hessian_mat)

# Evaluation at estimated values (1,1) component
H_at_mle = hessian_mat.subs({
    mu: mu_solution[0],
    sigma: sigma_mle
})

h11_val = simplify(H_at_mle[0, 0])
print("Value of (1,1) component:")
display(h11_val)
Hessian Matrix:

$\displaystyle \left[\begin{matrix}- \frac{n}{\sigma^{2}} & \frac{2 \mu n - 2 s_{1}}{\sigma^{3}}\\\frac{2 \mu n - 2 s_{1}}{\sigma^{3}} & \frac{n}{\sigma^{2}} - \frac{3 \left(\mu^{2} n - 2 \mu s_{1} + s_{2}\right)}{\sigma^{4}}\end{matrix}\right]$

Value of (1,1) component:

$\displaystyle - \frac{n^{3}}{n s_{2} - s_{1}^{2}}$

At the MLE, the $(1,1)$ component equals $-n/\hat{\sigma}^2$, which is negative since $n s_2 - s_1^2 = n \sum_i (x_i - \bar{x})^2 > 0$. The off-diagonal components vanish (because $\mu = s_1/n$), and the determinant $2n^2/\hat{\sigma}^4$ is positive, so the Hessian is negative definite and this point is a local maximum.
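The determinant at the MLE can also be computed symbolically (an extra verification step I have added; it should simplify to $2n^6/(n s_2 - s_1^2)^2$):

# Determinant of the Hessian evaluated at the MLE (expected positive)
det_at_mle = simplify(H_at_mle.det())
display(det_at_mle)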

2.5 Fisher Information Matrix

Calculate the Fisher information matrix $I(\theta) = -E[H(\theta)]$. Taking expectations over the data under the true parameters gives $E[s_1] = n\mu$ and $E[s_2] = n(\sigma^2 + \mu^2)$, so that $E[s_2 - 2\mu s_1 + n\mu^2] = n\sigma^2$.

# Expectation calculation under the true parameters:
# E[s_1] = n*mu, E[s_2] = n*(sigma^2 + mu^2)
expected_H = hessian_mat.subs({
    s_1: n_samples * mu,
    s_2: n_samples * (sigma**2 + mu**2)
})

# Fisher information matrix: I(theta) = -E[H(theta)]
Fisher_Info = (-expected_H).applyfunc(simplify)

print("Fisher Information Matrix:")
display(Fisher_Info)
Fisher Information Matrix:

$\displaystyle \left[\begin{matrix}\frac{n}{\sigma^{2}} & 0\\0 & \frac{2 n}{\sigma^{2}}\end{matrix}\right]$

A diagonal matrix is obtained, confirming the orthogonality of the parameters $\mu$ and $\sigma$: their maximum likelihood estimators are asymptotically uncorrelated.
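As a follow-up (my addition), inverting the Fisher information gives the asymptotic covariance of the MLE, i.e. the Cramér–Rao lower bound for unbiased estimators:

# Asymptotic covariance: Var(mu_hat) ~ sigma^2/n, Var(sigma_hat) ~ sigma^2/(2n)
display(Fisher_Info.inv())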

2.6 Contour Plot of Log-Likelihood

Visualize the shape of the log-likelihood function.

def plot_log_likelihood(n=100, true_mu=5.0, true_sigma=2.0):
    np.random.seed(42)
    data = np.random.normal(true_mu, true_sigma, n)

    sample_mean = np.mean(data)
    sample_std = np.std(data)

    mu_range = np.linspace(true_mu - 1.0, true_mu + 1.0, 100)
    sigma_range = np.linspace(true_sigma - 1.0, true_sigma + 1.0, 100)
    MU, SIGMA = np.meshgrid(mu_range, sigma_range)

    term1 = -n/2 * np.log(2 * np.pi)
    term2 = -n * np.log(SIGMA)
    SSD = np.sum((data[:, np.newaxis, np.newaxis] - MU)**2, axis=0)
    LOG_L = term1 + term2 - SSD / (2 * SIGMA**2)

    plt.figure(figsize=(10, 8))
    contours = plt.contour(MU, SIGMA, LOG_L, levels=20, cmap='viridis')
    plt.clabel(contours, inline=True, fontsize=8)

    plt.plot(sample_mean, sample_std, 'rx', markersize=12, markeredgewidth=2, label='MLE')
    plt.plot(true_mu, true_sigma, 'wo', markerfacecolor='none', markeredgewidth=2, label='True Param')

    plt.title(rf'Log-Likelihood Contour (n={n})', fontsize=16)
    plt.xlabel(rf'$\mu$', fontsize=14)
    plt.ylabel(rf'$\sigma$', fontsize=14)
    plt.legend()
    plt.colorbar(label='Log-Likelihood')
    plt.show()

plot_log_likelihood()
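To connect the Fisher information back to the simulation, here is a rough numerical check (my addition, reusing the same seed and parameters as plot_log_likelihood): the MLEs should lie within a few predicted standard errors of the true values.

# Fisher-predicted standard errors: SE(mu_hat) = sigma/sqrt(n), SE(sigma_hat) = sigma/sqrt(2n)
np.random.seed(42)
data = np.random.normal(5.0, 2.0, 100)
mu_hat, sigma_hat = np.mean(data), np.std(data)
print(f"mu_hat    = {mu_hat:.3f} (true 5.0, SE = {2.0 / np.sqrt(100):.3f})")
print(f"sigma_hat = {sigma_hat:.3f} (true 2.0, SE = {2.0 / np.sqrt(200):.3f})")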

Conclusion

In this article, I performed mathematical analysis of statistical models using SymPy.

  1. Symbolization of Limit Operations: I derived the convergence from the binomial distribution to the Poisson distribution, working around the limitations of SymPy’s limit algorithm by decomposing the PMF into terms and spot-checking the problematic term on concrete values of $k$.
  2. Automation of Maximum Likelihood Estimation: Using the normal distribution as an example, I carried out the full chain of operations, from differentiating the log-likelihood function to deriving the Hessian matrix and the Fisher information matrix.