Gumbel Distribution: Basics of Extreme Value Statistics and Implementation in Python

Introduction

In data analysis, there are cases where we are interested not in the “average behavior” but in “rare and extreme phenomena”. For example:

How much rainfall occurs in a once-in-a-century downpour?
What is the maximum load reached during peak access to a server?
What is the maximum loss (VaR) in financial markets?

Extreme Value Statistics deals with the statistical properties of such “maximum” or “minimum” values. This article explains the Gumbel Distribution, one of the most fundamental distributions in extreme value statistics.

In addition to the theoretical explanation of its definition and properties, we will cover implementation examples of parameter estimation and visualization using Python (SciPy), and also touch upon its application in the field of machine learning (Gumbel-Max Trick).

Source Code

GitHub

The Jupyter Notebook file is available here

Google Colaboratory

To run on Google Colaboratory, click here

Execution Environment

The OS is macOS. Note that the options may differ from Linux or Unix commands.

!sw_vers

ProductName:		macOS
ProductVersion:		15.5
BuildVersion:		24F74

!python -V

Python 3.14.0

Import basic libraries.

%matplotlib inline
%config InlineBackend.figure_format = 'svg'

import random
import numpy as np
import matplotlib.pyplot as plt
import scipy
from scipy.stats import gumbel_r

seed = 123
random.seed(seed)
np.random.seed(seed)

1. Extreme Value Statistics and Gumbel Distribution

Extreme Value Statistics is a field of study that analyzes only the maximum (or minimum) values of observed data.

Suppose there are independent random variables $X_1, X_2, \dots, X_n$ following a certain probability distribution. In this case, the distribution followed by the maximum value when the sample size $n$ is sufficiently large $$ M_n = \max(X_1, X_2, \dots, X_n) $$ is known to converge to one of three specific types of distributions (extreme value distributions), regardless of the shape of the original distribution (Fisher-Tippett-Gnedenko Theorem).

The three types are as follows:

Gumbel Distribution (Type I): Followed by the maximum of distributions with exponential tails (Normal distribution, Exponential distribution, etc.).
Fréchet Distribution (Type II): Followed by the maximum of distributions with power-law tails (Pareto distribution, etc.).
Weibull Distribution (Type III): Followed by the maximum of distributions with finite range (Uniform distribution, etc.).

The Gumbel distribution is suitable for modeling the maximum values of data obtained from many common distributions, such as the normal distribution and gamma distribution, and is the most widely used.

2. Definition and Properties of Gumbel Distribution

2.1 Definition

The Gumbel distribution has a location parameter $\mu \in \mathbb{R}$ and a scale parameter $\beta > 0$.

Cumulative Distribution Function (CDF) $$ F(x; \mu, \beta) = \exp\left(-\exp\left(-\frac{x-\mu}{\beta}\right)\right) $$

Probability Density Function (PDF) $$ f(x; \mu, \beta) = \frac{1}{\beta} \exp\left(-\frac{x-\mu}{\beta}\right) \exp\left(-\exp\left(-\frac{x-\mu}{\beta}\right)\right) $$

2.2 Representative Statistics

Mean: $\mu + \gamma\beta$ ($\gamma \approx 0.5772$ is the Euler-Mascheroni constant)
Variance: $\displaystyle\frac{\pi^2}{6}\beta^2$
Median: $\mu - \beta\ln(\ln 2)$
Mode: $\mu$

The distribution has a right-skewed shape (long tail on the right).

2.3 Visualization of Shape

Let’s check the shape of the Gumbel distribution when parameters are changed using Python.

x = np.linspace(-2, 10, 1000)

plt.figure(figsize=(10, 6))

# Plot with different parameters
params = [(0, 1), (1, 1.5), (2, 0.5)]
for mu, beta in params:
    y = gumbel_r.pdf(x, loc=mu, scale=beta)
    plt.plot(x, y, label=f'$\\\\mu={mu}, \\\\beta={beta}$', lw=2)

plt.title('Gumbel Distribution PDF')
plt.xlabel('x')
plt.ylabel('Density')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

3. Implementation in Python and Parameter Estimation

We will look at how to fit the Gumbel distribution to actual data. Here we use scipy.stats.gumbel_r (_r stands for right-skewed, i.e., the standard maximum Gumbel distribution).

3.1 Data Generation and Fitting

First, we generate random numbers with known parameters, and then estimate the parameters from them to see if the original values can be recovered.

# True parameters
mu_true = 5.0
beta_true = 2.0

# Generate data (sample size=1000)
data = gumbel_r.rvs(loc=mu_true, scale=beta_true, size=1000, random_state=seed)

# Parameter estimation (Maximum Likelihood Estimation)
mu_est, beta_est = gumbel_r.fit(data)

print(f"True Parameters:      mu={mu_true}, beta={beta_true}")
print(f"Estimated Parameters: mu={mu_est:.4f}, beta={beta_est:.4f}")

True Parameters:      mu=5.0, beta=2.0
Estimated Parameters: mu=5.0035, beta=1.9955

3.2 Comparison of Histogram and Estimated Distribution

We plot the histogram of the data and the probability density function based on the estimated parameters together.

plt.figure(figsize=(10, 6))

# Histogram
plt.hist(data, bins=30, density=True, alpha=0.6, color='skyblue', label='Sample Data')

# Estimated distribution
x_range = np.linspace(data.min(), data.max(), 1000)
pdf_est = gumbel_r.pdf(x_range, loc=mu_est, scale=beta_est)
plt.plot(x_range, pdf_est, 'r-', lw=2, label=f'Fitted Gumbel\n($\\\\mu={mu_est:.2f}, \\\\beta={beta_est:.2f}$)')

plt.title('Fitting Gumbel Distribution to Data')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

3.3 Return Period and Return Level

Important concepts in extreme value analysis are “Return Period” and “Return Level”.

The return level $x_T$ corresponding to a return period of $T$ years (the maximum value that can occur once in $T$ years) can be calculated by the following formula: $$ x_T = \mu - \beta \ln\left(-\ln\left(1 - \frac{1}{T}\right)\right) $$

For example, let’s calculate the “once-in-a-century extreme value” for the data distribution above.

def calculate_return_level(mu, beta, T):
    return mu - beta * np.log(-np.log(1 - 1/T))

T_years = [10, 50, 100]

print("Return Levels:")
for T in T_years:
    level = calculate_return_level(mu_est, beta_est, T)
    print(f"{T} years: {level:.4f}")

Return Levels:
10 years: 9.4940
50 years: 12.7897
100 years: 14.1829

4. Application in Machine Learning: Gumbel-Max Trick

The Gumbel distribution plays an important role not only in the analysis of natural phenomena but also in the field of machine learning. A particularly famous one is the Gumbel-Max Trick.

This is a technique for efficiently sampling from a categorical distribution (discrete distribution).

When the probability of class $i$ is given by $p_i \propto \exp(s_i)$ (softmax form), a sample can be obtained by the following procedure:

For each class $i$, generate independent noise $g_i$ following the standard Gumbel distribution ($\mu=0, \beta=1$).
Choose the index where the value of the score plus noise $s_i + g_i$ is maximized. $$ k = \arg\max_i (s_i + g_i) $$

This $k$ becomes a sample following the probability $p_i$.

Furthermore, a relaxed version of this operation to make it differentiable is Gumbel-Softmax (Concrete Distribution), which is often used when dealing with discrete latent variables in VAEs (Variational Autoencoders) and the like.

# Simple verification of Gumbel-Max Trick
logits = np.array([1.0, 3.0, 2.0]) # Logits (scores) for each class
n_samples = 10000

# Generate standard Gumbel noise
gumbel_noise = np.random.gumbel(loc=0, scale=1, size=(n_samples, len(logits)))

# Take argmax
samples = np.argmax(logits + gumbel_noise, axis=1)

# Count frequencies
counts = np.bincount(samples)
probs_empirical = counts / n_samples

# Theoretical softmax probabilities
probs_theoretical = np.exp(logits) / np.sum(np.exp(logits))

print("Empirical probabilities:", probs_empirical)
print("Theoretical probabilities:", probs_theoretical)

Empirical probabilities: [0.0903 0.6693 0.2404]
Theoretical probabilities: [0.09003057 0.66524096 0.24472847]

Conclusion

In this article, we explained the following points about the Gumbel distribution:

Basics of Extreme Value Statistics: Its position as the distribution followed by maximum values.
Definition and Properties: Its characteristic right-skewed shape and main statistics.
Python Implementation: Parameter estimation and visualization using scipy.stats.gumbel_r.
Application: Calculation of return periods and the Gumbel-Max Trick in machine learning.

The Gumbel distribution is a powerful tool used in a surprisingly wide range of fields, from disaster prevention to AI model optimization.

References

Detailed explanation can be found in Chapter 1 of “Practical Bayesian Modeling” (Hideki Toyoda).

[Marketing] Gumbel Distribution: Basics of Extreme Value Statistics and Implementation in Python