Probability distributions and special functions with scipy
github
- The file in Jupyter notebook format is available here.

google colaboratory
- To run it in Google Colaboratory, open it here (dist/dist_nb.ipynb).
Author’s environment
The author's OS is macOS, so some command options may differ from their Linux and Unix counterparts.
```python
!sw_vers
```

ProductName: Mac OS X
ProductVersion: 10.14.6
BuildVersion: 18G6020

```python
!python -V
```

Python 3.7.3
Import the basic libraries and check their versions.
```python
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib
import matplotlib.pyplot as plt
import scipy
import numpy as np
print('matplotlib version :', matplotlib.__version__)
print('scipy version :', scipy.__version__)
print('numpy version :', np.__version__)
```

matplotlib version : 3.0.3
scipy version : 1.4.1
numpy version : 1.16.2
Basic probability distributions
This section describes the basic probability distributions and special functions needed to understand Bayesian statistics, summarizing their basic properties and showing Python code for sampling. Basically, we just use the statistical functions in the scipy module.
Normal distribution
The normal distribution is the most commonly used probability distribution; its details are described in any statistics textbook.
Formula
$ \displaystyle P\left( x | \mu, \sigma \right) = \frac{1}{\sqrt{2 \pi \sigma^2}}\exp \left( - \frac{(x-\mu)^2}{2 \sigma^2}\right) $
Mean
$ E\left[x \right]=\mu $
Variance
$ V\left[x\right]=\sigma^2 $
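As a quick check, norm.stats returns the mean and variance directly, so we can confirm they equal $\mu$ and $\sigma^2$.

```python
from scipy.stats import norm

mu = 1.5
sigma = 2.0

# mean and variance of N(mu, sigma^2); should print 1.5 and 4.0
m, v = norm.stats(loc=mu, scale=sigma, moments='mv')
print(m, v)
```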
Getting the probability density function
Here's how to compute the probability density function of a normal distribution in Python. The graph is centered at zero and has a standard deviation of one.
```python
from scipy.stats import norm

mu = 0
sigma = 1

X = np.arange(-5, 5, 0.1)
Y = norm.pdf(X, mu, sigma)

plt.xlabel('$x$')
plt.ylabel('$P$')
plt.grid()
plt.plot(X, Y)
```
[<matplotlib.lines.Line2D at 0x112cdfa58>]
In practical quantitative data analysis, the probability density function itself is not used all that often. In statistical modeling, for example, it is common to sample data from an assumed probability distribution, which amounts to simulating that distribution.
Sampling
Let's sample 1000 values from the standard normal distribution and display a histogram. You can see that it reproduces the probability density function nicely.
```python
from scipy.stats import norm

mu = 0
sigma = 1

x = norm.rvs(mu, sigma, size=1000)

plt.grid()
plt.hist(x, bins=20)
```
(array([  1.,   1.,   8.,  20.,  28.,  52.,  73., 104., 122., 132., 130.,
        107.,  83.,  62.,  37.,  24.,   9.,   3.,   3.,   1.]),
 array([-3.22122581, -2.89149803, -2.56177025, -2.23204247, -1.90231468,
        -1.5725869 , -1.24285912, -0.91313134, -0.58340355, -0.25367577,
         0.07605201,  0.40577979,  0.73550758,  1.06523536,  1.39496314,
         1.72469092,  2.05441871,  2.38414649,  2.71387427,  3.04360205,
         3.37332984]),
 <a list of 20 Patch objects>)
Help

There are many more functions available in scipy.stats.norm.

```python
norm?
```

Running this shows what functions are available. The ones you are most likely to use are rvs (random variates), pdf (probability density function), logpdf, and cdf (cumulative distribution function).
rvs(loc=0, scale=1, size=1, random_state=None)
Random variates.
pdf(x, loc=0, scale=1)
Probability density function.
logpdf(x, loc=0, scale=1)
Log of the probability density function.
cdf(x, loc=0, scale=1)
Cumulative distribution function.
logcdf(x, loc=0, scale=1)
Log of the cumulative distribution function.
sf(x, loc=0, scale=1)
Survival function (also defined as ``1 - cdf``, but `sf` is sometimes more accurate).
logsf(x, loc=0, scale=1)
Log of the survival function.
ppf(q, loc=0, scale=1)
Percent point function (inverse of ``cdf`` --- percentiles).
isf(q, loc=0, scale=1)
Inverse survival function (inverse of ``sf``).
moment(n, loc=0, scale=1)
Non-central moment of order n.
stats(loc=0, scale=1, moments='mv')
Mean('m'), variance('v'), skew('s'), and/or kurtosis('k').
entropy(loc=0, scale=1)
(Differential) entropy of the RV.
fit(data, loc=0, scale=1)
Parameter estimates for generic data.
expect(func, args=(), loc=0, scale=1, lb=None, ub=None, conditional=False, **kwds)
Expected value of a function (of one argument) with respect to the distribution.
median(loc=0, scale=1)
Median of the distribution.
mean(loc=0, scale=1)
Mean of the distribution.
var(loc=0, scale=1)
Variance of the distribution.
std(loc=0, scale=1)
Standard deviation of the distribution.
interval(alpha, loc=0, scale=1)
Endpoints of the range that contains alpha percent of the distribution.
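For example, cdf and ppf are inverses of each other, and interval returns a central interval of the distribution. A quick illustration:

```python
from scipy.stats import norm

# probability that a standard normal variable is below 1.96
p = norm.cdf(1.96)
print(p)                  # about 0.975

# ppf is the inverse of cdf, so this recovers 1.96
print(norm.ppf(p))

# central 95% interval of the standard normal distribution
print(norm.interval(0.95, loc=0, scale=1))  # about (-1.96, 1.96)
```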
Bernoulli distribution
This is the distribution for trials with only two possible outcomes, such as a coin toss that comes up either heads or tails.
Formula
$$ P\left( x | \mu \right) = \mu^{x}\left(1-\mu \right)^{1-x} \quad \left(x = 0 \text{ or }1 \right) $$
Note that $x$ can only take the values 0 or 1.
Mean
$ E\left[x \right]=\mu $
Variance
$ V\left[x\right]=\mu\left(1-\mu \right) $
Probability mass function
```python
from scipy.stats import bernoulli

mu = 0.3

print(bernoulli.pmf(0, mu))
print(bernoulli.pmf(1, mu))
```
0.7000000000000001
0.3
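We can also confirm the mean and variance given above with bernoulli.stats:

```python
from scipy.stats import bernoulli

mu = 0.3

# mean should be mu = 0.3, variance mu * (1 - mu) = 0.21
m, v = bernoulli.stats(mu, moments='mv')
print(m, v)
```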
Sampling
Let's sample 1000 values and display them in a histogram.
```python
from scipy.stats import bernoulli

mu = 0.3
size = 1000

x = bernoulli.rvs(mu, size=size)

plt.hist(x, bins=3)
```
(array([709.,   0., 291.]),
 array([0.        , 0.33333333, 0.66666667, 1.        ]),
 <a list of 3 Patch objects>)
Binomial distribution
This distribution describes the number of heads obtained when a coin is tossed $n$ times. When $n=1$, it reduces to the Bernoulli distribution. Here $n$ is the number of tosses, $p$ is the probability of getting heads, and $k$ is the number of heads. For the detailed derivations, please refer to a textbook.
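As a quick check of the $n=1$ case, the binomial probability mass function coincides with the Bernoulli one:

```python
from scipy.stats import binom, bernoulli

p = 0.3

# with n=1, the binomial distribution reduces to the Bernoulli distribution
for k in (0, 1):
    print(binom.pmf(k, 1, p), bernoulli.pmf(k, p))  # identical values
```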
Formula
$ \displaystyle P\left(k | n,p \right) = \frac{n!}{k!\left(n-k \right)!} p^{k}\left( 1-p\right)^{n-k} $
Mean
$ \displaystyle E\left[k \right] = np $
Variance
$ \displaystyle V\left[k \right] = np\left(1-p \right) $
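The mean and variance can be confirmed with binom.stats:

```python
from scipy.stats import binom

n = 100
p = 0.3

# should print 30.0 (= n*p) and 21.0 (= n*p*(1-p))
m, v = binom.stats(n, p, moments='mv')
print(m, v)
```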
Probability mass function
The probability mass function of the binomial distribution for $p=0.3, 0.5, 0.9$.
```python
import numpy as np
import scipy
from scipy.stats import binom
import matplotlib.pyplot as plt

n = 100
x = np.arange(n)

# case 1
p = 0.3
y1 = binom.pmf(x, n, p)

# case 2
p = 0.5
y2 = binom.pmf(x, n, p)

# case 3
p = 0.9
y3 = binom.pmf(x, n, p)

plt.plot(x, y1, label="$p=0.3$")
plt.plot(x, y2, label="$p=0.5$")
plt.plot(x, y3, label="$p=0.9$")
plt.xlabel('$k$')
plt.ylabel('$B(n,p)$')
plt.title('binomial distribution n={}'.format(n))
plt.grid(True)
plt.legend()
```
<matplotlib.legend.Legend at 0x11b425668>
Sampling
We sample from a binomial distribution with parameters $n=100, p=0.3$ to confirm the probability mass function above.
```python
from scipy.stats import binom

n = 100
p = 0.3

x = binom.rvs(n, p, size=10000)

plt.xlim(0, 60)
plt.grid()
plt.hist(x, bins=15)
```
(array([1.100e+01, 7.900e+01, 2.070e+02, 8.220e+02, 1.134e+03, 2.335e+03,
        1.673e+03, 2.117e+03, 8.170e+02, 5.900e+02, 1.370e+02, 6.900e+01,
        7.000e+00, 0.000e+00, 2.000e+00]),
 array([14.        , 16.53333333, 19.06666667, 21.6       , 24.13333333,
        26.66666667, 29.2       , 31.73333333, 34.26666667, 36.8       ,
        39.33333333, 41.86666667, 44.4       , 46.93333333, 49.46666667,
        52.        ]),
 <a list of 15 Patch objects>)
Multinoulli distribution (categorical distribution)
This is a multivariate version of the Bernoulli distribution. Where the Bernoulli distribution describes which of a coin's two sides appears, this distribution describes which face of a die appears. It corresponds to the $N=1$ case of the multinomial distribution described next.
Formula
$ \displaystyle P\left(x_1, x_2, \cdots x_n | n,p_1,p_2 \cdots p_n \right) = p_1^{x_1}p_2^{x_2}\cdots p_n^{x_n} $
However, $p_i$ and $x_i$ satisfy $ \displaystyle \sum_i p_i = 1 , \sum_i x_i = 1 $.
Mean
$ \displaystyle E\left[x_i \right] = p_i $
Variance
$ \displaystyle V\left[x_i \right] = p_i(1-p_i) $
Probability mass function
Since scipy does not seem to have a dedicated function for the multinoulli distribution, we use the $N=1$ case of the multinomial distribution.
```python
from scipy.stats import multinomial

# N=1 multinomial distribution = categorical distribution over six faces
rv = multinomial(1, [0.1, 0.2, 0.25, 0.15, 0.2, 0.1])

# probability of drawing face 3, i.e. the one-hot vector [0,0,1,0,0,0]
print('Probability : ', str(rv.pmf([0, 0, 1, 0, 0, 0]))[:8])
```

Probability :  0.25
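Evaluating the pmf at each one-hot vector recovers the corresponding $p_i$:

```python
import numpy as np
from scipy.stats import multinomial

p = [0.1, 0.2, 0.25, 0.15, 0.2, 0.1]
rv = multinomial(1, p)

# each one-hot outcome has probability equal to the corresponding p_i
for i in range(6):
    x = np.eye(6, dtype=int)[i]
    print(x, rv.pmf(x))
```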
Multinomial distribution
This is the multivariate version of the binomial distribution; it describes the number of times each face of a die comes up. Let the die have $n$ faces, let $p_1, p_2, \cdots, p_n$ be the probability of each face coming up in a single trial, and let $x_1, x_2, \cdots, x_n$ be the number of times each face comes up when the die is rolled $N$ times.
Formula
$ \displaystyle P\left(x_1, x_2, \cdots x_n | n,p_1,p_2 \cdots p_n \right) = \frac{N!}{x_1!x_2! \cdots x_n!} p_1^{x_1}p_2^{x_2}\cdots p_n^{x_n} $
However, $p_i$ and $x_i$ satisfy $ \displaystyle \sum_i p_i = 1 , \sum_i x_i = N $.
Mean
$ \displaystyle E\left[x_i \right] = Np_i $
Variance
$ \displaystyle V\left[x_i \right] = Np_i\left(1-p_i \right) $
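We can check these formulas empirically by sampling and comparing the sample moments with $Np_i$ and $Np_i(1-p_i)$:

```python
import numpy as np
from scipy.stats import multinomial

N = 10
p = np.array([0.1, 0.2, 0.25, 0.15, 0.2, 0.1])

samples = multinomial.rvs(N, p, size=100000)

print(samples.mean(axis=0))   # approximately N * p
print(samples.var(axis=0))    # approximately N * p * (1 - p)
print(N * p)
print(N * p * (1 - p))
```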
Probability mass function
Suppose the probabilities that a six-sided die rolls 1 through 6 are $0.1, 0.2, 0.25, 0.15, 0.2, 0.1$. Roll the die 10 times and calculate the probability that faces 1 through 6 come up one, two, one, three, two, and one times, respectively.
```python
from scipy.stats import multinomial

rv = multinomial(10, [0.1, 0.2, 0.25, 0.15, 0.2, 0.1])
print('Probability : ', str(rv.pmf([1, 2, 1, 3, 2, 1]))[:8])
```
Probability : 0.002041
Sampling
Throw the above die 10 times and sample the number of times each face comes up. For now, let's sample just once.
```python
multinomial.rvs(10, [0.1, 0.2, 0.25, 0.15, 0.2, 0.1], size=1)
```
array([[0, 5, 2, 0, 1, 2]])
Try sampling 10000 times.
```python
from collections import Counter

array = multinomial.rvs(10, [0.1, 0.2, 0.25, 0.15, 0.2, 0.1], size=10000)
array = map(str, array)

print('Top 10 occurrences of data')
for i in Counter(array).most_common()[:10]:
    print(i)
```
Top 10 occurrences of data
('[1 2 3 1 2 1]', 68)
('[1 2 3 2 1 1]', 47)
('[1 3 3 1 2 0]', 47)
('[0 2 3 2 2 1]', 46)
('[0 2 3 1 3 1]', 45)
('[1 2 2 1 2 2]', 45)
('[1 3 2 1 2 1]', 42)
('[1 2 3 2 2 0]', 42)
('[0 2 4 2 1 1]', 42)
('[1 2 4 1 2 0]', 40)
As expected, the most frequent outcomes are those in which face 3, which has the highest probability (0.25), comes up the most.
Beta distribution
Formula
$ \displaystyle P\left( x | \alpha, \beta \right) = \frac{x^{\alpha - 1}(1-x)^{\beta - 1}}{B \left(\alpha, \beta \right)} $
Where $\displaystyle B(\alpha, \beta)$ is the beta function and $\alpha, \beta > 0$.
Mean
$ \displaystyle E\left[x \right] = \frac{\alpha}{\alpha + \beta} $
Variance
$ \displaystyle V\left[x \right] = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)} $
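The mean and variance can again be confirmed with beta.stats:

```python
from scipy.stats import beta

a, b = 2, 3

# should match a/(a+b) = 0.4 and a*b/((a+b)**2 * (a+b+1)) = 0.04
m, v = beta.stats(a, b, moments='mv')
print(m, v)
```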
Probability density function
```python
from scipy.stats import beta

alpha_list = [2, 3, 4]
beta_list = [2, 2, 3]

for _alpha, _beta in zip(alpha_list, beta_list):
    x = np.linspace(0, 1, 100)[1:-1]
    y = beta.pdf(x, _alpha, _beta)
    plt.plot(x, y, label="$\\alpha={}, \\beta={}$".format(_alpha, _beta))

plt.xlabel("$x$")
plt.ylabel("$p(x)$")
plt.grid()
plt.legend()
```
<matplotlib.legend.Legend at 0x11b6ed208>
Sampling
We will sample from the beta distribution with $\alpha=2, \beta=2$ and display a histogram to make sure the probability density function above is indeed correct.
```python
from scipy.stats import beta

_alpha = 2
_beta = 2

plt.grid()
plt.hist(beta.rvs(_alpha, _beta, size=100000))
```
(array([ 2835.,  7639., 11106., 13515., 14798., 14875., 13482., 11280.,
         7557.,  2913.]),
 array([0.00161106, 0.10122898, 0.20084689, 0.30046481, 0.40008273,
        0.49970065, 0.59931856, 0.69893648, 0.7985544 , 0.89817232,
        0.99779023]),
 <a list of 10 Patch objects>)
The shape matches that of the beta distribution with $\alpha=2, \beta=2$ shown above.
Gamma distribution
Formula
$ \displaystyle P(x|\alpha, \beta) = \frac{\beta^\alpha x^{\alpha-1}e^{-\beta x}}{\Gamma(\alpha)} $
Or, in terms of the scale parameter $\theta = 1/\beta$,
$ \displaystyle P(x|\alpha, \theta) = \frac{x^{\alpha-1}e^{-\frac{x}{\theta}}}{\Gamma(\alpha)\theta^\alpha} $
Mean
$ \displaystyle E\left[x \right] = \frac{\alpha}{\beta} $
Variance
$ \displaystyle V\left[x \right] = \frac{\alpha}{\beta^2} $
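gamma.stats confirms the mean and variance; note that scipy parameterizes the gamma distribution by the shape $\alpha$ and the scale $\theta = 1/\beta$:

```python
from scipy.stats import gamma

_alpha = 2.0
_beta = 2.5

# should match alpha/beta = 0.8 and alpha/beta**2 = 0.32
m, v = gamma.stats(_alpha, scale=1/_beta, moments='mv')
print(m, v)
```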
Probability density function
```python
from scipy.stats import gamma

_alpha_list = [1.0, 2.0, 3.0, 9.0]
_beta_list = [2.0, 2.5, 4.5, 5]

for _alpha, _beta in zip(_alpha_list, _beta_list):
    x = np.linspace(0, 4, 100)
    y = gamma.pdf(x, _alpha, scale=1/_beta)
    plt.plot(x, y, label="$\\alpha = {}, \\beta = {}$".format(_alpha, _beta))

plt.grid()
plt.legend()
```
<matplotlib.legend.Legend at 0x11ba66240>
Sampling
As with the beta distribution, we will sample and create a histogram to verify that the above probability density function is correct. Create a histogram for $\alpha=2.0, \beta=2.5$.
```python
_alpha = 2.0
_beta = 2.5

plt.grid()
plt.hist(gamma.rvs(_alpha, scale=1/_beta, size=10000), bins=10)
plt.show()
```
Chi-square distribution
This is the distribution followed by the sum of squares of random variables that each follow the standard normal distribution. To be honest, I don't have many occasions to use it directly, but it comes up in interval estimation and the chi-square test.
If $x_i \sim N(0,1)$, then

$$ \displaystyle Z = \sum_{i=1}^{k}x_i^2 $$

follows a chi-square distribution with $k$ degrees of freedom.
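We can verify this definition directly: square and sum standard normal samples, then compare the histogram with chi2.pdf.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, chi2

k = 3

# sum of squares of k standard normal variables, repeated 10000 times
z = (norm.rvs(size=(10000, k)) ** 2).sum(axis=1)

x = np.linspace(0, 20, 100)

plt.grid()
plt.hist(z, bins=40, density=True, label="sum of squares")
plt.plot(x, chi2.pdf(x, k), label="$\\chi^2$ pdf ($k=3$)")
plt.legend()
plt.show()
```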
Formula
$\displaystyle f(x|k) = \frac{x^{\frac{k}{2}-1}e^{-\frac{x}{2}}}{2^{\frac{k}{2}}\Gamma\left(\frac{k}{2}\right)} (x \gt 0) $
Mean
$ \displaystyle E\left[x \right] = k $
Variance
$ \displaystyle V\left[x \right] = 2k $
Probability density function
```python
from scipy.stats import chi2

k = 3.0

x = np.linspace(0, 20, 100)
y = chi2.pdf(x, k)

plt.grid()
plt.plot(x, y)
plt.show()
```
Sampling
For the case of $k=3$, let’s create a histogram and see what the probability density function looks like.
```python
plt.grid()
plt.hist(chi2.rvs(3, size=1000), bins=12)
plt.show()
```
Student’s t-distribution
It is often simply called the t-distribution and is used in tests such as the t-test. When I was a company employee, a senior colleague introduced me to this distribution, and I remember using it for data analysis. It applies when only a small number of data points are available, so I think it is well suited to settings such as manufacturing plants. A quick sketch of that use case follows.
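This is a minimal sketch with made-up measurement values: a 95% confidence interval for the mean of a small sample, using the t-distribution with $n-1$ degrees of freedom.

```python
import numpy as np
from scipy.stats import t, sem

# hypothetical small sample of measurements (made-up values for illustration)
data = np.array([9.8, 10.2, 10.1, 9.9, 10.4])

n = len(data)
m = data.mean()

# 95% confidence interval for the mean, using the standard error of the mean
low, high = t.interval(0.95, df=n - 1, loc=m, scale=sem(data))
print(low, high)
```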
Formula
$ \displaystyle p(x | \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{x^2}{\nu}\right)^{-\left(\frac{\nu+1}{2}\right)} $
Mean
$ \displaystyle E\left[x \right] = 0 \quad (\nu > 1) $

Variance

$ \displaystyle V\left[x \right] = \frac{\nu}{\nu-2} \quad (\nu > 2) $
Probability density function
```python
from scipy.stats import t

nu_list = [1, 3, 5]

plt.grid()
for nu in nu_list:
    x = np.linspace(-5, 5, 100)
    y = t.pdf(x, nu)
    plt.plot(x, y, label="$\\nu = {}$".format(nu))

plt.title("t-distribution")
plt.legend()
plt.show()
```
Sampling
Let’s sample from the probability density function and display the histogram for the case $\nu = 3$.
```python
from scipy.stats import t

nu = 3

plt.grid()
plt.xlim(-10, 10)
plt.hist(t.rvs(nu, size=10000), bins=80)
plt.show()
```
Multivariate Gaussian distribution
In statistical modeling, we sometimes sample from a correlated multivariate Gaussian distribution.
$$ P(x|\mu, \Sigma) = \frac{1}{\sqrt{(2 \pi)^{k}\det\Sigma}}\exp \left( - \frac{(x-\mu)^{T}\Sigma^{-1}(x-\mu)}{2}\right) $$
Here $\Sigma$ is the covariance matrix and $\mu$ is the mean of each random variable, a vector in two or more dimensions.
By manipulating $\Sigma$, we can set the correlation between variables.
$$ \Sigma = \left( \begin{array}{cc} \sigma_{x_1}^2 & \sigma_{x_1x_2} \\ \sigma_{x_1x_2} & \sigma_{x_2}^2 \end{array} \right) $$
The covariance matrix is a positive semi-definite symmetric matrix, which gives it useful properties such as all eigenvalues being non-negative. Matrices with these properties come up often in numerical computation.
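For example, we can confirm numerically that the eigenvalues of the covariance matrices used below are all non-negative:

```python
import numpy as np

# the three covariance matrices used in the plots below
for s in ([[1, 0], [0, 1]], [[1, 0.8], [0.8, 1]], [[1, -0.8], [-0.8, 1]]):
    # eigvalsh handles symmetric matrices; all eigenvalues should be >= 0
    print(np.linalg.eigvalsh(np.array(s)))
```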
Gaussian distribution with no correlation

When $\sigma_{x_1x_2}=0$, there is no correlation between the two random variables. For clarity, we view the graph from along the z-axis.
```python
from scipy.stats import multivariate_normal
from mpl_toolkits.mplot3d import Axes3D

mu = np.array([0, 0])
sigma = np.array([[1, 0], [0, 1]])

x1 = np.linspace(-3, 3, 100)
x2 = np.linspace(-3, 3, 100)

X1, X2 = np.meshgrid(x1, x2)
X = np.c_[np.ravel(X1), np.ravel(X2)]

Z = multivariate_normal.pdf(X, mu, sigma).reshape(100, -1)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.view_init(elev=-90, azim=0)
surf = ax.plot_surface(X1, X2, Z, cmap='bwr', linewidth=0)
plt.show()
```
Gaussian distribution with positive correlation
Graph of a positively correlated bivariate Gaussian distribution with $\sigma_{x_1x_2} = 0.8$.
```python
from scipy.stats import multivariate_normal
from mpl_toolkits.mplot3d import Axes3D

mu = np.array([0, 0])
sigma = np.array([[1, 0.8], [0.8, 1]])

x1 = np.linspace(-3, 3, 100)
x2 = np.linspace(-3, 3, 100)

X1, X2 = np.meshgrid(x1, x2)
X = np.c_[np.ravel(X1), np.ravel(X2)]

Z = multivariate_normal.pdf(X, mu, sigma).reshape(100, -1)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.view_init(elev=-90, azim=0)
surf = ax.plot_surface(X1, X2, Z, cmap='bwr', linewidth=0)
plt.show()
```
Gaussian distribution with negative correlation
Graph of a negatively correlated bivariate Gaussian distribution with $\sigma_{x_1x_2} = -0.8$.
```python
from scipy.stats import multivariate_normal
from mpl_toolkits.mplot3d import Axes3D

mu = np.array([0, 0])
sigma = np.array([[1, -0.8], [-0.8, 1]])

x1 = np.linspace(-3, 3, 100)
x2 = np.linspace(-3, 3, 100)

X1, X2 = np.meshgrid(x1, x2)
X = np.c_[np.ravel(X1), np.ravel(X2)]

Z = multivariate_normal.pdf(X, mu, sigma).reshape(100, -1)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.view_init(elev=-90, azim=0)
surf = ax.plot_surface(X1, X2, Z, cmap='bwr', linewidth=0)
plt.show()
```
Exponential family

A wide variety of probability distributions come up in data analysis and marketing analysis, such as the normal, binomial, and Poisson distributions. Most of them have a form known as the exponential family.
Formula
$ \displaystyle f(x|\theta) = h(x)\exp (\eta(\theta)\cdot T(x) - A(\theta)) $
A distribution of this form has important properties, such as having a conjugate prior. I will write about the details when the opportunity arises.
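As a concrete illustration of my own (not taken from scipy), the Bernoulli distribution fits this form with $h(x)=1$, $T(x)=x$, $\eta(\mu)=\log\frac{\mu}{1-\mu}$, and $A(\mu)=-\log(1-\mu)$. A quick numerical check:

```python
import numpy as np
from scipy.stats import bernoulli

mu = 0.3

# natural parameter and log-partition function of the Bernoulli distribution
eta = np.log(mu / (1 - mu))   # eta(mu)
A = -np.log(1 - mu)           # A(mu)

# with h(x) = 1 and T(x) = x, exp(eta*x - A) should reproduce the pmf
for x in (0, 1):
    print(np.exp(eta * x - A), bernoulli.pmf(x, mu))
```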