## github

• The file in jupyter notebook format is here

• To run it in google colaboratory here dist/dist_nb.ipynb)

### Author’s environment

The author’s OS is macOS, and the options are different from Linux and Unix commands.

! sw_vers

ProductName: Mac OS X
ProductVersion: 10.14.6
BuildVersion: 18G6020

Python -V
python !

Python 3.7.3

Import the basic libraries and check their versions.

python
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

import matplotlib
import matplotlib.pyplot as plt
import scipy
import numpy as np

print('matplotlib version :', matplotlib.__version__)
print('scipy version :', scipy.__version__)
print('numpy version :', np.__version__)

matplotlib version : 3.0.3
scipy version : 1.4.1
numpy version : 1.16.2


## Basic probability distributions

This section describes the basic probability distributions and special functions needed to understand Bayesian statistics. This section summarizes the basic properties. It also shows the python code for sampling. Basically, we will just use the statistical functions in the scipy module.

### Normal distribution

The normal distribution is the most commonly used probability distribution, and its details are described in detail in textbooks. The details are described in detail in the textbook.

#### formula

$\displaystyle P\left( x | \mu, \sigma \right) = \frac{1}{\sqrt{2 \pi \sigma^2}}\exp \left( - \frac{(x-\mu)^2}{2 \sigma^2}\right)$

#### average

$E\left[x \right]=\mu$

#### variance

$V\left[x\right]=\sigma^2$

#### Getting the probability density function

Here’s how to generate a probability density function for a normal distribution in python. It will be a graph with zero center and one standard deviation.

from scipy.stats import norm

mu = 0
sigma = 1

X = np.arange(-5, 5, 0.1)
Y = norm.pdf(X, mu, sigma)

plt.xlabel('$x$')
plt.ylabel('$P$')
plt.grid()
plt.plot(X, Y)

[<matplotlib.lines.Line2D at 0x112cdfa58>]


In the real world of quantitative data analysis, the probability density function itself is not used that much. In statistical modeling, for example, it is often done to virtually sample data from an assumed probability distribution. This is a kind of simulation of an assumed probability distribution.

#### Sampling

Let’s sample 1000 data from a standard normal distribution and display the histogram. You can see that we can reproduce the probability density function properly.

from scipy.stats import norm

mu = 0
sigma = 1

x = norm.rvs(mu, sigma, size=1000)
plt.grid()
plt.hist(x, bins=20)

(array([ 1., 1., 8., 20., 28., 52., 73., 104., 122., 132., 130,
107., 83., 62., 37., 24., 9., 3., 3., 1.]),
array([-3.22122581, -2.89149803, -2.56177025, -2.23204247, -1.90231468,
-1.5725869 , -1.24285912, -0.91313134, -0.58340355, -0.25367577,
0.07605201, 0.40577979, 0.73550758, 1.06523536, 1.39496314,
1.72469092, 2.05441871, 2.38414649, 2.71387427, 3.04360205,
3.37332984]),
<a list of 20 Patch objects>)


#### help

There are many more functions available in scipy.stats.norm.

norm?


You can run this to see what functions are available. You may want to use rvs: random variates, pdf : probability density function, logpdf, or cdf : cumulative distribution function.


rvs(loc=0, scale=1, size=1, random_state=None)
Random variates.
pdf(x, loc=0, scale=1)
Probability density function.
logpdf(x, loc=0, scale=1)
Log of the probability density function.
cdf(x, loc=0, scale=1)
Cumulative distribution function.
logcdf(x, loc=0, scale=1)
Log of the cumulative distribution function.
sf(x, loc=0, scale=1)
Survival function (also defined as 1 - cdf, but sf is sometimes more accurate).
logsf(x, loc=0, scale=1)
Log of the survival function.
ppf(q, loc=0, scale=1)
Percent point function (inverse of cdf --- percentiles).
isf(q, loc=0, scale=1)
Inverse survival function (inverse of sf).
moment(n, loc=0, scale=1)
Non-central moment of order n
stats(loc=0, scale=1, moments='mv')
Mean('m'), variance('v'), skew('s'), and/or kurtosis('k').
entropy(loc=0, scale=1)
(Differential) entropy of the RV.
fit(data, loc=0, scale=1)
Parameter estimates for generic data.
expect(func, args=(), loc=0, scale=1, lb=None, ub=None, conditional=False, **kwds)
Expected value of a function (of one argument) with respect to the distribution.
median(loc=0, scale=1)
Median of the distribution.
Mean(loc=0, scale=1)
Mean of the distribution.
var(loc=0, scale=1)
Variance of the distribution.
std(loc=0, scale=1)
Standard deviation of the distribution.
interval(alpha, loc=0, scale=1)
Endpoints of the range that contains alpha percent of the distribution.

interval(alpha, loc=0, scale=1) Endpoints of the range that contains alpha percent of the distribution.


### Bernoulli distribution

Determines the distribution for cases where there are only two possible outcomes of a trial, such as the distribution for a coin toss that yields either tails or tails.

#### formula

$$P\left( x | \mu \right) = \mu^{x}\left(1-\mu \right)^{1-x} \quad \left(x = 0 \text{ or }1 \right)$$

Or

$$P\left(x | \mu \right)= \begin{cases} 1 - \\mu &\text{if } x = 0 \\ \mu &\text{if } x = 1 \blur1}$$

This becomes Note that $x$ can only take 0 or 1.

#### average

$E\left[x \right]=\mu$

#### Variance

$V\left[x\right]=\mu\left(1-\mu \right)$

#### stochastic mass function

from scipy.stats import bernoulli

mu=0.3

print(bernoulli.pmf(0, mu))
print(bernoulli.pmf(1, mu))

0.7000000000000001
0.3


#### Sampling

Let’s sample 1000 units and display them in a histogram.

from scipy.stats import bernoulli

mu=0.3
size=1000

x = bernoulli.rvs(mu, size=size)
plt.hist(x, bins=3)

(array([709., 0., 291.]),
array([0., 0.333333333, 0.666666667, 1.]),
<a list of 3 Patch objects>)


### Binomial distribution

This is a distribution that follows the probability of the number of times a coin is tossed and comes up heads. When the number of times is 1, it matches the Bernoulli distribution. Where $n$ is the number of times you toss a coin, $p$ is the probability of getting a table, and $k$ is the number of times you get a table. For detailed calculations, please refer to a separate textbook.

#### tableform

$\displaystyle P\left(k | n,p \right) = \frac{n!}{k!\left(n-k \right)!} p^{k}\left( 1-p\right)^{n-k}$

#### average

$\displaystyle E\left[k \right] = np$

#### variance

$\displaystyle V\left[k \right] = np\left(1-p \right)$

#### stochastic mass function

The probability mass function of the binomial distribution for $p=0.3, 0.5, 0.9$.

import numpy as np
import scipy
from scipy.stats import binom
import matplotlib.pyplot as plt

n = 100
x = np.arange(n)

# case.1
p = 0.3
y1 = binom.pmf(x,n,p)

# case.2
p = 0.5
y2 = binom.pmf(x,n,p)

# case.3
p = 0.9
y3 = binom.pmf(x,n,p)

# fig, ax = plt.subplots(facecolor="w")
plt.plot(x, y1, label="$p=0.3$")
plt.plot(x, y2, label="$p=0.5$")
plt.plot(x, y3, label="$p=0.9$")

plt.xlabel('$k$')
plt.ylabel('$B(n,p)$')
plt.title('binomial distribution n={}'.format(n))
plt.grid(True)

plt.legend()

<matplotlib.legend.Legend at 0x11b425668>


#### Sampling

Sampling from a binomial distribution with parameters $n=100, p=0.3$ to make sure the probability mass function is correct.

from scipy.stats import binom

n = 100
p = 0.3

x = binom.rvs(n,p,size=10000)

plt.xlim(0, 60)
plt.grid()
plt.hist(x, bins=15)

(array([1.100e+01, 7.900e+01, 2.070e+02, 8.220e+02, 1.134e+03, 2.335e+03,
1.673e+03, 2.117e+03, 8.170e+02, 5.900e+02, 1.370e+02, 6.900e+01,
7.000e+00, 0.000e+00, 2.000e+00]),
array([14. , 16.53333333, 19.0666666667, 21.6 , 24.1333333,
26.666666667, 29.2 , 31.73333333, 34.26666667, 36.8 ,
39.3333333, 41.8666666667, 44.4 , 46.93333333, 49.46666667,
52. ]),
<a list of 15 Patch objects>)


### Multi-Neuclidean distribution (categorical distribution)

This is a multivariate version of the Bernoulli distribution. If the Bernoulli distribution expresses the probability distribution that the two sides of a coin appear, it expresses the probability distribution that the face of a dice appears. This corresponds to the $N=1$ case of the multinomial distribution described next.

#### tableform

$\displaystyle P\left(x_1, x_2, \cdots x_n | n,p_1,p_2 \cdots p_n \right) = p_1^{x_1}p_2^{x_2}\cdots p_n^{x_n}$

However, $p_i$ and $x_i$ satisfy $\displaystyle \sum_i p_i = 1 , \sum_i x_i = 1$.

#### average

$\displaystyle E\left[x_i \right] = p_i$

#### Variance

$\displaystyle V\left[x_i \right] = p_i(1-p_i)$

#### Probability mass function

Since there doesn’t seem to be a dedicated function for the Martinoulli distribution in scipy, we’ll use the $N=1$ case of the multinomial distribution.

from scipy.stats import multinomial

rv = multinomial(1,[0.1,0.2,0.25,0.15,0.2,0.1])
print('Probability : ',str(rv.pmf([1,2,1,3,2,1]))[:8])

Probability : 0.0


### Multinomial distribution

This is a multivariate version of the binomial distribution. It is a multivariate version of the binomial distribution, which is a probability distribution that follows the number of times each of the possible dice is rolled. Let there be $n$ faces of the dice, and the probability of each face coming up in a single attempt is $p_1,p_2, \cdots , p_n$, and the probability of each face coming up when the dice are rolled $N$ times is $x_1,x_2, \cdots , x_n$.

#### tableform

$\displaystyle P\left(x_1, x_2, \cdots x_n | n,p_1,p_2 \cdots p_n \right) = \frac{N!}{x_1!x_2! \cdots x_n!} p_1^{x_1}p_2^{x_2}\cdots p_n^{x_n}$

However, $p_i$ and $x_i$ satisfy $\displaystyle \sum_i p_i = 1 , \sum_i x_i = N$.

#### average

$\displaystyle E\left[x_i \right] = Np_i$

#### Variance

$\displaystyle V\left[x_i \right] = Np_i\left(1-p_i \right)$

#### Stochastic Mass Function

Suppose that the probability that a hexahedral dice rolls 1 to 6 is $0.1,0.2,0.25,0.15,0.2,01$. Roll the dice 10 times and calculate the probability that each of the eyes will come up once, twice, once, three times, twice, or once.

from scipy.stats import multinomial

rv = multinomial(10,[0.1,0.2,0.25,0.15,0.2,0.1])
print('Probability : ',str(rv.pmf([1,2,1,3,2,1]))[:8])

Probability : 0.002041


#### Sampling

Throw the above dice 10 times, and sample the number of times each eye comes up. For now, let’s sample only once.

multinomial.rvs(10, [0.1,0.2,0.25,0.15,0.2,0.1], size=1)

array([[0, 5, 2, 0, 1, 2]])


Try sampling 10000 times.

from collections import Counter

array = multinomial.rvs(10, [0.1,0.2,0.25,0.15,0.2,0.1], size=10000)
array = map(str, array)

print('Top 10 occurrences of data')
for i in Counter(array).most_common()[:10]:
print(i)

Top 10 occurrences of data
('[1 2 3 1 2 1]', 68)
(''[1 2 3 2 1 1]'', 47)
(''[1 3 3 1 2 0]'', 47)
('[0 2 3 2 2 1]', 46)
('[0 2 3 1 3 1]', 45)
('[1 2 2 1 2 2]', 45)
('[1 3 2 1 2 1]', 42)
('[1 2 3 2 2 0]', 42)
('[0 2 4 2 1 1]', 42)
('[1 2 4 1 2 0]', 40)


As a result, we can see that there is a large set of occurrences of the eye with the highest probability of 3 occurring.

### Beta distribution

#### tableform

$\displaystyle P\left( x | \alpha, \beta \right) = \frac{x^{\alpha - 1}(1-x)^{\beta - 1}}{B \left(\alpha, \beta \right)}$

Where $\displaystyle B(\alpha, \beta)$ is the beta function and $\alpha, \beta > 0$.

#### average

$\displaystyle E\left[x \right] = \frac{\alpha}{\alpha + \beta}$

#### Variance

$\displaystyle V\left[x \right] = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$

#### stochastic mass function

from scipy.stats import beta

alpha_list = [2,3,4].
beta_list = [2,2,3].

for _alpha,_beta in zip(alpha_list, beta_list):
x = np.linspace(0,1,100)[1:-1].
y = beta.pdf(x, _alpha, _beta)
plt.plot(x,y,label="$\alpha={}, \beta={}$".format(_alpha,_beta))

plt.xlabel("$x$")
plt.ylabel("$p(x)$")
plt.grid()
plt.legend()

<matplotlib.legend.Legend at 0x11b6ed208>


#### Sampling

We will sample and display a histogram for the beta function of $\alpha=2, \beta=2$ to make sure that the probability density function is indeed correct.

from scipy.stats import beta

_alpha = 2
_beta = 2

plt.grid()
plt.hist(beta.rvs(_alpha, _beta, size=100000))

(array([ 2835., 7639., 11106., 13515., 14798., 14875., 13482., 11280,
7557., 2913.]),
array([0.00161106, 0.10122898, 0.20084689, 0.30046481, 0.40008273,
0.49970065, 0.59931856, 0.69893648, 0.7985544 , 0.89817232,
0.99779023]),
<a list of 10 Patch objects>)


and the shape matches the beta function of $\alpha=2, \beta=2$ above.

### Gamma distribution

#### tableform

$\displaystyle P(x|\alpha, \beta) = \frac{\beta^\alpha x^{\alpha-1}e^{-\beta x}}{\Gamma(\alpha)}$

Or

$\displaystyle P(x|\alpha, \theta) = \frac{x^{\alpha-1}e^{-\frac{x}{\theta}}}{\Gamma(\alpha)}\theta^\alpha }$

#### Average

$\displaystyle E\left[x \right] = \frac{\alpha}{\beta}$

#### Variance

$\displaystyle V\left[x \right] = \frac{\alpha}{\beta^2}$

#### stochastic mass function

from scipy.stats import gamma

_alpha_list = [1.0, 2.0, 3.0, 9.0].
_beta_list = [2.0, 2.5, 4.5, 5].

for _alpha, _beta in zip(_alpha_list, _beta_list):
x = np.linspace(0,4,100)
y = gamma.pdf(x,_alpha,scale=1/_beta)
plt.plot(x,y,label="$\alpha = {}, \beta = {}$".format(_alpha, _beta))

plt.grid()
plt.legend()

<matplotlib.legend.Legend at 0x11ba66240>


#### Sampling

As with the beta distribution, we will sample and create a histogram to verify that the above probability density function is correct. Create a histogram for $\alpha=2.0, \beta=2.5$.

_alpha = 2.0
_alpha = 2.5

plt.grid()
plt.hist(gamma.rvs(_alpha, _beta, size=10000), bins=10)

(array([2.746e+03, 3.692e+03, 2.138e+03, 9.090e+02, 3.360e+02, 1.260e+02,
3.700e+01, 1.000e+01, 4.000e+00, 2.000e+00]),
array([ 5.05073076, 6.41202124, 7.77331172, 9.13460219, 10.49589267,
11.85718315, 13.21847363, 14.5797641 , 15.94105458, 17.30234506,
18.66363554]),
<a list of 10 Patch objects>)


### Chi-square distribution

This is the distribution followed by the sum of squares of random variables following the standard normal distribution. To be honest, I don’t have much chance to use it, but I use it for interval estimation and chi-square test.

If $x_i \sim N(0,1)$ and

$$\displaystyle Z = \sum_{i=1}^{k}x_i^2$$.

This means that the distribution follows the

#### tableau

$\displaystyle f(x|k) = \frac{x^{\frac{k}{2}-1}e^{-\frac{x}{2}}}{2^{\frac{k}{2}}\Gamma\left(\frac{k}{2}\right)} (x \gt 0)$

#### Average

$\displaystyle E\left[x \right] = k$

#### variance

$\displaystyle V\left[x \right] = 2k$

#### stochastic mass function

from scipy.stats import chi2

k = 3.0

x = np.linspace(0, 20 ,100)
y = chi2.pdf(x, k)

plt.grid()
plt.plot(x, y)
plt.show()


#### Sampling

For the case of $k=3$, let’s create a histogram and see what the probability density function looks like.

plt.grid()
plt.hist(chi2.rvs(3,size=1000),bins=12)
plt.show()


### Student’s t-distribution

It is often simply called the t-distribution. This distribution is also used for tests such as the t-test. When I was a salaried worker, I was introduced to this probability distribution by a senior colleague, and I remember using it for data analysis. It is applicable when the number of data to be obtained is small, and I think it can be used in manufacturing plants.

#### table-style

$\displaystyle p(x | \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{x^2}{\nu}\right)^{-\left(\frac{\nu+1}{2}\right)}$

#### average

$\displaystyle E\left[x \right] = 0 \cdots (k > 1)$

#### Variance

$\displaystyle V\left[x \right] = \frac{k}{k-2}\cdots (k > 2)$

#### stochastic mass function

from scipy.stats import t

nu_list = [1,3,5].

plt.grid()

for nu in nu_list:

x = np.linspace(-5,5,100)
y = t.pdf(x,nu)

plt.plot(x,y,label="$\\nu = {}$".format(nu))

plt.title("t-distribution")
plt.legend()
plt.show()


#### Sampling

Let’s sample from the probability density function and display the histogram for the case $\nu = 3$.

from scipy.stats import t

nu = 3

plt.grid()
plt.xlim(-10,10)
plt.hist(t.rvs(nu, size=10000), bins=80)
plt.show()


## Multivariate Gaussian distribution

In statistical modeling, we sometimes sample from a correlated multivariate Gaussian distribution.

$$\fnDroidName P(x|\mu, \Sigma) = \frac{1}{\sqrt{(2 \pi)^{k}\det\Sigma}}\exp \left( - \frac{(x-\mu)^{T}\Sigma^{-1}(x-\mu)}{2}\right)$$

The $\Sigma$ is the covariance matrix and $\mu$ is the mean of each random variable, or a vector if there are more than two dimensions.

By manipulating $\Sigma$, we can set the correlation between variables.

$$\Sigma = \left( \begin{array}{cc} \sigma_{x_1}^2 & \sigma_{x_1x_2} \\ \sigma_{x_1x_2} & \sigma_{x_2}^2 \end{array} \right)$$

The covariance matrix satisfies the conditions of a semi-positive definite symmetric matrix and has some useful properties such as all eigenvalues are non-negative. It is a matrix with properties that come up often in numerical calculations.

### Perfectly uncorrelated plot

When $\sigma_{x_1x_2}=0$, there is no correlation between the two random variables. For clarity, we show the graph from the z-axis.

from scipy.stats import multivariate_normal

mu = np.array([0,0])
sigma = np.array([[1,0],[0,1]])

x1 = np.linspace(-3,3,100)
x2 = np.linspace(-3,3,100)

X = np.meshgrid(x1,x2)

X1, X2 = np.meshgrid(x1, x2)
X = np.c_[np.travel(X1), np.travel(X2)].
Z = multivariate_normal.pdf(X, mu,sigma).reshape(100, -1)

from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure()
ax.view_init(elev=-90, azim=0)
surf = ax.plot_surface(X1, X2, Z, cmap='bwr', linewidth=0)
fig.show()

/Users/hiroshi/anaconda3/lib/python3.7/site-packages/matplotlib/figure.py:445: UserWarning: Matplotlib is currently using module:// ipykernel.pylab.backend_inline, which is a non-GUI backend, so cannot show the figure.
% get_backend())


### Gaussian distribution with positive correlation

Graph of a positively correlated binary Gaussian distribution with $\sigma_{x_1x_2} = 0.8$.

from scipy.stats import multivariate_normal

mu = np.array([0,0])
sigma = np.array([[1,0.8],[0.8,1]])

x1 = np.linspace(-3,3,100)
x2 = np.linspace(-3,3,100)

X = np.meshgrid(x1,x2)

X1, X2 = np.meshgrid(x1, x2)
X = np.c_[np.travel(X1), np.travel(X2)].
Z = multivariate_normal.pdf(X, mu,sigma).reshape(100, -1)

from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure()
ax.view_init(elev=-90, azim=0)
surf = ax.plot_surface(X1, X2, Z, cmap='bwr', linewidth=0)
fig.show()


### Gaussian distribution with negative correlation

Graph of a two-way Gaussian distribution with negative correlation of $\sigma_{x_1x_2} = -0.8$

from scipy.stats import multivariate_normal

mu = np.array([0,0])
sigma = np.array([[1,-0.8],[-0.8,1]])

x1 = np.linspace(-3,3,100)
x2 = np.linspace(-3,3,100)

X = np.meshgrid(x1,x2)

X1, X2 = np.meshgrid(x1, x2)
X = np.c_[np.travel(X1), np.travel(X2)].
Z = multivariate_normal.pdf(X, mu,sigma).reshape(100, -1)

from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure()
ax.view_init(elev=-90, azim=0)
surf = ax.plot_surface(X1, X2, Z, cmap='bwr', linewidth=0)
fig.show()


## Exponential distribution families

There are a wide variety of probability distributions shown above that come up in data analysis and marketing analysis, such as normal distribution, binomial distribution, Poisson distribution, etc. Most of them have a form known as exponential type distribution family.

#### tableform

$\displaystyle f(x|\theta) = h(x)\exp (\eta(\theta)\cdot T(x) - A(\theta))$

A distribution function of this form has important properties such as having a covariance prior. I will write about the details when the opportunity arises.