Preparation
In the textbook, calculations are mainly performed with Excel functions. Although Excel has an excellent GUI, it lacks the API libraries needed to connect to external web systems or data analysis tools. Therefore, we will use Python to perform the same calculations as in the textbook. The preparations for this are described here.
GitHub
- The Jupyter notebook file on GitHub is here.
Google Colaboratory
- To run it on Google Colaboratory, go here.
Author’s environment
This is the author’s environment.
ProductName: Mac OS X
ProductVersion: 10.14.6
BuildVersion: 18G95
Python -V
Python 3.5.5 :: Anaconda, Inc.
Load the required libraries.
import numpy as np
import scipy
from scipy.stats import binom
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
print("numpy version :", np.__version__)
print("matplotlib version :", matplotlib.__version__)
print("sns version :",sns.__version__)
numpy version : 1.18.1
matplotlib version : 2.2.2
sns version : 0.8.1
Overview
Our time is limited, and often all we can know about a quantity is its mean and its variance (standard deviation). In the marketing world, examples are the number of visitors per day and the number of products sold per day. So what can we learn from that limited information? If the number of visitors today is 1,000, what is the probability that the number of visitors tomorrow will be 500? And what is the probability that it will be 1,500? The answers to these questions can be found in the probability distribution.
In “Strategic Theory of Probability Thinking,” the quantities corresponding to the mean and standard deviation, which determine the shape of the probability distribution, are expressed by two parameters, $M$ and $K$. $M$ is the consumer preference itself, and $K$ is a function of $M$. The book consistently asserts the following:
Preference controls a brand’s market share, penetration rate, and number of purchases. There are three reasons for this (quoted from the textbook):
1. Preferences are in the minds of consumers and govern their buying behavior. The direct evidence is that the BP-10 Share Model, based on consumer preferences, can predict the actual share with relatively high accuracy. Consumer preferences dominate market share and dominate sales. In other words, given 100% awareness, 100% distribution, and enough time, preference and unit share are the same thing. The preference is in the mind of the consumer, and the unit share is the actual manifestation of that preference.
2. With the negative binomial distribution model, the penetration rate and frequency distribution of categories and brands can be predicted very close to reality with only two parameters, $M$ and $K$. Both $M$ and $K$ are functions of preferences.
3. Using the category's $M$ and $K$, unit share, and Delishley $S$ as inputs, the Delishley NBD model can accurately predict very realistic penetration and frequency distributions for each brand. It can also accurately predict the switching between brands. Both Delishley $S$ and $K$ are functions of preferences.
This site follows “Strategic Theory of Probability Thinking” and explains the topics in the following order:
- Binomial distribution
- Poisson distribution
- Negative binomial distribution
- Summary of Poisson distribution and negative binomial distribution
- Key equations governing sales
- Delishley NBD model
1-1. Binomial Distribution
1. Binomial Distribution Formula
The binomial distribution is the probability distribution of the number of successes $r$, treated as a random variable, when $\displaystyle N$ trials with success probability $\displaystyle p$ are conducted. In general, it is defined by the probability mass function shown below. It is a probability mass function, rather than a probability density function like the one used for the normal distribution, because $r$ is a discrete value that can only take non-negative integers.
$$\frac{N!}{r! (N-r)!} \times p^r \times \left(1-p\right)^{N-r}\cdot \cdot \left(1\right) $$
In this chapter, the binomial distribution is explained using a lottery as an example. Suppose there are a total of $n$ lottery tickets, and $\theta$ of them are winners. Then the probability of drawing a winning ticket on the first draw is $\displaystyle \frac{\theta}{n}$, and the probability of drawing a losing ticket is $\displaystyle 1-\frac{\theta}{n}$. We draw $N$ times, returning the ticket after each draw so that these probabilities stay the same. If the number of wins is $r$, the number of losses is $\displaystyle N-r$, so, for example, the probability of winning the first $r$ times in a row and then losing $N-r$ times in a row is
$$ \left(\frac{\theta}{n}\right)^r \times \left(\frac{n-\theta}{n}\right)^{N-r} \cdot \cdot \cdot \cdot \left(2\right) $$
Next we need to count the combinations. As we learned in high school math, the number of ways to choose which $r$ of the $N$ draws are wins is $\displaystyle {}_N \mathrm{C}_r = \frac{N!}{r! (N-r)!} $, so the probability of winning $r$ times out of $N$ is
$$ \frac{N!}{r! (N-r)!} \times \left(\frac{\theta}{n}\right)^r \times \left(\frac{n-\theta}{n}\right)^{N-r}\cdot \cdot \cdot \cdot \left(3\right) $$
With $\displaystyle p = \frac{\theta}{n}$, this is formula (1).
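As a quick sanity check of the formula, the following sketch evaluates it directly and compares it with `scipy.stats.binom.pmf`. The lottery numbers (10 tickets, 3 winners, 5 draws with replacement) and the helper name `binomial_pmf` are made up for illustration and do not come from the textbook.

```python
from scipy.special import comb
from scipy.stats import binom


def binomial_pmf(r, N, p):
    """Binomial formula: N! / (r! (N-r)!) * p^r * (1-p)^(N-r)."""
    return comb(N, r, exact=True) * p**r * (1 - p) ** (N - r)


# Hypothetical lottery: n_tickets tickets, theta winners, N draws with replacement.
n_tickets, theta, N = 10, 3, 5
p = theta / n_tickets

for r in range(N + 1):
    # Left: formula written out by hand. Right: scipy's implementation.
    print(r, binomial_pmf(r, N, p), binom.pmf(r, N, p))
```

The two columns should agree up to floating-point error, and the probabilities over $r = 0, \cdots, N$ should add up to 1.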
2. Example of python calculation
n = 100
p = 0.3
x = np.arange(n + 1)  # possible numbers of successes r = 0, 1, ..., n
# mean, variance, skewness and kurtosis of B(n, p)
mean, var, skew, kurt = binom.stats(n, p, moments='mvsk')
print("Mean : ", mean)
print("Variance :", var)
plt.xlabel('$r$')
plt.ylabel('$B(n,p)$')
plt.title('binomial distribution n={}, p={}'.format(n,p))
plt.grid(True)
y = binom.pmf(x,n,p)  # probability of each r
plt.scatter(x,y)
plt.show()
Mean : 30.0
Variance : 21.0
1-2. Poisson Distribution
Meaning of Poisson distribution
The Poisson distribution is the distribution followed by the number of occurrences of a random event that happens $\mu$ times per unit period on average. The Poisson distribution has only one parameter, this $\mu$. The formula is
$$P\left(r|\mu\right) = \frac{\mu^r}{r!}e^{-\mu}$$
The following are the results of calculating the cases $r=0,1,2,3,4$ for $\mu = 0.6$, following the example in the book.
$r$ | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Probability | 54.88% | 32.93% | 9.88% | 1.98% | 0.30% |
The python code used for the graph and calculations is shown below.
from scipy.stats import poisson
x = np.arange(10)
mu = 0.6
# mean, variance, skewness and kurtosis of the Poisson distribution
mean, var, skew, kurt = poisson.stats(mu, moments='mvsk')
print("Mean : ", mean)
print("Variance :", var)
y = poisson.pmf(x,mu)  # probability of each r
plt.xlabel('$r$')
plt.ylabel(r'$P(r|\mu)$')  # raw string so that \m is not treated as an escape
plt.title('Poisson distribution mu=%.1f' % (mu))
plt.grid(True)
plt.plot(x,y)
plt.show()
Mean : 0.6
Variance : 0.6
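The probabilities in the table can also be reproduced directly from the formula, without `scipy`. The sketch below does this for $\mu = 0.6$ and $r = 0, \cdots, 4$; the helper name `poisson_pmf` is my own.

```python
import math


def poisson_pmf(r, mu):
    """Poisson formula: mu^r / r! * exp(-mu)."""
    return mu**r / math.factorial(r) * math.exp(-mu)


mu = 0.6  # same value as in the table
probs = [poisson_pmf(r, mu) for r in range(5)]

for r, prob in enumerate(probs):
    print("r = {} : {:.2%}".format(r, prob))

# The cases r >= 5 carry the small remaining probability.
print("sum for r = 0..4 :", sum(probs))
```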
1-3. Negative Binomial Distribution
This chapter states the first important conclusion:
The purchasing behavior of individual consumers is Poisson distributed, but when we look at consumers as a whole, we see a "negative binomial distribution".
The chapter does not describe why the distribution is negative binomial when looking at consumers as a whole, and the discussion proceeds assuming a negative binomial distribution. However, at the bottom of p. 254, in the summary of the “Poisson distribution” and “negative binomial distribution” in 1-4, two assumptions are given:
- (a) The number of purchases is Poisson distributed at the individual level.
- (b) The long-run mean $\mu $ is gamma distributed when viewed across consumers as a whole.
In addition, it says
- Remember that when these two assumptions hold, the actual purchase probability for a given period of time is negatively binomially distributed for the consumer as a whole.
In other words, the negative binomial distribution seen across consumers as a whole is simply the consequence of (a) and (b).
As a result, the probability that a certain category or a certain brand is chosen $r$ times by consumers as a whole can be calculated as
$$ P\left(r \right) = \frac{\left(1 + \frac{M}{K}\right)^{-K}\cdot \Gamma\left(K + r \right)}{\Gamma\left(r + 1 \right)\cdot \Gamma\left(K \right)} \cdot \left(\frac{M}{M+K} \right)^r \cdots \left(1 \right) $$
The textbook then proves that the negative binomial distribution leads to $\left(1 \right)$, describing the gamma distribution as “the distribution where success calls for success”, but I think we can understand this later. Again, the important thing to remember is that assuming a Poisson distribution and a gamma distribution leads to a negative binomial distribution, which we will prove in 1-6.
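To get a feel for the formula, here is a minimal sketch that evaluates the negative binomial probability in its standard form, with $\Gamma(r+1)\,\Gamma(K)$ in the denominator, and compares it with `scipy.stats.nbinom`. The mapping to scipy's parameters ($n = K$, $p = K/(K+M)$) follows from matching the two formulas; the values of $M$ and $K$ and the function name `nbd_pmf` are hypothetical.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import nbinom


def nbd_pmf(r, M, K):
    """NBD probability of r purchases, written with mean M and shape K."""
    r = np.asarray(r, dtype=float)
    # Work with logarithms of the gamma functions to avoid overflow.
    log_p = (
        -K * np.log1p(M / K)
        + gammaln(K + r) - gammaln(r + 1) - gammaln(K)
        + r * np.log(M / (M + K))
    )
    return np.exp(log_p)


M, K = 1.5, 0.8  # hypothetical values, not taken from the textbook
r = np.arange(0, 6)

print(nbd_pmf(r, M, K))
print(nbinom.pmf(r, K, K / (K + M)))  # same distribution in scipy's (n, p) form
print("1 - P(0) :", 1 - nbd_pmf(0, M, K))
```

The last line prints $1 - P(0)$, the share of consumers who buy at least once, which equals $1 - (1 + M/K)^{-K}$.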
Checking the behavior with applications
To check the behavior of the negative binomial distribution, you can use the following application.
1-4. Summary of Poisson and Negative Binomial Distributions
In 1-4, the subject is summarized under the title “Summary of Poisson and Negative Binomial Distributions”, and the important points are, again, as follows.
- The mechanism is the same whether a consumer chooses a certain category or a certain brand. In other words, the problem of which category a consumer chooses and the problem of which brand a consumer chooses can be solved using the same probability distribution.
- The number of purchases of an individual consumer is “Poisson distributed”.
- As a result of the above two points, the distribution of the number of purchases of all consumers in a certain period of time follows a “negative binomial distribution”.
The direction of cause and effect is stated inconsistently in the textbook. In one place it is written that the gamma distribution is derived from the Poisson distribution and the negative binomial distribution, but in the lower part of the same page it is written that the negative binomial distribution is derived from the Poisson distribution and the gamma distribution. I will take the latter position here.
Notation of the gamma distribution
In general, the gamma distribution is written with two parameters $\alpha, \beta$ that determine its shape:
$$ f\left(x|\alpha, \beta \right) =\frac{\beta^{\alpha}x^{\alpha - 1}e^{-\beta x}}{\Gamma\left(\alpha \right)}\cdot\cdot\cdot\left(1\right) $$
Alternatively, setting $\displaystyle \beta =\frac{1}{\theta} $, it is written as
$$ f\left(x|\alpha, \theta \right) =\frac{x^{\alpha - 1}e^{-\frac{x}{\theta}}}{\Gamma\left(\alpha \right)\theta^{\alpha}} \cdot \cdot \cdot \left(2\right) $$
Both forms are also used on Wikipedia. The mean and variance of the probability distributions in (1) and (2) are as follows.
Gamma distribution | $\displaystyle E[x]$ | $\displaystyle V[x]$ |
---|---|---|
$\displaystyle \frac{\beta^{\alpha}x^{\alpha - 1}e^{-\beta x}}{\Gamma\left(\alpha \right)} $ | $\frac{\alpha}{\beta}$ | $\frac{\alpha}{\beta^2}$ |
$\displaystyle \frac{x^{\alpha - 1}e^{-\frac{x}{\theta}}}{\Gamma\left(\alpha\right)\theta^{\alpha}}$ | $\displaystyle \alpha\theta$ | $\displaystyle \alpha\theta^2$ |
The textbook does not explicitly state which notation $\displaystyle Gamma \left(K,\frac{M}{K}\right)$ uses, but it states that $\displaystyle Gamma \left(1,5\right)$, $\displaystyle Gamma \left(3,\frac{5}{3}\right)$, and $\displaystyle Gamma \left(15,\frac{5}{15}\right)$ all have a mean of 5, so we can assume that (2) is used.
python code for gamma distribution
Here is the Python code for plotting the gamma distribution, using scipy, numpy, matplotlib and other libraries familiar from machine learning. It is run for the following parameters, and the output follows the code.
- $\displaystyle \left(K,\frac{M}{K}\right) = \left(1,5\right), \left(3,\frac{5}{3}\right), \left(15,\frac{5}{15}\right) $. We get a graph similar to Figure 2-2 on page 61 of the textbook.
from scipy.stats import gamma
x = np.linspace(0,50,1000)
a = 1.0
b = 5.0
mean, var, skew, kurt = gamma.stats(a, scale=b, moments='mvsk')
y1 = gamma.pdf(x, a, scale=b)
print('a : {}, b : {:,.3f}, mean : {}'.format(a,b,mean))
a = 3.0
b = 5.0/3.0
mean, var, skew, kurt = gamma.stats(a, scale=b, moments='mvsk')
print('a : {}, b : {:,.3f}, mean : {}'.format(a,b,mean))
y2 = gamma.pdf(x, a, scale=b)
a = 15.0
b = 1.0/3.0
mean, var, skew, kurt = gamma.stats(a, scale=b, moments='mvsk')
print('a : {}, b : {:,.3f}, mean : {}'.format(a,b,mean))
y3 = gamma.pdf(x, a, scale=b)
plt.grid()
plt.ylim([-0.01,0.40])
plt.xlim([0,15])
plt.plot(x, y1, x, y2, x, y3)
plt.show()
a : 1.0, b : 5.000, mean : 5.0
a : 3.0, b : 1.667, mean : 5.0
a : 15.0, b : 0.333, mean : 5.0
Checking the behavior in the application
You can check the behavior of the gamma distribution at the following site.
1-5. The key formula that governs sales
The key equation in this chapter is Table 9-5. It shows how the average number of purchases $M$ and the parameter $K$, which determine the negative binomial distribution according to which consumers as a whole choose a “category” or a “brand”, are themselves determined. The equations for “category” and for “brand” are expressed in almost the same way; the only difference is the subscripts.
The point that I think is important for understanding this book is the middle part of p. 258, where the parameter $K$ is explained.
- The parameter $\left(k_j \right) $ is the number of red balls in the bag at the beginning, $\displaystyle \left(\theta_j \right) $, divided by the number of balls added to the bag each time, $\displaystyle \left(d_i \right) $; it is a function of $\theta_j$. As the variance equation shows, the more red balls (more preferences), the greater the variance, and the more people it spreads to, as the derivative of the penetration rate $\theta$ shows. The higher the preference in the market structure, the more people it will spread to.
In other words, $K$ is a function of the preference, and it is explained that it is a value that increases as the preference increases.
The negative binomial distribution is
$$ P\left(r \right) = \frac{\left(1 + \frac{M}{K}\right)^{-K}\cdot\Gamma\left(K + r \right)}{\Gamma\left(r + 1 \right) \cdot \Gamma \left(K \right)} \cdot \left(\frac{M}{M+K}\right)^r \cdots \left(1 \right) $$
If $M$ is a preference and $K$ is a function of preferences, then $P\left(r \right)$ is a function with $M$ as its only parameter. This is the conclusion that the authors make in this book.
1-6. The Delishley NBD Model
What is the Delishley NBD model?
The Delishley NBD model is a probability distribution that tells us the relationship between brands within a category. Table 1-4 on page 31 is a concrete example of how this distribution can be used in this book. It specifically shows how the parameters are calculated for the given equation (Equation 1, described below).
The contents of this chapter are at a fairly advanced level. I will try to read and understand it little by little.
Self-annotations
It would be rude for me to annotate this book, but I think readers at my level might find it a bit confusing, so I’ll add a few notes.
About Dirichlet
The textbook writes the name as “Delishley”, but I think “Dirichlet” is the more common rendering. In statistics one speaks of the Dirichlet distribution, and in numerical computation of the Dirichlet problem for boundary value problems, so be careful not to be confused by the different spelling. I also usually say Dirichlet, but in the following I will follow the textbook and write Delishley.
Summary
Let’s start with the conclusion. We conclude that the Delishley NBD can be expressed by the following formula.
The assumptions behind the formula are as follows:
1. Each consumer’s purchase behavior is an independent event.
2. Purchase behavior is randomly generated.
3. Each purchaser $\left(C_i \right)$ has an average long-term purchase frequency $\mu_i$ for a given category. The number of category purchases in unit time for each purchaser $\left(C_i \right)$, $R_i$, is Poisson distributed. $$R_i \sim Poisson\left(\mu_i \right)$$
4. The long-term average number of category purchases $\left(\mu \right)$ differs across consumers and is gamma distributed. $$\mu\sim Gamma\left(K,\frac{M}{K}\right)$$
5. The number of purchases $\left(r_j \right)$ for each brand in period $T$ follows a gamma distribution $Gamma\left(\alpha_j,\beta \right)$. The $\alpha$ differs across brands, but the $\beta$ is identical across brands. This process on the number of purchases results in the probability $p$ of choosing a brand being Delishley distributed. Essentially, if we apply assumptions (1) through (4) on categories to brands, the number of purchases for each brand will have a negative binomial distribution. Therefore, this assumption is equivalent to approximating the NBD with the gamma distribution. Personally, I think that the assumption of a gamma distribution is an important point, given that brands would also be NBD if the theory for categories were applied directly.
6. Each consumer has a certain purchase probability for each brand, and the brand purchase $\left(r \right)$ follows a multinomial distribution. The purchase probability $\left(p \right)$ of a brand at the time of each category purchase is fixed for each brand in the long run. However, which one is chosen at the time of category purchase is random.
7. The average number of category purchases of different people and the probability of people choosing each brand are independent of each other. In other words, it does not happen that people with a particular number of category purchases have a particular probability of purchasing a particular brand.
Meaning of Equation (17)
Equation (17) is a summary of Assumptions 1 through 7. In the following, Equation (17) from the textbook is denoted as Equation (17’) with some changes.
Equation (17’) consists of two integrals.
and
The meaning of each expression is explained below.
About Part 1
As explained earlier, this is the probability distribution, NBD, for selecting a category.
- The number of purchases at the individual level follows a Poisson distribution with the average number of purchases $\mu$ as a parameter
- The parameter $\mu$ of the Poisson distribution follows a gamma distribution with $\displaystyle \left(K, \frac{M}{K}\right)$ as the parameter when viewed across consumers.
Integrating the product of the Poisson and Gamma distributions with respect to $\mu$, we get the overall consumer probability by number of purchases in the category.
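The two assumptions can also be checked by simulation before doing any integration. The sketch below draws a long-run mean $\mu$ for each simulated consumer from $Gamma\left(K, \frac{M}{K}\right)$ (shape $K$, scale $M/K$), draws that consumer's number of purchases from a Poisson distribution with mean $\mu$, and compares the resulting frequencies with the negative binomial probabilities from `scipy.stats.nbinom`. The values of $M$, $K$, the seed and the number of consumers are arbitrary choices of mine.

```python
import numpy as np
from scipy.stats import nbinom

rng = np.random.RandomState(0)  # fixed seed so that the run is repeatable
M, K = 2.0, 0.5                 # hypothetical values, not taken from the textbook
n_consumers = 200000

# Long-run mean of each consumer: gamma distributed across consumers.
mu = rng.gamma(shape=K, scale=M / K, size=n_consumers)
# Purchases of each consumer in one unit period: Poisson with that consumer's mean.
purchases = rng.poisson(mu)

r = np.arange(6)
simulated = np.array([(purchases == k).mean() for k in r])
theoretical = nbinom.pmf(r, K, K / (K + M))  # NBD with mean M and shape K

for k in r:
    print("r = {} : simulated {:.4f}, NBD {:.4f}".format(k, simulated[k], theoretical[k]))
```

With this many simulated consumers the two columns should differ only in the third decimal place or so.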
About Part 2
The distribution is used to find the probability of each brand being chosen.
- The number of times each brand is purchased, $(r_j)$, follows a multinomial distribution with the probability of that brand being chosen, $(p_j)$, as a parameter
- The probability that a brand is chosen $(p_j)$ follows a Delishley distribution with parameter $(\alpha)$
Detailed calculations for part 1 and part 2 will be done separately below.
Also, I have placed below a diagram of my own interpretation of the model from which equation (17) is generated.
Part 1
(a) Gamma distribution and the identity of $S$
Qualitative understanding of gamma distribution
According to the textbook, the gamma distribution is a distribution in which the occurrence of a probability further increases that probability. To be honest, I can’t understand it quickly with my current knowledge, but perhaps I can understand it from the result of the red and white balls when I derived the negative binomial distribution. The negative binomial distribution is derived from the mathematical expression of the process of adding more red balls to a bag of red and white balls, while returning the red balls to the bag. In other words, if you draw a red ball, the probability of drawing the next red ball is increasing.
As will be shown next, the negative binomial distribution can also be derived from the mixture distribution of Poisson distribution and Gamma distribution. As a result, it can be understood that the gamma distribution is a distribution that increases the probability. (This is just a qualitative understanding and may be wrong.)
Basic Properties of the Gamma Distribution
The notation, mean and variance of the gamma distribution are shown.
$$ E[r] = \alpha \beta $$
$$ Var[r] = \alpha \beta^2 $$
Additivity (Reproductive Property) of the Gamma Distribution
The gamma distribution has a property called additivity (written as additivity in the book, but commonly referred to as the reproductive property). When $r_i$ and $r_j$ independently follow gamma distributions with a common $\beta$,
$$ r_i \sim Gamma(\alpha_i, \beta) $$
$$ r_j \sim Gamma(\alpha_j, \beta) $$
their sum $r_i + r_j$ also follows a gamma distribution:
$$ r_i + r_j \sim Gamma(\alpha_i + \alpha_j, \beta) $$
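The additivity property is easy to check numerically. The following sketch adds samples from two independent gamma distributions with a common scale $\beta$ and compares the sum with $Gamma(\alpha_i + \alpha_j, \beta)$, using the mean, the variance and a Kolmogorov-Smirnov statistic. The parameter values and the seed are arbitrary, and $\beta$ is treated as a scale parameter, as in notation (2).

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(1)          # fixed seed so that the run is repeatable
alpha_i, alpha_j, beta = 2.0, 3.5, 0.4  # hypothetical shapes and common scale
size = 200000

r_i = rng.gamma(alpha_i, beta, size)
r_j = rng.gamma(alpha_j, beta, size)
total = r_i + r_j

# Mean and variance expected for Gamma(alpha_i + alpha_j, beta).
expected_mean = (alpha_i + alpha_j) * beta
expected_var = (alpha_i + alpha_j) * beta**2
print("mean :", total.mean(), "expected :", expected_mean)
print("var  :", total.var(), "expected :", expected_var)

# Distance between the empirical distribution of the sum and the gamma CDF.
ks_statistic, _ = stats.kstest(total, 'gamma', args=(alpha_i + alpha_j, 0, beta))
print("KS statistic :", ks_statistic)
```

A small KS statistic indicates that the whole distribution of the sum, not only its first two moments, is close to the gamma distribution with the summed shape.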
Following assumption (5), the number of purchases of each brand $r_j$ follows a gamma distribution. The number of purchases of the category $R$ is $$R=\sum_{j=1}^g r_j$$ so, by additivity, $$ R \sim Gamma\left(\sum_{j=1}^g \alpha_j, \beta\right) $$
Also, writing $$S= \sum_{j=1}^g\alpha_j$$ the expected value of $R$ is $S\beta$ by the properties of the gamma distribution. On the other hand, $R$ is the number of purchases in the category, and according to the negative binomial distribution, its expected value is $MT$. Therefore, $$S\beta=MT$$
The expected value of the number of purchases of an individual brand can be considered in the same way, which gives $$\alpha_j\beta=m_jT$$ where $m_j$ is the average number of purchases of brand $j$. If we eliminate $\beta$ from the two equations, we get $$\alpha_j=S\times \frac{m_j}{M}$$ This is the meaning of $\alpha$, a parameter of the gamma distribution.
Meaning of S
From the properties of the gamma distribution, it becomes $$R\sim Gamma(S,\beta)$$. It also becomes $$R\sim NBD(K,MT)$$. From the above, I understand $S$ as follows.
(b) From Poisson and Gamma distributions to negative binomial distribution
Derive a negative binomial distribution from a mixture of Poisson and Gamma distributions. Assume that the number of purchases in an individual’s category follows a Poisson distribution and that the mean number of purchases in the Poisson distribution $\mu$ follows a Gamma distribution.
$$ Gamma\left(\mu|K, \frac{M}{K}\right) = \frac{\mu^{K-1}}{\Gamma(K)\cdot\left(\frac{M}{K}\right)^K}e^{-\mu\frac{K}{M}} $$
From this notation, we can integrate over $\mu$. Calculate the expected value of the number of purchases.
Bearing in mind that the gamma function is defined as $$ \Gamma\left(r \right) = \int_0^{\infty}t^{r-1}e^{-t}dt $$
and applying the variable transformation $$ \mu \left(T +\frac{K}{M} \right) = t \rightarrow \mu = \frac{M}{MT + K } t $$
we get
$$ P\left(R \right) = \left(1 + \frac{MT}{K} \right)^{-K}\frac{\Gamma\left(K + R \right)}{R! \cdot \Gamma\left(K \right)} \cdot \left(\frac{MT}{MT+K} \right)^R $$
which is Equation (21). For $T=1$ it agrees with the equation shown in the section on the negative binomial distribution in 1-3.
We have thus derived the negative binomial distribution from the Poisson and gamma distributions.
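The intermediate steps of this derivation can be sketched as follows. This is my own reconstruction, assuming that the number of purchases in a period of length $T$ is Poisson distributed with mean $\mu T$, which is what the $MT$ in the result suggests; the textbook's intermediate equations may be arranged differently.

$$ P\left(R \right) = \int_0^{\infty} \frac{\left(\mu T\right)^R e^{-\mu T}}{R!} \cdot \frac{\mu^{K-1}}{\Gamma(K)\cdot\left(\frac{M}{K}\right)^K}e^{-\mu\frac{K}{M}} d\mu = \frac{T^R}{R! \cdot \Gamma(K)\cdot\left(\frac{M}{K}\right)^K} \int_0^{\infty} \mu^{K+R-1} e^{-\mu\left(T + \frac{K}{M}\right)} d\mu $$

With the substitution $\displaystyle t = \mu\left(T + \frac{K}{M}\right)$, the remaining integral becomes a gamma function:

$$ \int_0^{\infty} \mu^{K+R-1} e^{-\mu\left(T + \frac{K}{M}\right)} d\mu = \left(\frac{M}{MT+K}\right)^{K+R} \Gamma\left(K+R \right) $$

Collecting the factors,

$$ T^R \left(\frac{M}{MT+K}\right)^{R} = \left(\frac{MT}{MT+K}\right)^{R}, \qquad \left(\frac{K}{M}\right)^{K} \left(\frac{M}{MT+K}\right)^{K} = \left(\frac{K}{MT+K}\right)^{K} = \left(1 + \frac{MT}{K}\right)^{-K} $$

which gives the negative binomial form of $P\left(R \right)$.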
Part 2
(c) From Gamma distribution to Delishley distribution
From the assumption that the number of purchases of each brand, $r_1,\cdots,r_g$, are independent and follow a gamma distribution, we can derive the distribution equation for the probability of purchase of each brand. The function of the number of times a brand is purchased is as follows from the assumption
We will perform a variable conversion of $r_j$ in this equation to the probability $p_j$ that the brand $j$ is chosen. The conditions for performing the transformation are as follows. The textbook explains it as a projective transformation from D to F.
The textbook says $0 < p_g < \infty$, but I think that’s probably a typo. As a result, equation (22) is derived.
$$ Dirichlet\left(p|\alpha \right) = \frac{\Gamma\left(\displaystyle\sum_{j=1}^{g}\alpha_j\right)}{\displaystyle \prod_{j=1}^{g}\Gamma\left(\alpha_j\right)} \left( \prod_{j=1}^{g-1}p_j^{\alpha_j-1}\right)\left(1-\sum_{j=1}^{g-1}p_j \right)^{\alpha_g-1} $$
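I think this probability distribution is commonly referred to as the Dirichlet distribution. The equation above has the constraint on $p$ built into it. Under the constraint
$$ \sum_{j=1}^{g}p_j = 1 $$
I think it is more common to write it as
$$ Dirichlet\left(p|\alpha \right) = \frac{\Gamma\left(\displaystyle\sum_{j=1}^{g}\alpha_j\right)}{\displaystyle \prod_{j=1}^{g}\Gamma\left(\alpha_j\right)} \prod_{j=1}^{g}p_j^{\alpha_j-1} $$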
It’s a little complicated, but if you follow the equation carefully, I think you can derive it. Here are some points to keep in mind when deriving equation (22).
Note 1: Constraint conditions for variable transformation
The $r_1, \cdots, r_g$ are sampled from gamma distributions by independent trials. When these variables are transformed into the probabilities $p_1, \cdots, p_g$, the $p_j$ are subject to the constraint $\sum_j p_j=1$. The term $1-\sum_{j=1}^{g-1}p_j$ that appears in the middle of the derivation of Equation (22) reflects this constraint.
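The relationship between independent gamma variables and the Dirichlet distribution can be checked by simulation. The sketch below samples $r_1, \cdots, r_g$ from gamma distributions with a common scale, normalizes them to $p_j = r_j / \sum_j r_j$, and compares the result with samples drawn directly from numpy's Dirichlet generator and with the known Dirichlet mean and variance. The $\alpha_j$ values, the scale and the seed are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.RandomState(2)     # fixed seed so that the run is repeatable
alpha = np.array([0.6, 1.2, 2.2])  # hypothetical alpha_j for g = 3 brands
beta = 1.5                         # common scale; it cancels out in p_j
n = 200000

# Independent gamma variables r_j, one column per brand.
r = rng.gamma(shape=alpha, scale=beta, size=(n, alpha.size))
# Normalizing enforces the constraint sum_j p_j = 1.
p = r / r.sum(axis=1, keepdims=True)

# Samples drawn directly from a Dirichlet distribution with the same alpha.
direct = rng.dirichlet(alpha, size=n)

S = alpha.sum()
print("alpha / S            :", alpha / S)
print("mean of normalized r :", p.mean(axis=0))
print("mean of dirichlet    :", direct.mean(axis=0))
print("var of normalized r  :", p.var(axis=0))
print("Dirichlet variance   :", alpha * (S - alpha) / (S**2 * (S + 1)))
```

The means $\alpha_j / S$ correspond to the brand shares, which is consistent with $\alpha_j = S \times \frac{m_j}{M}$.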
Note 2: Calculating the Jacobian
In the bottom equation on p. 267 you will see the expression $|J|$. This is called the Jacobian. A variable transformation of a probability distribution needs to take this quantity into account in addition to simply swapping variables. If there is only one variable, it is easy, but if there are multiple variables to be transformed, you need to do the following calculations.
Here is a simple explanation of what the Jacobian means. For example, let $p(x,y)$ represent a continuous probability distribution with $x,y$ as random variables. Although we are not usually aware of it, the value of a continuous probability distribution at a point is not a probability by itself. For example, the result $\displaystyle p(2,3)=\frac{1}{3}$ implicitly means that $\displaystyle p(2,3)dxdy=\frac{1}{3}dxdy$ is the probability that $x,y$ lie in the range from $x=2$ to $2+dx$ and from $y=3$ to $3+dy$. Therefore, this $dxdy$ is necessary for the interpretation of probability distributions, and the Jacobian is needed when $dxdy$ is transformed to other variables. The exact mathematical treatment requires knowledge of Lebesgue integration and measure theory, which I think is unnecessary in the marketing world. However, this knowledge is needed when following the mathematical formulas closely.
This is just my opinion based on my experience, but when you get to the practical level, you realize the importance of understanding this area. When you try to make the computer calculate integrals, if you don’t take this $dxdy$ into account, you will get incomprehensible numbers, the model will not make sense, and you will not know what you are doing.
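As a small numerical illustration of this point, the sketch below integrates the function $e^{-(x^2+y^2)}$ over the whole plane in three ways with `scipy.integrate.dblquad`: in Cartesian coordinates, in polar coordinates with the Jacobian factor $r$, and in polar coordinates with that factor forgotten. The choice of function is mine; it is the same Gaussian integrand used in the Jacobian example.

```python
import numpy as np
from scipy import integrate


def gauss(x, y):
    """Unnormalized two-dimensional Gaussian exp(-(x^2 + y^2))."""
    return np.exp(-(x**2 + y**2))


# Cartesian coordinates: integrate over the whole plane with dx dy.
cartesian, _ = integrate.dblquad(
    lambda y, x: gauss(x, y),
    -np.inf, np.inf,
    lambda x: -np.inf, lambda x: np.inf,
)

# Polar coordinates with the Jacobian: dx dy = r dr dtheta.
polar_with_jacobian, _ = integrate.dblquad(
    lambda r, theta: gauss(r * np.cos(theta), r * np.sin(theta)) * r,
    0.0, 2.0 * np.pi,
    lambda theta: 0.0, lambda theta: np.inf,
)

# Polar coordinates with the factor r forgotten: a different, wrong number.
polar_without_jacobian, _ = integrate.dblquad(
    lambda r, theta: gauss(r * np.cos(theta), r * np.sin(theta)),
    0.0, 2.0 * np.pi,
    lambda theta: 0.0, lambda theta: np.inf,
)

print("cartesian              :", cartesian)
print("polar with Jacobian    :", polar_with_jacobian)
print("polar without Jacobian :", polar_without_jacobian)
print("pi                     :", np.pi)
```

The first two results agree with $\pi$, while the third one does not, which is the kind of incomprehensible number that appears when the Jacobian is left out.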
Example of Jacobian calculation
Let’s try to calculate the Jacobian using the commonly used Gaussian integral as an example. Take for example the transformation of a variable to polar coordinates, $x=r\cos \theta, y=r\sin \theta$
So we have
From this
and we can derive the famous formula for Gaussian integrals. There is no explanation of this area, so if you don’t have much knowledge of mathematics, probability and statistics, it may be tough to understand it completely.
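The steps of this example can be written out as follows. This is the standard textbook calculation in my own notation, with $I$ introduced here as a name for the one-dimensional Gaussian integral.

$$ I = \int_{-\infty}^{\infty} e^{-x^2} dx, \qquad I^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\left(x^2+y^2\right)} dxdy $$

For $x=r\cos \theta, y=r\sin \theta$ the Jacobian is

$$ |J| = \begin{vmatrix} \displaystyle \frac{\partial x}{\partial r} & \displaystyle \frac{\partial x}{\partial \theta} \\ \displaystyle \frac{\partial y}{\partial r} & \displaystyle \frac{\partial y}{\partial \theta} \end{vmatrix} = \begin{vmatrix} \cos \theta & -r\sin \theta \\ \sin \theta & r\cos \theta \end{vmatrix} = r\cos^2 \theta + r\sin^2 \theta = r $$

so that $dxdy = r\,drd\theta$ and

$$ I^2 = \int_0^{2\pi}\int_0^{\infty} e^{-r^2} r\,drd\theta = 2\pi \cdot \frac{1}{2} = \pi, \qquad I = \sqrt{\pi} $$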
Calculating the Determinant
To compute the Jacobian we need partial derivatives such as $\displaystyle \frac{\partial r_1}{\partial p_1} $, which can be calculated from $\displaystyle p_j=\frac{r_j}{r_1+r_2 + \cdots + r_g}$. A simple calculation yields the following
Properties of Determinants
One of the characteristics of determinants is that they do not change when you add one row to another.
Let’s do some specific calculations in the case of a 2×2 square matrix.
The formula is as follows
and then
The third order and higher can be calculated in the same way.
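As a concrete check of the two-row case, here is a generic worked example. The symbols $a, b, c, d$ are chosen for illustration and are not the entries of the textbook's matrix.

$$ \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc $$

Adding row 1 to row 2 gives

$$ \begin{vmatrix} a & b \\ c + a & d + b \end{vmatrix} = a\left(d+b\right) - b\left(c+a\right) = ad - bc $$

so the determinant is unchanged.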
On p. 267, the second line from the bottom says “adding rows does not change” the determinant. Here this means:
- Add row 1 to row g
- Add row 2 to row g
- $\cdots$
- Add row g-1 to row g
When row 1 is added to row g, $p_{g’}$ and $-p_{g’}$ in the first column cancel to zero. This zero makes it much easier to calculate the determinant.
The transformation of the determinant in this area may be tough if you don’t have some knowledge of it.
(d) Combine multinomial distribution and Delishley distribution
multinomial distribution
Suppose there are brands from $1$ to $g$ and the probability of each being chosen is $p_1,p_2,\cdots,p_g$. If the number of times each brand is chosen is $r_1, r_2, \cdots, r_g$, the probability distribution that $r$ follows is a multinomial distribution. The multinomial distribution is a probability distribution that extends the binomial distribution to multiple variables. In its standard form it is
$$ \frac{R!}{\displaystyle \prod_{j=1}^{g} r_j!} \prod_{j=1}^{g} p_j^{r_j} $$
where
$$ \sum_{j=1}^{g} r_j = R, \qquad \sum_{j=1}^{g} p_j = 1 $$
Transforming the second condition, we get $p_g =1-\sum_{j=1}^{g-1} p_j$, and the textbook replaces the last ($g$-th) factor in $\prod$ with this value.
When reading this kind of notation, it is important to understand that the variable is $r$ and that $p$ and $R$ are just parameters.
Combining the multinomial and Delishley distributions
The number of times each brand is selected, $(r_1,r_2, \cdots, r_g)$, follows the multinomial distribution with the probability of the brand being selected as a parameter, and that probability follows the Delishley distribution. By multiplying the two together and integrating over $p$, we can obtain the probability distribution of $r_1, \cdots ,r_g$
Again, the Delishley distribution is as follows.
In this case, it is important to understand that the variable is $p$ and the parameter is $\alpha$.
which yields the notation for equation (25) in the textbook.
Transformation from (1) to (2)
I just substituted the multinomial distribution and the Delishley distribution.
Transformation from (2) to (3)
The part about $p$ is left in the integral sign, and the other stuff is taken out.
Transformation from (3) to (4)
In order to apply the integration of the Delishley distribution, the expression for its constant part is deliberately created in both the numerator and the denominator.
Transformation from (4) to (5)
The integration of the Delishley distribution is performed and the integral part is eliminated.
Transformation from (5) to (6)
Simplify the expression by using $S$ and $R$.
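The result of these transformations can be checked numerically. The sketch below assumes that the final expression is the standard Dirichlet-multinomial form, $\frac{R!}{\prod_j r_j!} \cdot \frac{\Gamma(S)}{\Gamma(S+R)} \prod_j \frac{\Gamma(\alpha_j + r_j)}{\Gamma(\alpha_j)}$, which is what integrating the product of a multinomial and a Dirichlet distribution over $p$ gives; I have not matched it symbol by symbol against the textbook. The $\alpha_j$ values, $R$ and the function name are hypothetical.

```python
import itertools

import numpy as np
from scipy.special import gammaln


def dirichlet_multinomial_pmf(r, alpha):
    """Probability of the brand counts r = (r_1, ..., r_g) given R = sum(r)."""
    r = np.asarray(r, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    R, S = r.sum(), alpha.sum()
    # Logarithms of the gamma functions keep the calculation numerically stable.
    log_p = (
        gammaln(R + 1) - gammaln(r + 1).sum()
        + gammaln(S) - gammaln(S + R)
        + (gammaln(alpha + r) - gammaln(alpha)).sum()
    )
    return np.exp(log_p)


alpha = [0.6, 1.2, 2.2]  # hypothetical alpha_j for g = 3 brands
R = 4                    # total number of category purchases

# Every way of splitting R purchases among the g brands.
outcomes = [r for r in itertools.product(range(R + 1), repeat=len(alpha))
            if sum(r) == R]
probs = np.array([dirichlet_multinomial_pmf(r, alpha) for r in outcomes])

total = probs.sum()
mean_r = (probs[:, None] * np.array(outcomes)).sum(axis=0)
print("number of outcomes :", len(outcomes))
print("sum of probability :", total)
print("E[r_j]             :", mean_r)
print("R * alpha_j / S    :", R * np.array(alpha) / sum(alpha))
```

The probabilities add up to 1, and the expected number of purchases of each brand is $R\,\alpha_j / S$, in line with reading $\alpha_j / S$ as the brand's share.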
Part 3
Delishley NBD model:
Now that we know the distribution of the number of purchases for a category from equation (21) in part 1 and have formulated the distribution of the number of purchases for each brand in equation (25) in part 2, we can take the product of these to derive the distribution of purchases by brand.
For $g=2$, the Delishley distribution is a beta distribution. In general, the Delishley distribution (Dirichlet distribution) is the multivariate version of the beta distribution. For specific calculations using this formula, please refer to Endnote 2.
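Written out, the product of the two distributions takes the following form. This is my own reconstruction under the assumption that equation (21) is the negative binomial distribution for $R$ and that equation (25) is the standard Dirichlet-multinomial distribution; the textbook's notation may differ.

$$ P\left(R, r_1, \cdots, r_g \right) = \left(1 + \frac{MT}{K} \right)^{-K}\frac{\Gamma\left(K + R \right)}{R! \cdot \Gamma\left(K \right)} \cdot \left(\frac{MT}{MT+K} \right)^R \times \frac{R!}{\displaystyle \prod_{j=1}^{g} r_j!} \cdot \frac{\Gamma\left(S \right)}{\Gamma\left(S + R \right)} \prod_{j=1}^{g} \frac{\Gamma\left(\alpha_j + r_j \right)}{\Gamma\left(\alpha_j \right)} $$

Here $\displaystyle R = \sum_{j=1}^{g} r_j$ and $\displaystyle S = \sum_{j=1}^{g} \alpha_j$, and the factor $R!$ cancels between the two parts.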
Running the application
To check the behavior of Delishley’s NBD model, you can visit the following website
Summary
As for 1-6, in addition to basic probability distributions such as the Poisson distribution and the negative binomial distribution, you need knowledge of a range of mathematics: the gamma distribution, the beta distribution, variable transformation of continuous probability distributions, calculation of the Jacobian, properties of determinants, the multinomial distribution and the Delishley distribution (Dirichlet distribution), properties of the gamma function, and so on. Without this background I think it will be difficult to understand. I would be happy if this helps readers to understand.
The middle part of p. 267: “$p_j\times p_{g’}$.” It took me quite a while to follow the equation, thinking that the circled “$p_j\times p_{g’}$” was a circle with a symbol representing a composite mapping. In fact, it was a circle at the end of a Japanese sentence. Japanese full-width circles are complicated…