[Strategic Theory of Probability Thinking] 2. Mathematical Tools for Market Understanding and Forecasting

Preparation

The textbook performs its calculations mainly with Excel functions. Excel has an excellent GUI, but it lacks the libraries needed to connect to external web systems and data analysis tools. We therefore use Python to reproduce the textbook's calculations. The setup is described below.

github

The Jupyter notebook file is on GitHub here.

google colaboratory

To run it on Google Colaboratory, use the notebook here.

Author’s environment

This is the author’s environment.

ProductName: Mac OS X
ProductVersion: 10.14.6
BuildVersion: 18G2022

Python -V

Python 3.7.3

Load the required libraries.

import numpy as np
import scipy
from scipy.stats import binom

%matplotlib inline
%config InlineBackend.figure_format = 'svg'

import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

print("numpy version :", np.__version__)
print("matplotlib version :", matplotlib.__version__)
print("sns version :",sns.__version__)

numpy version : 1.16.2
matplotlib version : 3.0.3
sns version : 0.9.0

Overview

In end-of-book commentary 2, the authors explain six tools that they use frequently, with specific examples. You can refer to them and apply them when you need them in a real business situation.

Gamma-Poisson Recency Model

Based on data about "when did you last buy" and "when did you last visit" (the recent purchase period, or recency), it tells us which brand, which facility, and which period we should focus our resources on.

negative binomial distribution

This is a tool that corrects for the difference between the consumer household panel’s data on your brand and the actual sales figures. This means that it is very useful for benchmarking trial rates, repeat rates, and purchase frequency during forecasting.

category entry order model

It will tell you how much market share you can get in a newly created category. It also allows you to simulate the market share based on your marketing plan.

trial model/repeat model

Using data from concept tests, concept use tests, and household panel data, you can predict the sales of a new product in the first year of its launch.

VPP Model (Volume per Purchase)

Helps you determine the size of your product.

Dirichlet NBD model

It works through a concrete example of calculating the category $K$ of the NBD model and the Dirichlet $S$. It predicts the quarterly purchase rate, the number of quarterly purchases, and the percentage of 100% loyal customers for Colgate in Table 1-4 of the textbook, as explained in section 1-6 of commentary 1.

2-1. Gamma-Poisson Recency Model

We can build an NBD model by calculating $m$ and $k$ from the data on "when did you last buy" and "when did you last visit". The NBD model is described by the following formula.

$$ P\left(r \right) = \frac{\left(1 + \frac{M}{K} \right)^{-K} \cdot \Gamma\left(K + r \right)}{\Gamma\left(r + 1 \right)\cdot \Gamma\left(K \right)} \cdot \left(\frac{M}{M+K} \right)^r $$

Let $mt$ be the average value corresponding to $M$ for a given product over a period of length $t$, and write $K$ as $k$. The penetration rate is obtained by subtracting the share of people who never buy the product from 100%, so

$$ \begin{aligned} P(t) &=1-P_0\left(r=0 \right) \\ &= 1 - \frac{\left(1 + \frac{mt}{k} \right)^{-k} \cdot \Gamma\left(k + 0 \right)}{\Gamma\left(0 + 1 \right)\cdot \Gamma\left(k \right)} \cdot \left(\frac{mt}{mt+k} \right)^0 \\ &= 1 - \left(1 + \frac{mt}{k} \right)^{-k} \end{aligned} $$
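The two formulas above can be checked numerically. The sketch below is only an illustration: the helper name `nbd_pmf` is hypothetical, and the values of `M` and `K` are the textbook's fitted $m$ and $k$ quoted later in this section. It relies on the fact that `scipy.stats.nbinom` reproduces the NBD formula when called with `n = K` and `p = K / (K + M)`.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import nbinom

def nbd_pmf(r, M, K):
    """NBD probability of exactly r purchases, term by term as in the formula."""
    log_p = (-K * np.log1p(M / K)
             + gammaln(K + r) - gammaln(r + 1) - gammaln(K)
             + r * np.log(M / (M + K)))
    return np.exp(log_p)

M, K = 1.37552, 4.061          # illustrative values only
r = np.arange(0, 6)

ours = nbd_pmf(r, M, K)
scipy_pmf = nbinom.pmf(r, K, K / (K + M))   # n = K, p = K / (K + M)

# Penetration is one minus the probability of zero purchases.
penetration = 1 - (1 + M / K) ** (-K)

print(np.allclose(ours, scipy_pmf))
print(np.isclose(penetration, 1 - ours[0]))
```

Working with `gammaln` instead of `gamma` is a design choice to avoid overflow when $K + r$ becomes large.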

Thus, the share of people whose most recent purchase falls between $t-1$ and $t$ is the difference between the penetration rates for the two periods:

$$ P\left(t \right) - P\left(t-1 \right) = \left(1+\frac{m\times \left(t-1 \right)}{k}\right)^{-k} - \left(1+\frac{m\times t}{k}\right)^{-k} $$

To apply this to an arbitrary period, we use two variables $t_1$ and $t_2$ (with $t_1 < t_2$) and define $f\left(t_1,t_2,m,k \right)$ as follows.

$$ f\left(t_1,t_2,m,k \right) = \left(1+\frac{m\times t_1}{k} \right)^{-k} - \left(1+\frac{m\times t_2}{k} \right)^{-k} $$

Table 10-1 in the textbook can be expressed with this common function $f$ as follows ($m$ and $k$ are omitted from the arguments).

Gamma distribution	Actual values
$\displaystyle f\left(t_1=0,t_2= \frac{14}{31}\right) $	43.9%
$\displaystyle f\left(t_1=\frac{14}{31},t_2=1 \right) $	25.6%
$\displaystyle f\left(t_1=1,t_2=2 \right) $	19.1%
$\displaystyle f\left(t_1=2,t_2=3 \right) $	5.1%
$\displaystyle f\left(t_1=3,t_2=4 \right) $	1.5%
$\displaystyle f\left(t_1=4,t_2=5 \right) $	0.7%
$\displaystyle f\left(t_1=5,t_2=6 \right) $	1.4%
$\displaystyle f\left(t_1=6,t_2=\infty \right) $	2.7%
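To make the table concrete, here is a minimal sketch that evaluates $f$ for each row. It assumes the $m$ and $k$ values that the textbook's authors use (quoted at the end of this section); the variable names are my own.

```python
import numpy as np

def f(t1, t2, m, k):
    """Share of people whose latest purchase lies between t1 and t2 periods ago."""
    return (1 + m * t1 / k) ** (-k) - (1 + m * t2 / k) ** (-k)

m, k = 1.37552, 4.061          # values used by the textbook's authors
t1 = np.array([0.0, 14/31, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
t2 = np.array([14/31, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, np.inf])
actual = np.array([0.439, 0.256, 0.191, 0.051, 0.015, 0.007, 0.014, 0.027])

predicted = f(t1, t2, m, k)
for start, end, p, a in zip(t1, t2, predicted, actual):
    print(f"{start:6.3f} - {end:6.3f}: model {p:6.1%}  actual {a:6.1%}")

# The intervals tile [0, infinity), so the model shares telescope to 1.
print("sum of model shares:", predicted.sum())
```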

Derivation of m,k by scipy’s curve_fit

In general, least-squares fitting of a nonlinear function can be done with the curve_fit function in scipy.optimize. According to the SciPy documentation, its signature is

  scipy.optimize.curve_fit(f, xdata, ydata, p0=None, sigma=None, absolute_sigma=False, check_finite=True, bounds=(-inf, inf), method=None, jac=None, **kwargs)

and $xdata$ and $ydata$ are defined as

  xdata : array_like
  The independent variable where the data is measured. Must be an M-length sequence or an (k,M)-shaped array for functions with k predictors.

  ydata : array_like
  The dependent data, a length M array - nominally f(xdata, ...)

To solve the fitting problem in the textbook, we fit the function defined above:

$$ f\left(t_1,t_2,m,k \right) =\left(1+\frac{m\times t_1}{k} \right)^{-k} -\left(1+\frac{m\times t_2}{k} \right)^{-k} $$

This is a fitting problem with two predictors, $t_1$ and $t_2$, which together specify the time period (for example, purchases made between two weeks and one month ago correspond to $\displaystyle t_1=\frac{14}{31}, t_2=1$). They are passed as a two-dimensional array as follows.

x = np.array([
  [0.0 ,14/31 ,1.0 ,2.0 ,3.0 ,4.0 ,5.0 ,6.0 ],
  [14/31 ,1.0 ,2.0 ,3.0 ,4.0 ,5.0 ,6.0 , 10000.0]
])

x[0] is the array of $t_1$, and x[1] is the array of $t_2$. x[1,7]=10000.0 is originally $\infty$, but infinity is not acceptable in actual numerical calculations, so 10000 is used, which is effectively infinity. This value can be as small as 100.
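A quick way to see why 100 is already "effectively infinity" is to evaluate the tail term directly. This is a small sketch using the textbook's $m$ and $k$ purely for illustration; `survival` is a hypothetical helper name. As far as I know, curve_fit with its default check_finite=True rejects non-finite input, which is another reason to use a large finite number in the array.

```python
m, k = 1.37552, 4.061   # textbook values, used only for illustration

def survival(t, m, k):
    """(1 + m t / k)^(-k): share that has not purchased within t periods."""
    return (1 + m * t / k) ** (-k)

for t in (100.0, 10000.0, float("inf")):
    print(t, survival(t, m, k))
```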

The actual code for fitting is as follows

import numpy as np

from scipy.optimize import curve_fit

def _get_delta_nbd(x, m, k):
  # x[0] holds t_1 and x[1] holds t_2; returns f(t_1, t_2, m, k)
  return (1 + m * x[0] / k )**(-k) - (1 + m * x[1] / k )**(-k)

# Row 0: start of each period (t_1); row 1: end of each period (t_2).
# 10000.0 stands in for infinity.
x = np.array([
  [0.0 ,14/31 ,1.0 ,2.0 ,3.0 ,4.0 ,5.0 ,6.0 ],
  [14/31 ,1.0 ,2.0 ,3.0 ,4.0 ,5.0 ,6.0 , 10000.0]
])

# Actual values from Table 10-1
y = [0.439, 0.256, 0.191, 0.051, 0.015, 0.007, 0.014, 0.027]

parameters, covariances = curve_fit(_get_delta_nbd, x, y)
print('parameters : ', parameters)
print('covariances : ', covariances)

parameters : [1.37824241 4.14429889]
covariances : [[ 0.00284656 -0.03699629]
 [-0.03699629 1.57449471]]

The resulting $m$ and $k$ are

$$ \begin{aligned} m&= 1.378 \\ k&= 4.144 \end{aligned} $$

which are almost equal to the $m$ and $k$ used by the textbook's authors:

$$ \begin{aligned} m&= 1.37552 \\ k&= 4.061 \end{aligned} $$
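One way to judge "almost equal" is to compare the gap with the standard errors implied by the covariance matrix that curve_fit returned. The sketch below copies the numbers from the output above; reading the square roots of the diagonal as standard errors is the usual interpretation, but treat it as a rough guide given that there are only eight data points.

```python
import numpy as np

# Values copied from the curve_fit output above.
parameters = np.array([1.37824241, 4.14429889])
covariances = np.array([[0.00284656, -0.03699629],
                        [-0.03699629, 1.57449471]])
textbook = np.array([1.37552, 4.061])

# Standard errors are the square roots of the diagonal of the covariance matrix.
std_errors = np.sqrt(np.diag(covariances))
z = (parameters - textbook) / std_errors

for name, p, se, zi in zip(("m", "k"), parameters, std_errors, z):
    print(f"{name}: fit {p:.3f} +/- {se:.3f}, gap to textbook {zi:+.2f} sigma")
```

The standard error of $k$ is much larger than that of $m$, so the difference between 4.144 and 4.061 is well within the uncertainty of the fit.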

2-2. Negative binomial distribution

This section explains how to correct the panel data by using the difference between the actual sales data and the data obtained from the panel data.

First of all, the panel data gives us the following information

(A) : Number of households
(B) : Penetration rate
(C) : Average number of purchases
(D) : Average number of items purchased
(E) : Average purchase price

Please refer to Table 10-2 below for specific values (p. 281 of the textbook). From these figures, the sales implied by the panel data can be computed as

Panel data sales = number of households x penetration rate x average number of purchases x average number of items purchased x average unit purchase price

In addition, the actual sales amount is known. Using the ratio of panel data sales to actual sales, we can correct the panel data and the various parameters.

To do this, the textbook makes several important assumptions. Keeping them in mind makes the subsequent calculations easier to follow.

Assumptions

Actual current sales: 5.89 billion yen
Sales based on panel data: 4.12 billion yen (actual sales ratio: 70%, calculated by AxBxCxDxE)
Average number of items purchased per purchase is the same in reality as in the panel data
Average unit price per purchase is the same in reality as in the panel data
$K$ is the same in reality as in the panel data

Again, only “sales” are known as actual results. In the textbook example, we only know that the sales are 5.89 billion yen.

Table 10-2 Corrected panel data for a brand
	Item	Before correction	After correction
(A)	Total number of households in 2008 (thousands)	49973	49973
(B)	Penetration rate	15.0%	17.4%
(C)	Average number of purchases	2.50	3.07
(D)	Average number of items purchased per transaction	1.10	1.10
(E)	Average unit price per purchase	200 yen	200 yen
(F)	Percentage of customers who purchased two or more times	50%	55%
(G)	Annual Sales (AxBxCxDxE)	4.12 billion yen	5.89 billion yen
(H)	Ratio of G to actual	70%	100%
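The "before correction" column of Table 10-2 can be reproduced with a few lines of arithmetic. This is a minimal sketch; the variable names are my own.

```python
households = 49_973_000      # (A) 49,973 thousand households
penetration = 0.15           # (B)
purchases = 2.50             # (C)
items_per_purchase = 1.10    # (D)
unit_price = 200             # (E) yen
actual_sales = 5.89e9        # yen

# (G) = A x B x C x D x E, and (H) = G / actual sales
panel_sales = households * penetration * purchases * items_per_purchase * unit_price
ratio = panel_sales / actual_sales

print(f"panel sales = {panel_sales / 1e9:.2f} billion yen")
print(f"ratio to actual = {ratio:.0%}")
```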

Table 10-3 Calculating the correction
	Item	Before correction	After correction
(I)	Brand $m$:(BxCxD)	0.4125	0.5893
(J)	Brand $k$	0.09899	0.09899
(K)	$P_0$(probability of never buying)	85.00%	82.53%
(L)	$P_1$(Probability of buying once)	6.79%	7.00%
(M)	$P_{2+}=100\%-P_0-P_1$	8.21%	10.47%
(N)	Percentage of buyers who purchased two or more times by model:$\left(\frac{M}{B}\right)$	54.76%	59.95%

Steps in the correction

1. $m$ of the brand

$m$ = penetration rate x average number of purchases x average number of items purchased per purchase

2. $k$ of the brand

Substituting $\displaystyle K=k, M=m=0.4125, r=0$ into

$$ P\left(r \right) = \frac{\left(1 + \frac{M}{K} \right)^{-K} \cdot \Gamma\left(K + r \right)}{\Gamma\left(r + 1 \right)\cdot \Gamma\left(K \right)} \cdot \left(\frac{M}{M+K} \right)^r $$

gives the nonlinear equation

$$ P_0=\frac{\left(1+\frac{m}{k} \right)^{-k}\cdot \Gamma\left(k+0 \right)}{\Gamma\left(0+1 \right)\cdot \Gamma\left(k \right)}=\left(1+\frac{0.4125}{k} \right)^{-k} =0.85 $$

Here $\displaystyle P_0$ is the probability of never having made a purchase, so it is obtained from (1 - penetration rate):

$$ P_0=1 - 0.15 = 0.85 $$

Solving nonlinear equations numerically

The equation for $k$,

$$ \left(1+\frac{0.4125}{k} \right)^{-k} =0.85 $$

is nonlinear and cannot be solved analytically, so we solve it numerically. Here we use the newton function from SciPy. The textbook obtains $k$ with Excel, but either method is fine.

The textbook's result is

$$ k=0.09899 $$

python code

The python code to get $k$ is as follows

from scipy.optimize import newton

MIN_k = 0
MAX_k = 1.0

def check_k(k):
  # Accept only solutions inside the plausible range (MIN_k, MAX_k)
  return MIN_k < k < MAX_k

def get_k(m, P0):
  # Solve (1 + m / k) ** (-k) = P0 for k

  def func(k, m=m, P0=P0):
    return (1 + m / k) ** (-1 * k) - P0

  # Try several initial guesses and return the first root inside the range
  for initial_k in [(i + 1) * 0.01 for i in range(100)]:
    try:
      k = newton(func, initial_k)
      if check_k(k):
        return k
    except (RuntimeError, TypeError, ZeroDivisionError):
      # newton failed from this guess; try the next one
      continue
  return None

m = 0.4125
P0 = 0.85

print("k = {:,.5f}".format(get_k(m, P0)))

k = 0.09893

Python gives almost the same value as the textbook.
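As an alternative to trying many initial guesses, a bracketing root finder can be used, because the left-hand side tends to 1 as $k \to 0$ and falls below 0.85 at $k=1$. The following is a sketch of that approach with `scipy.optimize.brentq`; it is not the method used in the textbook, and `solve_k` and its default bracket are my own choices.

```python
from scipy.optimize import brentq

def solve_k(m, P0, lower=1e-6, upper=1.0):
    """Find k in [lower, upper] with (1 + m / k) ** (-k) = P0."""
    def func(k):
        return (1 + m / k) ** (-k) - P0
    # brentq needs func(lower) and func(upper) to have opposite signs.
    return brentq(func, lower, upper)

k = solve_k(m=0.4125, P0=0.85)
print("k = {:.5f}".format(k))
```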

3. P_1, the probability of buying once

$P_1$ is obtained in the same way as $P_0$: substitute $\displaystyle k=0.09899, m=0.4125, r=1$ into

$$ P\left(r \right) = \frac{\left(1 + \frac{m}{k} \right)^{-k} \cdot \Gamma\left(k + r \right)}{\Gamma\left(r + 1 \right)\cdot \Gamma\left(k \right)} \cdot \left(\frac{m}{m+k} \right)^r $$

Because the expression contains gamma functions, the calculation requires Python or Excel.

$$ \begin{aligned} P_1&=P\left(1 \right) \\ &= \frac{\left(1 + \frac{0.4125}{0.09899} \right)^{-0.09899} \cdot \Gamma\left(0.09899 + 1 \right)}{\Gamma\left(1 + 1 \right)\cdot \Gamma\left(0.09899 \right)} \\ & \quad \quad \quad \times \left(\frac{0.4125}{0.4125+0.09899} \right)^1 \\ &= 0.0679 \end{aligned} $$

4. Probability of buying two or more times P_{2+}

$P_{2+}$ is obtained by subtracting $P_0$ and $P_1$ from 1:

$$ P_{2+}=1-0.85-0.0679=0.0821 $$

5. Percentage of buyers who buy more than once according to the model (uncorrected)

This is simply the ratio of $P_{2+}$ to the penetration rate $1-P_0$.

$$ \frac{P_{2+}}{1-P_0} =\frac{0.0821}{1-0.85} =\frac{0.0821}{0.1500}=0.5476 $$
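Steps 3 to 5 can be reproduced with `scipy.special.gamma`. This is a minimal sketch; `nbd_prob` is a hypothetical helper name, and small differences in the last digit come from rounding of $k$.

```python
from scipy.special import gamma

def nbd_prob(r, m, k):
    """NBD probability of exactly r purchases (the formula above)."""
    return ((1 + m / k) ** (-k)
            * gamma(k + r) / (gamma(r + 1) * gamma(k))
            * (m / (m + k)) ** r)

m, k = 0.4125, 0.09899

P0 = nbd_prob(0, m, k)                 # never bought
P1 = nbd_prob(1, m, k)                 # bought exactly once
P2_plus = 1 - P0 - P1                  # bought two or more times
repeat_share = P2_plus / (1 - P0)      # share of buyers with 2+ purchases

print(f"P0 = {P0:.4f}, P1 = {P1:.4f}, P2+ = {P2_plus:.4f}")
print(f"share of buyers with 2+ purchases = {repeat_share:.4f}")
```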

Calculate the specific correction

6. Calculation of $P_0$

$m$ is corrected by dividing it by the ratio of panel data sales to actual sales (0.7). Writing the corrected $m$ as $m'$,

$$ m'=\frac{m}{0.7}=\frac{0.4125}{0.7}=0.5893 $$

From the assumptions above, $k$ is common to the panel data and the actual data, so $k'=k=0.09899$, where $k'$ denotes $k$ after correction. Using $k'$, the corrected $P_0$ is

$$ P_0'=\left(1+\frac{0.5893}{0.09899}\right)^{-0.09899}=0.8253 $$

The corrected penetration rate (written $\tau'$, with $\tau$ the uncorrected penetration rate) is therefore

$$ \tau'=1-0.8253=0.1747 $$

7. Average number of purchases after correction

As in step 1, $m$ = penetration rate x average number of purchases x average number of items purchased per purchase, so

$$ \begin{aligned} \text{average number of purchases after correction} &= \frac{m \text{ after correction}}{\text{penetration rate after correction} \times \text{average number of items per purchase}} \\ &=\frac{m'}{\tau' \times 1.1} \\ &= \frac{0.5893}{0.1747 \times 1.1} \\ &=3.07 \end{aligned} $$

8. Percentage of purchasers who buy more than once

As before, this only requires the corrected $P_0$ and $P_1$, which are calculated with

$$ P\left(r \right) = \frac{\left(1 + \frac{m^\prime}{k^\prime} \right)^{-k^\prime} \cdot \Gamma\left(k^\prime+ r \right)}{\Gamma\left(r + 1 \right)\cdot \Gamma\left(k^\prime \right)} \cdot \left(\frac{m^\prime}{m^\prime+k^\prime} \right)^r $$

Marking each corrected value with a prime, we get

$$ \begin{aligned} P_{0}' &= 0.8253 \\ P_{1}' &= 0.0700 \\ P_{2+}' &=0.1047 \\ \end{aligned} $$

and therefore

$$ \frac{P_{2+}^\prime}{1-P_0^\prime} =\frac{0.1047}{1-0.8253} =0.5995 $$

9. Ratio of buyers who purchased two or more times

This simply scales the value in the panel data by the ratio of the model's two-or-more purchase percentages after and before correction.

$$ \begin{aligned} &\text{ratio of buyers who purchased two or more times} \\ &=\text{value of panel data} \\ &\quad \quad \times \frac{\text{percentage of buyers with two or more purchases in the corrected model}}{\text{percentage of buyers with two or more purchases in the uncorrected model}} \\ &=0.5 \times \frac{0.5995}{0.5476}=0.5474 \end{aligned} $$
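Steps 6 to 9 can be chained into one short calculation. The sketch below uses hypothetical helper names and the closed forms $P_0=(1+m/k)^{-k}$ and $P_1=P_0 \cdot k \cdot m/(m+k)$, which follow from the NBD formula because $\Gamma(k+1)/\Gamma(k)=k$.

```python
def nbd_p0(m, k):
    return (1 + m / k) ** (-k)

def nbd_p1(m, k):
    # Gamma(k + 1) / (Gamma(2) * Gamma(k)) simplifies to k.
    return nbd_p0(m, k) * k * m / (m + k)

def repeat_share(m, k):
    """Share of buyers with two or more purchases according to the model."""
    p0, p1 = nbd_p0(m, k), nbd_p1(m, k)
    return (1 - p0 - p1) / (1 - p0)

m, k = 0.4125, 0.09899
sales_ratio = 0.7            # panel sales / actual sales
items_per_purchase = 1.1     # (D), assumed unchanged by the correction
panel_repeat_share = 0.5     # (F) before correction

m_corr = m / sales_ratio                                             # step 6
penetration_corr = 1 - nbd_p0(m_corr, k)                             # step 6
purchases_corr = m_corr / (penetration_corr * items_per_purchase)    # step 7
model_ratio = repeat_share(m_corr, k) / repeat_share(m, k)           # step 8
repeat_corr = panel_repeat_share * model_ratio                       # step 9

print(f"m' = {m_corr:.4f}, penetration' = {penetration_corr:.4f}")
print(f"average purchases' = {purchases_corr:.2f}")
print(f"share with 2+ purchases' = {repeat_corr:.4f}")
```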

2-3. Category entry order model

This model predicts the market share that a brand can obtain in a newly created category, and it allows simulations of market share.

The formula is shown below. The product categories to which it can be applied are

Fabric softener
Liquid detergent for clothes
Freeze-dried coffee

Formula

$$ \text{ratio of market share to the pioneer brand} = a^{-0.49} \times b^{1.11} \times c^{0.28} \times d^{0.07} $$

where

a : entry order
b : relative favorability
c : ratio of advertising expenditure
d : number of intervening years

Example

The textbook gives a specific example.

Pioneer brand (the brand with the highest market share): 35% share
Entry rank: 4
Relative favorability:0.9
Advertising Expenditure Rate: 0.7
Entry in the same year as the third product (intervening years):1

$$ \text{predicted share} = 0.35 \times 4^{-0.49} \times 0.9^{1.11} \times 0.7^{0.28} \times 1^{0.07} = 0.14285 $$

so the predicted share is about 14%.

python code

Not really necessary, but here is the python calculation code.

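pioneer_share = 0.35  # share of the pioneer brand
order = 4             # a: entry order
m = 0.9               # b: relative favorability
cost = 0.7            # c: ratio of advertising expenditure
entry = 1             # d: number of intervening years

predicted_share = pioneer_share*order**(-0.49)*m**(1.11)*cost**(0.28)*entry**(0.07)

print('Predicted share ratio = {:,.3f}'.format(predicted_share))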

Predicted share ratio = 0.143

Aside from the issue of whether this is actually a correct prediction, the formula is quite meaningful in that it allows us to predict the share of a new market entry in the future.
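Since the model is meant for simulating a marketing plan, the formula can be wrapped in a function and evaluated for several scenarios. This is a sketch under my own naming; the advertising ratios in the loop are hypothetical what-if inputs, not values from the textbook.

```python
def predicted_share(pioneer_share, order, favorability, ad_ratio, years_between):
    """Predicted share from the entry order formula above."""
    return (pioneer_share
            * order ** -0.49
            * favorability ** 1.11
            * ad_ratio ** 0.28
            * years_between ** 0.07)

base = predicted_share(0.35, 4, 0.9, 0.7, 1)
print("textbook example: {:.3f}".format(base))

# Hypothetical what-if: vary only the advertising ratio.
for ad_ratio in (0.5, 0.7, 1.0, 1.5):
    share = predicted_share(0.35, 4, 0.9, ad_ratio, 1)
    print("ad ratio {:.1f} -> share {:.3f}".format(ad_ratio, share))
```

Because the exponent on the advertising ratio is only 0.28, the predicted share responds less than proportionally to advertising spend.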

2-4. Trial and Repeat Models

This section explains how to predict the first-year sales of a new product from the values obtained from

Concept testing
Concept use test
Household panel data

a) Trial model, repeat model

Sales = annual sales from trial + annual sales from repeat customers

Definition.

Sales from trial = (Pop) x (Trial rate) x (Trial VPP)
Sales from repeat customers = (Pop) x (Trial rate) x (Repeat rate) x (Number of repeat purchases) x (Repeat VPP)

b) Explanation of each item

Pop: Number of total consumers and total households
Trial rate: Percentage of Pop who purchased the target product for the first time in one year
Repeat Rate: Percentage of people who purchased the product for the first time in one year who purchased it again in one year.
Number of repeat purchases: Average number of purchases by repeat customers minus one (for trial)
Trial VPP: Average purchase price during the trial period
Repeat VPP: Average purchase price for repeat purchases

c) Example

Conditions

10% of all households purchased a certain new shampoo product in the first year after launch
30% of purchasers buy at least one more time within the period
Repeat customers purchase 2.5 times on average, so the number of repeat purchases is 2.5 - 1 = 1.5
The average purchase price for a trial is 383 yen (365 yen x 1.05)
Average purchase price for repeat customers is 475 yen (431 yen x 1.10)

Sales
= 49.97 million households x 10% x 383 yen + 49.97 million households x 10% x 30% x 1.5 x 475 yen
= 1.91 billion yen + 1.07 billion yen = 2.98 billion yen
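The example can be written out as a short calculation. This sketch follows the definitions in a) and b); the variable names are mine, and it assumes that 2.5 is the repeat customers' total number of purchases, which is what the factor 1.5 in the textbook's calculation implies.

```python
pop = 49.97e6                # households
trial_rate = 0.10
repeat_rate = 0.30
repeat_purchases = 2.5 - 1   # average purchases by repeaters minus the trial
trial_vpp = 383              # yen
repeat_vpp = 475             # yen

trial_sales = pop * trial_rate * trial_vpp
repeat_sales = pop * trial_rate * repeat_rate * repeat_purchases * repeat_vpp
total_sales = trial_sales + repeat_sales

print(f"trial  : {trial_sales / 1e9:.2f} billion yen")
print(f"repeat : {repeat_sales / 1e9:.2f} billion yen")
print(f"total  : {total_sales / 1e9:.2f} billion yen")
```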

Once the trial rate and the other inputs have been derived from the panel data, this section is not difficult to follow.

2-5. VPP Model (Volume per Purchase)

This section is omitted as there is no need to explain the mathematical aspects of the model.

2-6. The Dirichlet NBD Model

As explained in 1-6, the Dirichlet NBD model is useful for predicting and analyzing the purchase rate and number of purchases of every brand in a category from the shares of the brands.

Based on Colgate's purchase data in the U.K., this section describes in detail how to find the purchase rate, the percentage of 100% loyal customers, and the average number of purchases, starting from the derivation of the key parameters $K$ and $S$.

Calculation of K

The Dirichlet NBD model is shown again below. This is equation (6) in the textbook.

$$ P(R,r_j) = p(r_j|R) \cdot p_R(NBD) $$

Here, we have

$$ p(r_j | R) = \frac{R!}{r_j!(R-r_j)!}\frac{\Gamma(S)}{\Gamma(\alpha_j)\Gamma(S-\alpha_j)}\frac{\Gamma(\alpha_j + r_j)\Gamma(S-\alpha_j + R -r_j)}{\Gamma(S+R)} $$

$$ p_R(NBD) = \left(1 + \frac{MT}{K}\right)^{-K}\frac{\Gamma(K+R)}{R!\Gamma(K)}\left(\frac{MT}{MT+K}\right)^R $$

$K$ is calculated for the category.

As in 2-2, $K$ is derived from the NBD equation for the category. It is a nonlinear equation, but the solution can be obtained numerically with the Newton method.

Because $K$ is obtained from the percentage of households that never purchased anything in the category ($R=0$), the term $p(r_j|R)$ in equation (6) becomes 1, which greatly simplifies the calculation. $S$ drops out, so it does not need to be known at this point.

In the Colgate example,

$$ \left( 1 + \frac{1.46}{K}\right)^{-K} = 0.44 \rightarrow K=0.78 $$

is obtained.
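The same bracketing approach as for $k$ in 2-2 works here. This is a sketch with `scipy.optimize.brentq`, not the textbook's Excel procedure; the bracket `[1e-6, 10]` is my own assumption, chosen so that the function changes sign inside it.

```python
from scipy.optimize import brentq

M = 1.46        # category M from the equation above
P0 = 0.44       # share of households with no category purchase

# The left-hand side decreases from 1 (K -> 0) towards exp(-M), so a sign
# change exists inside the bracket.
K = brentq(lambda K: (1 + M / K) ** (-K) - P0, 1e-6, 10.0)
print("K = {:.3f}".format(K))
```

The root differs slightly from 0.78 in the third decimal place because 0.44 is itself a rounded input.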

Calculating S

To find $S$, we use the figure from Table 1-4 for households that have never bought Colgate (80%).

This is a little involved. A household that has never bought Colgate may be a household with $R=0$ and $r=0$ (it never bought toothpaste, the category, at all), a household with $R=1$ and $r=0$ (it bought toothpaste once but not Colgate), a household with $R=2$ and $r=0$ (it bought toothpaste twice but not Colgate), a household with $R=3$ and $r=0$ (it bought toothpaste three times but not Colgate), and so on. All of these households must be counted.

Therefore, we have to solve the following equation.

$$ \displaystyle \sum_{R=0}^{\infty} p(r_j=0|R)p_R(NBD) = 0.8 $$

Strictly speaking, the formula also counts households that bought toothpaste an arbitrarily large number of times without ever buying Colgate.

In reality there is no such data, and the probabilities become effectively zero once $R$ is large enough, so the sum can be truncated at some value. The textbook (p. 289) truncates it at 10:

$$ \displaystyle \sum_{R=0}^{10} p(r_j=0|R)p_R(NBD) = 0.8 \quad \cdots (\ast) $$

which is sufficient for practical purposes.

In addition, $p(r_j=0|R)$ simplifies to

$$ \begin{aligned} p(r_j=0 | R) &= \frac{\Gamma(S)}{\Gamma(S-\alpha_j)}\frac{\Gamma(S-\alpha_j + R)}{\Gamma(S+R)} \\ &=\frac{\Gamma(S)}{\Gamma(0.75S)}\frac{\Gamma(0.75 S + R)}{\Gamma(S+R)}\quad \because \alpha_j = 0.25S \end{aligned} $$

which is a function of $S$ and $R$ only, so the left-hand side of $(\ast)$ is a function of $S$ alone.

Expression $(\ast)$ is still fairly complex. To solve it numerically, you need to choose an initial value (or a search range) for $S$; as with $K$, the Newton method can be used. The textbook obtains

$$ S=1.2 $$
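Equation $(\ast)$ can be solved numerically in a few lines. The sketch below uses `brentq` over an assumed search range of 0.1 to 10 instead of the Newton method, and `gammaln` to keep the gamma ratios stable; the function names are hypothetical. With the rounded inputs $K=0.78$ and 0.8 it lands close to, though not exactly on, the textbook's $S=1.2$.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gammaln

M, T, K = 1.46, 1.0, 0.78     # category NBD parameters from the text
SHARE = 0.25                  # alpha_j = 0.25 * S for Colgate
TARGET = 0.8                  # households that never bought Colgate
R = np.arange(0, 11)          # truncate the sum at R = 10

def log_nbd(R, M, T, K):
    """Logarithm of p_R(NBD)."""
    return (-K * np.log1p(M * T / K)
            + gammaln(K + R) - gammaln(R + 1) - gammaln(K)
            + R * np.log(M * T / (M * T + K)))

def p_zero_given_R(S, R):
    """p(r_j = 0 | R) with alpha_j = SHARE * S."""
    a = SHARE * S
    return np.exp(gammaln(S) - gammaln(S - a) + gammaln(S - a + R) - gammaln(S + R))

def never_bought(S):
    """Left-hand side of (*): share of households with no Colgate purchase."""
    return np.sum(p_zero_given_R(S, R) * np.exp(log_nbd(R, M, T, K)))

S = brentq(lambda s: never_bought(s) - TARGET, 0.1, 10.0)
print("S = {:.2f}".format(S))
```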

About Table 10-9

Table 10-9 shows the values of $p(r_j|R)$ for specific values of $r_j$ and $R$. For example, $p(r_j=1|R=2)=20.5\%$ is the share of households that bought Colgate exactly once among the households that bought toothpaste twice. Because $p(r_j|R)$ is a conditional probability, keep in mind that this 20.5% refers only to households that bought toothpaste twice, not to all households.
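The expression for $p(r_j|R)$ has the form of a beta-binomial probability with parameters $\alpha_j$ and $S-\alpha_j$, so a single entry of Table 10-9 can be cross-checked with `scipy.stats.betabinom`. Note that this distribution was added in SciPy 1.4, which is newer than the environment listed at the top of this article, so treat this as an optional check.

```python
from scipy.stats import betabinom

S = 1.2
alpha = 0.25 * S          # Colgate's alpha_j

# p(r_j = 1 | R = 2): one Colgate purchase out of two category purchases.
p = betabinom.pmf(1, 2, alpha, S - alpha)
print("p(r=1 | R=2) = {:.3f}".format(p))

# Each row of Table 10-9 is a probability distribution over r_j = 0..R.
row_sum = sum(betabinom.pmf(r, 3, alpha, S - alpha) for r in range(4))
print("row sum for R=3:", row_sum)
```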

About Table 10-10

Table 10-10 multiplies the values in Table 10-9 by $p_R(NBD)$. Each entry is the probability that a household buys the toothpaste category $R$ times and Colgate $r_j$ times.

Percentage of 100% loyal customers of Colgate

The numbers on the diagonal in Table 10-10 are the percentage of buyers who decide to buy only Colgate toothpaste, so we can divide them by the percentage of toothpaste purchases to get the percentage of Colgate loyal customers.

Average number of purchases of Colgate

By multiplying the probability of purchase of Colgate by the number of purchases and summing them, we can calculate the expected value of the number of purchases (average number of purchases).
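The two quantities described above can be read off a joint table of $P(R, r_j)$. The following sketch rebuilds that table for $R \le 10$ with $S=1.2$ and $K=0.78$; the names are my own, and the results are slightly understated because the sum is truncated at $R=10$. The diagonal total still has to be divided by the relevant buyer share as described in the text.

```python
import numpy as np
from scipy.special import gammaln

M, T, K, S = 1.46, 1.0, 0.78, 1.2
alpha = 0.25 * S
R_MAX = 10

def nbd(R):
    """p_R(NBD): probability of R category purchases."""
    return np.exp(-K * np.log1p(M * T / K) + gammaln(K + R) - gammaln(R + 1)
                  - gammaln(K) + R * np.log(M * T / (M * T + K)))

def p_r_given_R(r, R):
    """p(r_j | R): probability of r Colgate purchases among R category purchases."""
    log_binom = gammaln(R + 1) - gammaln(r + 1) - gammaln(R - r + 1)
    return np.exp(log_binom + gammaln(S) - gammaln(alpha) - gammaln(S - alpha)
                  + gammaln(alpha + r) + gammaln(S - alpha + R - r) - gammaln(S + R))

# joint[R, r] = P(R category purchases and r Colgate purchases); zero where r > R.
joint = np.zeros((R_MAX + 1, R_MAX + 1))
for R in range(R_MAX + 1):
    for r in range(R + 1):
        joint[R, r] = nbd(R) * p_r_given_R(r, R)

only_colgate = np.trace(joint) - joint[0, 0]            # diagonal entries with R >= 1
mean_purchases = np.sum(joint * np.arange(R_MAX + 1))   # sum of r * P(R, r)

print("households buying only Colgate: {:.1%}".format(only_colgate))
print("average Colgate purchases per household: {:.3f}".format(mean_purchases))
```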

python code

Here is the Python code used for the calculations in Tables 10-8, 10-9, and 10-10. The derivation of the negative binomial and Dirichlet distributions is difficult, but calculating the results themselves is not very complicated.

import math
from scipy.special import gamma

def get_nbd(M, T, K, R):
  # NBD probability that the category is purchased R times in period T
  return ((1 + M * T / K)**(-1 * K)) * \
         (gamma(K + R) / math.factorial(R) / gamma(K)) * \
         ((M * T / (M * T + K)) ** R)

def get_p_rj(r, a, S, R):
  # p(r_j | R): probability of r brand purchases among R category purchases
  return (math.factorial(R) / math.factorial(r) / math.factorial(R - r)) * \
         (gamma(S) / gamma(a) / gamma(S - a)) * \
         (gamma(a + r) * gamma(S - a + R - r) / gamma(S + R))

def print01():
  # Table 10-9: conditional probabilities p(r_j | R)
  for R in range(0,11):
    print('R={} | '.format(R), end='')
    for r in range(R + 1):
      print('{:.3f} | '.format(round(get_p_rj(r=r, a=1.2 * 0.25, S=1.2, R=R), 3)), end='')
    print()

def print02():
  # Table 10-10: joint probabilities p(r_j | R) * p_R(NBD), in percent
  for R in range(0,11):
    print('R={} | '.format(R), end='')
    for r in range(R + 1):
      print('{:.1f} % | '.format(round(100 * get_nbd(M=1.46, T=1, K=0.78,R=R) * get_p_rj(r=r, a=1.2 * 0.25, S=1.2, R=R), 3)), end='')
    print()

print('Table 10-9 Percentage of category purchases by number of purchases when S=1.2')
print01()
print()
print()
print('Table 10-10 Percentage of category and brand purchases by number of purchases when S=1.2')
print02()

Table 10-9 Percentage of category purchases by number of purchases when S=1.2
R=0 | 1.000 |
R=1 | 0.750 | 0.250 |
R=2 | 0.648 | 0.205 | 0.148 |
R=3 | 0.587 | 0.182 | 0.125 | 0.106 |
R=4 | 0.545 | 0.168 | 0.113 | 0.091 | 0.083 |
R=5 | 0.514 | 0.157 | 0.105 | 0.083 | 0.072 | 0.069 |
R=6 | 0.489 | 0.149 | 0.099 | 0.078 | 0.066 | 0.060 | 0.059 |
R=7 | 0.468 | 0.143 | 0.094 | 0.074 | 0.062 | 0.055 | 0.052 | 0.052 |
R=8 | 0.451 | 0.137 | 0.090 | 0.070 | 0.059 | 0.052 | 0.048 | 0.045 | 0.046 |
R=9 | 0.437 | 0.132 | 0.087 | 0.068 | 0.057 | 0.050 | 0.045 | 0.042 | 0.040 | 0.041 |
R=10 | 0.424 | 0.128 | 0.084 | 0.066 | 0.055 | 0.048 | 0.043 | 0.040 | 0.038 | 0.037 | 0.038 |


Table 10-10 Percentage of category and brand purchases by number of purchases when S=1.2
R=0 | 43.9 % |
R=1 | 16.7 % | 5.6 % |
R=2 | 8.4 % | 2.6 % | 1.9 % |
R=3 | 4.6 % | 1.4 % | 1.0 % | 0.8 % |
R=4 | 2.6 % | 0.8 % | 0.5 % | 0.4 % | 0.4 % |
R=5 | 1.5 % | 0.5 % | 0.3 % | 0.2 % | 0.2 % | 0.2 % |
R=6 | 0.9 % | 0.3 % | 0.2 % | 0.1 % | 0.1 % | 0.1 % | 0.1 % |
R=7 | 0.6 % | 0.2 % | 0.1 % | 0.1 % | 0.1 % | 0.1 % | 0.1 % | 0.1 % |
R=8 | 0.3 % | 0.1 % | 0.1 % | 0.1 % | 0.0 % | 0.0 % | 0.0 % | 0.0 % | 0.0 % |
R=9 | 0.2 % | 0.1 % | 0.0 % | 0.0 % | 0.0 % | 0.0 % | 0.0 % | 0.0 % | 0.0 % | 0.0 % |
R=10 | 0.1 % | 0.0 % | 0.0 % | 0.0 % | 0.0 % | 0.0 % | 0.0 % | 0.0 % | 0.0 % | 0.0 % | 0.0 % |

Summary

The above is my attempt to work through the end-of-book commentary of "Strategic Theory of Probability Thinking" in my own way. I am not a marketing expert, nor do I have any practical marketing experience. I usually work in IT-related fields such as web system development and machine learning model development. In that work I regularly use the Poisson and gamma distributions, but I never imagined that they were applied to marketing in this way.

It started when a friend who specializes in marketing asked me, "What is the negative binomial distribution?" According to my friend, overseas companies such as P&G routinely use mathematics in their marketing, but Japan still seems to have a long way to go. Dr. Ehrenberg, a great authority on marketing, published the paper that forms the basis of this book many decades ago. Even so, I believe that, thanks to "Strategic Theory of Probability Thinking", probability and statistics will be applied more and more to marketing in Japan as well.
