[numpy] 4. statistical functions

Numpy personal tips

NumPy is one of the essential tools for data analysis and numerical computation, and it is needed almost every time you implement machine learning. I am keeping these notes as a personal reminder. For details, please refer to the official page below.

Official page

github

The Jupyter notebook version of this article is on GitHub here.

Author’s environment

The author’s environment and import method are as follows.

!sw_vers

ProductName: Mac OS X
ProductVersion: 10.14.6
BuildVersion: 18G2022

!python -V

Python 3.7.3

%matplotlib inline
%config InlineBackend.figure_format = 'svg'

import numpy as np

np.__version__

'1.16.2'

Getting statistics

np.max(x)

Returns the maximum value of an array. With axis, the maximum is taken along that axis only.

Define $a$ as a second-order tensor.

a = np.array([
    [1,8,3],
    [6,5,4],
    [7,2,9]
  ]
)

Define $b$ as a third-order tensor.

b = np.array([
  [
    [1,8,3],
    [6,5,4],
    [7,2,9]
  ],
  [
    [1,9,4],
    [7,2,5],
    [6,8,3]
  ]
])

print('-' * 20)
print('a : \n',a)
print()
print('np.max(a) : \n',np.max(a))
print()
print('np.max(a, axis=0) : \n',np.max(a, axis=0))
print()
print('np.max(a, axis=1) : \n',np.max(a, axis=1))
print()

print('-' * 20)
print('b : \n',b)
print()
print('np.max(b) : \n',np.max(b))
print()
print('np.max(b, axis=0) : \n',np.max(b, axis=0))
print()
print('np.max(b, axis=1) : \n',np.max(b, axis=1))
print()
print('np.max(b, axis=2) : \n',np.max(b, axis=2))

--------------------
a :
 [[1 8 3]
 [6 5 4]
 [7 2 9]]

np.max(a) :
 9

np.max(a, axis=0) :
 [7 8 9]

np.max(a, axis=1) :
 [8 6 9]

--------------------
b :
 [[[1 8 3]
  [6 5 4]
  [7 2 9]]

 [[1 9 4]
  [7 2 5]
  [6 8 3]]]

np.max(b) :
 9

np.max(b, axis=0) :
 [[1 9 4]
 [7 5 5]
 [7 8 9]]

np.max(b, axis=1) :
 [[7 8 9]
 [7 9 5]]

np.max(b, axis=2) :
 [[8 6 9]
 [9 7 8]]
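The axis argument names the dimension that gets collapsed, so the result has the input's shape with that axis removed. The sketch below (b is redefined so it runs on its own) checks the shapes, compares axis=2 with an explicit Python loop, and shows keepdims=True, which keeps the reduced axis with length 1.

```python
import numpy as np

b = np.array([
    [[1, 8, 3],
     [6, 5, 4],
     [7, 2, 9]],
    [[1, 9, 4],
     [7, 2, 5],
     [6, 8, 3]],
])

# Reducing along an axis removes that axis from the shape.
for axis in range(b.ndim):
    print(axis, np.max(b, axis=axis).shape)
# axis=0 -> (3, 3), axis=1 -> (2, 3), axis=2 -> (2, 3)

# axis=2: maximum of each row inside each 3x3 matrix, written as a loop.
manual = np.array([[max(row) for row in mat] for mat in b])
print(np.array_equal(manual, np.max(b, axis=2)))  # True

# keepdims=True keeps the reduced axis with length 1.
print(np.max(b, axis=2, keepdims=True).shape)  # (2, 3, 1)
```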

# Same arrays with np.argmax, which returns the index of the maximum instead of its value
print('-' * 20)
print('a : \n',a)
print()
print('np.argmax(a) : \n',np.argmax(a))
print()
print('np.argmax(a, axis=0) : \n',np.argmax(a, axis=0))
print()
print('np.argmax(a, axis=1) : \n',np.argmax(a, axis=1))
print()

print('-' * 20)
print('b : \n',b)
print()
print('np.argmax(b) : \n',np.argmax(b))
print()
print('np.argmax(b, axis=0) : \n',np.argmax(b, axis=0))
print()
print('np.argmax(b, axis=1) : \n',np.argmax(b, axis=1))
print()
print('np.argmax(b, axis=2) : \n',np.argmax(b, axis=2))

--------------------
a :
 [[1 8 3]
 [6 5 4]
 [7 2 9]]

np.argmax(a) :
 8

np.argmax(a, axis=0) :
 [2 0 2]

np.argmax(a, axis=1) :
 [1 0 2]

--------------------
b :
 [[[1 8 3]
  [6 5 4]
  [7 2 9]]

 [[1 9 4]
  [7 2 5]
  [6 8 3]]]

np.argmax(b) :
 8

np.argmax(b, axis=0) :
 [[0 1 1]
 [1 0 1]
 [0 1 0]]

np.argmax(b, axis=1) :
 [[2 0 2]
 [1 0 1]]

np.argmax(b, axis=2) :
 [[1 0 2]
 [1 0 1]]
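Without axis, np.argmax returns an index into the flattened (row-major) array, which is why it prints 8 for a. A minimal sketch that converts this flat index back into a (row, column) pair with np.unravel_index:

```python
import numpy as np

a = np.array([[1, 8, 3],
              [6, 5, 4],
              [7, 2, 9]])

flat = np.argmax(a)                          # index into the flattened array a.ravel()
row, col = np.unravel_index(flat, a.shape)   # convert the flat index to (row, column)
print(flat, row, col)                        # 8 2 2
print(a[row, col] == np.max(a))              # True
```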

np.argmax(x)

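Returns the index of the maximum value in the array. Without axis, the index refers to the flattened array; if the maximum occurs more than once, the index of the first occurrence is returned.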

a = np.random.randint(100,size=10)

print('a : ',a)
print('max position : ',np.argmax(a))

a : [53 35 94 2 3 14 21 55 17 6]
max position : 2
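If the maximum occurs more than once, np.argmax reports only the first occurrence. A short sketch with a fixed array instead of random data; np.flatnonzero is one way to get every position of the maximum:

```python
import numpy as np

a = np.array([3, 7, 1, 7, 5])

print(np.argmax(a))                   # 1  (first of the two 7s)
print(np.flatnonzero(a == a.max()))   # [1 3]  (every position of the maximum)
```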

np.min(x)

Returns the minimum value of the array.

a = np.random.randint(100,size=10)

print('a : ',a)
print('min : ',np.min(a))

a : [36 42 6 71 92 23 44 92 36 79]
min : 6
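np.min (and np.max) propagate NaN: a single NaN makes the result NaN. NumPy also provides NaN-aware variants such as np.nanmin and np.nanargmin that skip NaN values. A minimal sketch with a fixed float array:

```python
import numpy as np

x = np.array([4.0, np.nan, 1.0, 3.0])

print(np.min(x))        # nan  (NaN propagates)
print(np.nanmin(x))     # 1.0  (NaN is ignored)
print(np.nanargmin(x))  # 2    (index of the smallest non-NaN value)
```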

np.argmin(x)

Returns the index of the minimum value in the array (the first occurrence if there are ties).

a = np.random.randint(100,size=10)

print('a : ',a)
print('min position : ',np.argmin(a))

a : [51 76 59 12 28 50 21 61 49 37]
min position : 3
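The examples above use unseeded random data, so their output differs on every run. The sketch below fixes a seed with np.random.RandomState (an assumption chosen because it also exists in the author's NumPy 1.16; newer code often uses np.random.default_rng instead) and checks that indexing with np.argmin recovers the value from np.min.

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed -> the same numbers on every run
a = rng.randint(100, size=10)

i = np.argmin(a)
print(a)
print(i, a[i], np.min(a))
print(a[i] == np.min(a))         # True: argmin points at the minimum
```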

np.maximum(x,y)

Compares two arrays element by element and returns a new ndarray holding the larger value at each position.

a = np.random.randint(100,size=10)
b = np.random.randint(100,size=10)

print('a : ',a)
print('b : ',b)
print('max : ',np.maximum(a,b))

a : [25 78 95 45 79 33 72 33 38 81]
b : [41 91 64 7 60 54 29 25 99 88]
max : [41 91 95 45 79 54 72 33 99 88]
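np.maximum should not be confused with np.max: np.max reduces one array to its largest value, while np.maximum compares its two arguments element by element and broadcasts them, so one argument can be a scalar. Clipping negative values to 0 (the ReLU function in machine learning) is a typical use. A sketch with fixed values:

```python
import numpy as np

x = np.array([-2, 5, -1, 3])

print(np.maximum(x, 0))   # [0 5 0 3]  element-wise; the scalar 0 is broadcast
print(np.max(x))          # 5          reduction over the whole array
```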

np.minimum(x,y)

Compares two arrays element by element and returns a new ndarray holding the smaller value at each position.

a = np.random.randint(100,size=10)
b = np.random.randint(100,size=10)

print('a : ',a)
print('b : ',b)
print('min : ',np.minimum(a,b))

a : [80 81 40 80 47 81 17 86 91 63]
b : [84 51 7 4 62 66 83 85 21 66]
min : [80 51 7 4 47 66 17 85 21 63]
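Because np.minimum broadcasts as well, comparing with a scalar caps every element at an upper limit, which matches np.clip with only an upper bound. Its reduce method collapses an array the same way np.min does. A sketch with fixed values:

```python
import numpy as np

x = np.array([80, 51, 7, 94, 47])

capped = np.minimum(x, 50)                           # cap every element at 50
print(capped)                                        # [50 50  7 50 47]
print(np.array_equal(capped, np.clip(x, None, 50)))  # True
print(np.minimum.reduce(x), np.min(x))               # 7 7
```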

np.sum(a, axis=None, dtype=None, out=None, keepdims=[no value], initial=[no value], where=[no value])

Returns the sum of the array elements, either over the whole array or along a given axis.

a = np.arange(10)
np.sum(a)

45

Summing along an axis:

a = np.arange(12).reshape(3,4)

print('a : ')
print(a)
print('sum axis=0 : ', np.sum(a, axis=0))
print('sum axis=1 : ', np.sum(a, axis=1))

a :
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
sum axis=0 : [12 15 18 21]
sum axis=1 : [ 6 22 38]
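Reducing along an axis is often followed by broadcasting the result back against the original array. With keepdims=True the summed axis stays in the shape with length 1, so each row can be divided by its own total. A minimal sketch reusing the 3×4 array above:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

row_sums = np.sum(a, axis=1, keepdims=True)   # shape (3, 1) instead of (3,)
shares = a / row_sums                         # broadcasts across each row

print(row_sums.shape)       # (3, 1)
print(shares.sum(axis=1))   # each row now sums to 1 (up to floating-point rounding)
```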

np.average(a, axis=None, weights=None, returned=False)

Computes the average; a weighted average is also possible.

Without weights, it is simply the arithmetic mean of the array.

a = np.arange(10)
np.average(a)

4.5

Average along an axis:

a = np.arange(12).reshape(3,4)

print('a : ', a)
print('average axis = 0 : ',np.average(a, axis=0))
print('average axis = 1 : ',np.average(a, axis=1))

a : [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
average axis = 0 : [4. 5. 6. 7.]
average axis = 1 : [1.5 5.5 9.5]

Specifying weights:

a = np.arange(5)

# Set the weights as desired
w = np.array([0.1,0.2,0.5,0.15,0.05])

np.average(a,weights=w)

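Since these weights sum to 1, the result is 0×0.1 + 1×0.2 + 2×0.5 + 3×0.15 + 4×0.05 = 1.85 (possibly displayed with a small floating-point rounding error).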
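np.average divides by the sum of the weights, so the weights do not need to add up to 1. The sketch below uses a hypothetical weight vector w2 with the same proportions as w but summing to 10, compares the result with the explicit formula sum(a * w) / sum(w), and shows returned=True, which also returns the sum of the weights.

```python
import numpy as np

a = np.arange(5)
w2 = np.array([1, 2, 5, 1.5, 0.5])   # hypothetical weights: w scaled by 10, summing to 10

avg = np.average(a, weights=w2)
manual = np.sum(a * w2) / np.sum(w2)
print(avg, manual)                   # 1.85 1.85

avg2, total = np.average(a, weights=w2, returned=True)
print(total)                         # 10.0, the sum of the weights
```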

np.mean(a, axis=None, dtype=None, out=None, keepdims=[no value])

Computes the arithmetic mean. Unlike np.average, it cannot compute a weighted average, but you can specify the dtype used for the calculation.

x = np.arange(10)
np.mean(x)

4.5

Computing in an integer type (int8) truncates the result to 4 instead of 4.5:

x = np.arange(10)
np.mean(x, dtype='int8')

4
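Without weights, np.mean and np.average agree, and for integer input np.mean computes in float64 unless dtype says otherwise. A minimal sketch:

```python
import numpy as np

x = np.arange(10)

print(np.mean(x), np.average(x))            # 4.5 4.5
print(np.mean(x).dtype)                     # float64: integer input is averaged as float64
print(np.mean(x, dtype=np.float32).dtype)   # float32
```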

np.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=[no value])

Computes the standard deviation. The default ddof=0 divides by N, giving the population standard deviation.

x = np.arange(10)
np.std(x)

2.8722813232690143
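The default ddof=0 divides the sum of squared deviations by N; ddof=1 divides by N - 1, giving the sample standard deviation (the default in pandas, for example). A sketch that checks both against the formula sqrt(sum((x - mean)**2) / (N - ddof)):

```python
import numpy as np

x = np.arange(10)
n = x.size
sq_dev = np.sum((x - x.mean()) ** 2)   # sum of squared deviations = 82.5

for ddof in (0, 1):
    manual = np.sqrt(sq_dev / (n - ddof))
    print(ddof, np.std(x, ddof=ddof), manual)
```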

np.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=[no value])

Computes the variance. As with np.std, the default ddof=0 divides by N.

x = np.arange(10)
np.var(x)

8.25
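np.var takes the same ddof argument, and its result is the square of np.std. A minimal sketch on the same data:

```python
import numpy as np

x = np.arange(10)

print(np.var(x))                               # 8.25 = 82.5 / 10
print(np.var(x, ddof=1))                       # 82.5 / 9, the unbiased estimate
print(np.isclose(np.var(x), np.std(x) ** 2))   # True
```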

np.median(a, axis=None, out=None, overwrite_input=False, keepdims=False)

Returns the median. With an even number of elements, it is the mean of the two middle values.

x = np.arange(10)
print(x)
print('median x : ',np.median(x))
print()

x = np.arange(11)
print(x)
print('median x : ',np.median(x))

[0 1 2 3 4 5 6 7 8 9]
median x : 4.5

[ 0 1 2 3 4 5 6 7 8 9 10]
median x : 5.0
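The rule behind these two results: sort the data, then take the middle value for an odd count or the mean of the two middle values for an even count. A sketch of that rule as a hypothetical helper median_by_sort, compared with np.median on unsorted data:

```python
import numpy as np

def median_by_sort(values):
    """Median of a 1-D array via sorting (hypothetical helper for illustration)."""
    s = np.sort(values)
    n = s.size
    mid = n // 2
    if n % 2 == 1:
        return float(s[mid])             # odd length: the middle element
    return (s[mid - 1] + s[mid]) / 2     # even length: mean of the two middle elements

data = np.array([7, 1, 9, 4, 3, 8])
print(median_by_sort(data), np.median(data))   # 5.5 5.5
```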

np.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None)

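Computes the covariance matrix. With the default rowvar=True, each row of m is a variable and each column is an observation. By default the result is normalized by N - 1 (unbiased variance); with bias=True it is normalized by N (sample variance). An additional set of variables can be passed as y.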

a = np.random.randint(10,size=9).reshape(3,3)
b = np.arange(3)

print('a : ')
print(a)
print()

print('Covariance matrix with unbiased variance')
print(np.cov(a))
print()

print('Covariance matrix with sample variance')
print(np.cov(a, bias=True))
print()

print('Sample variance for each component : match diagonal components of covariance matrix')
print('var a[0] = ', np.var(a[0]))
print('var a[1] = ', np.var(a[1]))
print('var a[2] = ', np.var(a[2]))
print()

print('Add b')
print('b : ')
print(b)
print(np.cov(a,b, bias=True))

a :
[[2 2 1]
 [0 1 6]
 [0 9 3]]

Covariance matrix with unbiased variance
[[ 0.33333333 -1.83333333  0.5       ]
 [-1.83333333 10.33333333 -0.5       ]
 [ 0.5        -0.5        21.        ]]

Covariance matrix with sample variance
[[ 0.22222222 -1.22222222  0.33333333]
 [-1.22222222  6.88888889 -0.33333333]
 [ 0.33333333 -0.33333333 14.        ]]
Sample variance for each component : match diagonal components of covariance matrix
var a[0] = 0.2222222222222222
var a[1] = 6.888888888888888888
var a[2] = 14.0

Add b
b :
[0 1 2]
[[ 0.22222222 -1.22222222  0.33333333 -0.33333333]
 [-1.22222222  6.88888889 -0.33333333  2.        ]
 [ 0.33333333 -0.33333333 14.          1.        ]
 [-0.33333333  2.          1.          0.66666667]]
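With rowvar=True, each row of a is a variable and each column an observation, so the covariance matrix is D @ D.T / (N - ddof), where D holds each row's deviations from its own mean. Since a above is random, this sketch fixes it to the values shown in the output and rebuilds both matrices:

```python
import numpy as np

a = np.array([[2, 2, 1],
              [0, 1, 6],
              [0, 9, 3]])

n = a.shape[1]                           # observations per variable (columns)
d = a - a.mean(axis=1, keepdims=True)    # deviations from each row's mean

unbiased = d @ d.T / (n - 1)             # corresponds to np.cov(a)
sample = d @ d.T / n                     # corresponds to np.cov(a, bias=True)

print(np.allclose(unbiased, np.cov(a)))                  # True
print(np.allclose(sample, np.cov(a, bias=True)))         # True
print(np.allclose(np.diag(sample), np.var(a, axis=1)))   # True
```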

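np.corrcoef(x, y=None, rowvar=True, bias=[no value], ddof=[no value])

Returns the matrix of Pearson correlation coefficients between the rows (variables). The bias and ddof parameters have no effect and are deprecated.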

a = np.random.randint(10,size=9).reshape(3,3)
np.corrcoef(a)

array([[ 1.        ,  0.24019223, -0.75592895],
       [ 0.24019223,  1.        , -0.81705717],
       [-0.75592895, -0.81705717,  1.        ]])
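The correlation matrix is the covariance matrix normalized by the standard deviations, R[i, j] = C[i, j] / sqrt(C[i, i] * C[j, j]); the N or N - 1 factor cancels, which is why bias and ddof make no difference here. A sketch using a fixed array in place of the random one above:

```python
import numpy as np

a = np.array([[2, 2, 1],
              [0, 1, 6],
              [0, 9, 3]])

c = np.cov(a)
std = np.sqrt(np.diag(c))
r = c / np.outer(std, std)               # divide each entry by std_i * std_j

print(np.allclose(r, np.corrcoef(a)))    # True
print(np.allclose(np.diag(r), 1.0))      # True: each variable correlates perfectly with itself
```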
