Numpy personal tips

NumPy is one of the essential tools for data analysis and numerical computation, and it is almost always needed when implementing machine learning and similar tasks. I am leaving these notes as a personal reminder. For details, please refer to the official NumPy documentation.

github

  • The file in jupyter notebook format on github is here.

Author’s environment

The author’s environment and import method are as follows.

!sw_vers
ProductName: Mac OS X
ProductVersion: 10.14.6
BuildVersion: 18G95
!python -V
Python 3.5.5 :: Anaconda, Inc.
import numpy as np

np.__version__
'1.18.1'

scalar, vector, matrix, tensor

  • Scalar : 0th-order tensor
  • Vectors : first-order tensors
  • Matrices : second-order tensors
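
As a minimal sketch of the list above, the ndim attribute of an ndarray reports the order of each tensor. The variable names are illustrative only.

```python
import numpy as np

scalar = np.array(5)                 # 0th-order tensor
vector = np.array([1, 2, 3])         # first-order tensor
matrix = np.array([[1, 2], [3, 4]])  # second-order tensor
tensor3 = np.zeros((2, 3, 4))        # third-order tensor

# Print the order (ndim) and shape of each object
for name, t in [('scalar', scalar), ('vector', vector),
                ('matrix', matrix), ('tensor3', tensor3)]:
    print(name, ': ndim =', t.ndim, ', shape =', t.shape)
```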

Getting the information

Information about an ndarray can be retrieved through its attributes and through built-in functions, such as the following.

  • len()
    • Length of the first dimension (axis 0)
  • shape
    • The size of each dimension
  • ndim
    • Number of dimensions
  • size
    • Total number of elements
  • itemsize
    • Size in bytes of a single element
  • nbytes
    • Total number of bytes used by all elements
  • dtype
    • Data type of the elements
  • data
    • Buffer object pointing to the start of the array's data
  • flags
    • Memory layout information

Example usage is as follows

a = np.array([i for i in range(2)])
b = np.array([i for i in range(4)]).reshape(-1,2)
c = np.array([i for i in range(12)]).reshape(-1,2,2)

print('a : ', a)
print('len(a) : ', len(a))
print('a.shape : ', a.shape)
print('a.ndim : ', a.ndim)
print('a.size : ', a.size)
print('a.itemsize : ', a.itemsize)
print('a.nbytes : ', a.nbytes)
print('a.dtype : ', a.dtype)
print('a.data : ', a.data)
print('a.flags : \n{}'.format(a.flags))
print()
print('b : \n{}'.format(b))
print('len(b) : ', len(b))
print('b.shape : ', b.shape)
print('b.ndim : ', b.ndim)
print('b.size : ', b.size)
print('b.itemsize : ', b.itemsize)
print('b.nbytes : ', b.nbytes)
print('b.dtype : ', b.dtype)
print('b.data : ', b.data)
print('b.flags : \n{}'.format(b.flags))
print()
print('c : \n{}'.format(c))
print('len(c) : ', len(c))
print('c.shape : ', c.shape)
print('c.ndim : ', c.ndim)
print('c.size : ', c.size)
print('c.itemsize : ', c.itemsize)
print('c.nbytes : ', c.nbytes)
print('c.dtype : ', c.dtype)
print('c.data : ', c.data)
print('c.flags : \n{}'.format(c.flags))
a :  [0 1]
len(a) :  2
a.shape :  (2,)
a.ndim :  1
a.size :  2
a.itemsize :  8
a.nbytes :  16
a.dtype :  int64
a.data :  <memory at 0x10a6c7d08>
a.flags :
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False


b :
[[0 1]
 [2 3]]
len(b) :  2
b.shape :  (2, 2)
b.ndim :  2
b.size :  4
b.itemsize :  8
b.nbytes :  32
b.dtype :  int64
b.data :  <memory at 0x10a6f73a8>
b.flags :
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False


c :
[[[ 0  1]
  [ 2  3]]

 [[ 4  5]
  [ 6  7]]

 [[ 8  9]
  [10 11]]]
len(c) :  3
c.shape :  (3, 2, 2)
c.ndim :  3
c.size :  12
c.itemsize :  8
c.nbytes :  96
c.dtype :  int64
c.data :  <memory at 0x109a18138>
c.flags :
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
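
The attributes above are related to one another. The following is a small sketch of those relationships, reusing the same array c as in the example. It is only a sanity check, not an exhaustive description of the attributes.

```python
import numpy as np

c = np.arange(12).reshape(-1, 2, 2)

# size is the product of the entries of shape
size_from_shape = int(np.prod(c.shape))
# nbytes is the number of elements times the bytes per element
nbytes_from_size = c.size * c.itemsize

print('shape :', c.shape)
print('size == prod(shape) :', c.size == size_from_shape)
print('nbytes == size * itemsize :', c.nbytes == nbytes_from_size)
# len() only looks at the first axis
print('len(c) == shape[0] :', len(c) == c.shape[0])
```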

About flags

The flags attribute returns several pieces of information. This section describes how array data is laid out in memory.

As described in the official documentation, there are two ways to lay out an array in memory: C_CONTIGUOUS and F_CONTIGUOUS. C_ refers to the C language convention (row-major order), and F_ refers to the FORTRAN convention (column-major order).

In the C convention, the matrix

$$ \left( \begin{array}{cc} a & b \\ c & d \end{array} \right) $$

is stored in memory in the order

$$ a,b,c,d $$

In the FORTRAN convention, it is stored in the order

$$ a,c,b,d $$

We don't usually need to think about this, but I am writing it down as a reminder.
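
The two layouts can be inspected directly. The sketch below uses the numbers 1 to 4 in place of a, b, c, d, and uses ravel(order='K') to read the elements in the order they sit in memory. The names c_arr and f_arr are illustrative.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])

c_arr = np.ascontiguousarray(m)  # C (row-major) layout
f_arr = np.asfortranarray(m)     # FORTRAN (column-major) layout

# Both arrays hold the same matrix
print(np.array_equal(c_arr, f_arr))

# order='K' reads the elements in the order they occur in memory
c_order = c_arr.ravel(order='K').tolist()
f_order = f_arr.ravel(order='K').tolist()
print('C order       :', c_order)
print('FORTRAN order :', f_order)

print(c_arr.flags['C_CONTIGUOUS'], c_arr.flags['F_CONTIGUOUS'])
print(f_arr.flags['C_CONTIGUOUS'], f_arr.flags['F_CONTIGUOUS'])
```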

numpy data types

The numerical core of NumPy is implemented in C. For that reason, you can specify the data type when defining data, and this lets you control how much memory is allocated. The larger the numerical computation, the more important this becomes.

The official documentation defines many data types, but only a few of them are commonly used.

| notation 1 | notation 2 | notation 3 | data type | explanation |
|---|---|---|---|---|
| np.bool | - | ? | bool | boolean |
| np.int8 | int8 | i1 | int8 | 8-bit signed integer |
| np.int16 | int16 | i2 | int16 | 16-bit signed integer |
| np.int32 | int32 | i4 | int32 | 32-bit signed integer |
| np.int64 | int64 | i8 | int64 | 64-bit signed integer |
| np.uint8 | uint8 | u1 | uint8 | 8-bit unsigned integer |
| np.uint16 | uint16 | u2 | uint16 | 16-bit unsigned integer |
| np.uint32 | uint32 | u4 | uint32 | 32-bit unsigned integer |
| np.uint64 | uint64 | u8 | uint64 | 64-bit unsigned integer |
| np.float16 | float16 | f2 | float16 | half precision floating point type |
| np.float32 | float32 | f4 | float32 | single precision floating point type |
| np.float64 | float64 | f8 | float64 | double precision floating point type |
| np.float128 | float128 | f16 | float128 | extended precision floating point type (platform-dependent) |

Notations 1, 2, and 3 are equivalent ways of specifying the same data type.

a = np.array([i for i in range(5)], dtype=np.int8)
b = np.array([i for i in range(5)], dtype='int8')
c = np.array([i for i in range(5)], dtype='i1')

print(a.dtype)
print(b.dtype)
print(c.dtype)

d = np.array(True, dtype='?')
# np.bool_ works across NumPy versions; the plain np.bool alias was removed in NumPy 1.24
e = np.array(True, dtype=np.bool_)

print(d.dtype)
print(e.dtype)
int8
int8
int8
bool
bool
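
To illustrate the point about memory, here is a minimal sketch comparing the same number of elements stored as int8 and as int64. The array length n is an arbitrary choice for illustration.

```python
import numpy as np

n = 1000  # arbitrary number of elements
small = np.zeros(n, dtype=np.int8)
large = np.zeros(n, dtype=np.int64)

print('int8  nbytes :', small.nbytes)
print('int64 nbytes :', large.nbytes)

# A smaller type also has a smaller representable range
info = np.iinfo(np.int8)
print('int8 range :', info.min, 'to', info.max)

# astype returns a converted copy; the original array is unchanged
converted = large.astype(np.int8)
print(converted.dtype, converted.nbytes, large.dtype)
```

The trade-off is the value range: the memory saving is only useful when every value fits in the smaller type.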

axis

NumPy can handle higher-order tensors, and when computing statistics such as means and sums, you can specify the direction along which to compute. Use the axis option to specify that direction.

axis direction

Example

In this example, we use the axis option to specify the direction along which the mean is taken.

a = np.arange(10)

print('\n####### for vectors #######')
print('\na : ')
print(a)
print('\nnp.mean(a) : ')
print(np.mean(a))
print('\nnp.mean(a, axis=0) : ')
print(np.mean(a, axis=0))

print('\n####### for matrices #######')
a = np.arange(10).reshape(2,5)
print('\na : ')
print(a)
print('\nnp.mean(a) : ')
print(np.mean(a))
print('\nnp.mean(a, axis=0) : ')
print(np.mean(a, axis=0))
print('\nnp.mean(a, axis=1) : ')
print(np.mean(a, axis=1))

print('\n####### for a third-order tensor #######')
a = np.arange(24).reshape(2,3,4)
print('\na : ')
print(a)
print('\nnp.mean(a) : ')
print(np.mean(a))
print('\nnp.mean(a, axis=0) : ')
print(np.mean(a, axis=0))
print('\nnp.mean(a, axis=1) : ')
print(np.mean(a, axis=1))
print('\nnp.mean(a, axis=2) : ')
print(np.mean(a, axis=2))
####### for vectors #######

a :
[0 1 2 3 4 5 6 7 8 9]

np.mean(a) :
4.5

np.mean(a, axis=0) :
4.5

####### for matrices #######

a :
[[0 1 2 3 4]
 [5 6 7 8 9]]

np.mean(a) :
4.5

np.mean(a, axis=0) :
[2.5 3.5 4.5 5.5 6.5]

np.mean(a, axis=1) :
[2. 7.]

####### for a third-order tensor #######

a :
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

np.mean(a) :
11.5

np.mean(a, axis=0) :
[[ 6.  7.  8.  9.]
 [10. 11. 12. 13.]
 [14. 15. 16. 17.]]

np.mean(a, axis=1) :
[[ 4.  5.  6.  7.]
 [16. 17. 18. 19.]]

np.mean(a, axis=2) :
[[ 1.5  5.5  9.5]
 [13.5 17.5 21.5]]
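
One way to remember what axis does is to look at the shape of the result: the axis you specify is the one that disappears. The sketch below checks this for the same third-order tensor. The keepdims option is not used in the example above and is shown only as a related convenience.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
print('a.shape :', a.shape)

# The reduced axis is removed from the shape
shapes = [np.mean(a, axis=axis).shape for axis in range(a.ndim)]
for axis, shape in enumerate(shapes):
    print('axis =', axis, '->', shape)

# keepdims=True keeps the reduced axis with length 1
kept = np.mean(a, axis=1, keepdims=True)
print('keepdims :', kept.shape)
```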

Broadcast

In NumPy, when an operation is performed between a scalar and a vector or matrix, the scalar is applied to every element of the vector or matrix. This can be confusing until you get used to it, so keep it in mind. In the example below, the scalar $a$ is applied to every component of the vector $b$.

a = 10
b = np.array([1, 2])

print('a : ',a)
print('b : ',b)
print('a + b : ',a + b)
print('a * b : ',a * b)
print('b / a : ',b / a)
a :  10
b :  [1 2]
a + b :  [11 12]
a * b :  [10 20]
b / a :  [0.1 0.2]
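
Conceptually, broadcasting behaves as if the scalar were first expanded to the shape of the array. The sketch below makes that expansion explicit with np.full, and then shows the same idea for a row vector added to a matrix, a case the example above does not cover. The names a_full, m, and row are illustrative.

```python
import numpy as np

a = 10
b = np.array([1, 2])

# Expand the scalar explicitly to the shape of b
a_full = np.full(b.shape, a)
print('a_full :', a_full)
print('same result :', np.array_equal(a + b, a_full + b))

# A row vector is likewise applied to every row of a matrix
m = np.arange(6).reshape(3, 2)
row = np.array([10, 20])
result = m + row
print(result)
```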

Slicing

Slicing is a way to extract specific elements or sub-arrays from an ndarray. It is very useful and well worth learning.

a = np.arange(12).reshape(-1,3)

print('a : \n{}'.format(a))
print()
print('a.shape : ',a.shape)
print()
print('a[0,1] : ', a[0,1], '## element with row=0, col=1')
print()
print('a[2,2] : ', a[2,2], '## element with row=2, col=2')
print()
print('a[1] : ', a[1], '## row with index 1')
print()
print('a[-1] : ', a[-1], '## last row')
print()
print('Rows 2 to 3, columns 1 to 2 (counting from 1)')
print('a[1:3,0:2] : \n{}'.format(a[1:3,0:2]))
print()
print('All rows, every other column starting from the first column')
print('a[:,::2] : \n{}'.format(a[:,::2]))
print()
print('Every other row starting from the first row')
print('a[::2] : \n{}'.format(a[::2]))
print()
print('Every other row starting from the second row')
print('a[1::2] : \n{}'.format(a[1::2]))
print()
a :
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]

a.shape :  (4, 3)

a[0,1] :  1 ## element with row=0, col=1

a[2,2] :  8 ## element with row=2, col=2

a[1] :  [3 4 5] ## row with index 1

a[-1] :  [ 9 10 11] ## last row

Rows 2 to 3, columns 1 to 2 (counting from 1)
a[1:3,0:2] :
[[3 4]
 [6 7]]

All rows, every other column starting from the first column
a[:,::2] :
[[ 0  2]
 [ 3  5]
 [ 6  8]
 [ 9 11]]

Every other row starting from the first row
a[::2] :
[[0 1 2]
 [6 7 8]]

Every other row starting from the second row
a[1::2] :
[[ 3  4  5]
 [ 9 10 11]]
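
Two details about slicing are easy to forget, so here is a short sketch of them. First, the start:stop notation is shorthand for a slice object, and stop is exclusive. Second, a basic slice shares memory with the original array, so use copy() when an independent array is needed. This goes slightly beyond the example above and is offered only as a reminder.

```python
import numpy as np

a = np.arange(12).reshape(-1, 3)

# a[1:3, 0:2] is shorthand for explicit slice objects (stop is exclusive)
sub = a[1:3, 0:2]
same = a[slice(1, 3), slice(0, 2)]
print(np.array_equal(sub, same))

# A basic slice shares memory with the original array
print(np.shares_memory(a, sub))

# copy() produces an independent array, so a is left untouched
independent = a[1:3, 0:2].copy()
independent[0, 0] = -1
print(a[1, 0], independent[0, 0])
```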

all, any, where

  • all: returns True if all of the elements are true
  • any: returns True if at least one of the elements is true
a = np.array([[0,1],[1,1]])

print(a.all())
print(a.any())
False
True

np.where(condition) returns the indices of the elements that satisfy the condition.

a = np.array([[0,2],[1,1]])

print(np.where(a>1)) ## Returns (0,1), the index of the element 2, which is greater than 1.
(array([0]), array([1]))

(0,1) is the index that satisfies the condition.
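
When more than one element matches, np.where returns one index array per dimension rather than a list of coordinate pairs. The sketch below uses a slightly different array, with two matching elements, to show how to pair the indices up. np.argwhere is mentioned as an alternative that returns the pairs directly.

```python
import numpy as np

a = np.array([[0, 2], [3, 1]])

# One array of row indices and one array of column indices
rows, cols = np.where(a > 1)
print('rows :', rows)
print('cols :', cols)

# Pair them up to get (row, col) coordinates
pairs = list(zip(rows.tolist(), cols.tolist()))
print('pairs :', pairs)

# np.argwhere returns the same coordinates, one per row
coords = np.argwhere(a > 1).tolist()
print('argwhere :', coords)
```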

Ternary operators in where

np.where can also act like a ternary operator. Where the condition in the first argument is satisfied, the element is taken from the second argument; otherwise it is taken from the third argument. This form of where is used frequently.

a = np.array([2 * i + 1 for i in range(6)]).reshape(2,3)
print('a : ', a)
print('Keep elements greater than 6 as they are, and set the others to 0')
np.where(a>6,a,0)
a :  [[ 1  3  5]
 [ 7  9 11]]
Keep elements greater than 6 as they are, and set the others to 0

array([[ 0,  0,  0],
       [ 7,  9, 11]])
a = np.array([2 * i + 1 for i in range(6)]).reshape(2,3)
b = np.zeros((2,3))

print(a)
print(b)
print('If an element of a is divisible by 3, return the corresponding value of b, otherwise return the value of a')
np.where(a%3==0, b, a)
[[ 1  3  5]
 [ 7  9 11]]
[[0. 0. 0.]
 [0. 0. 0.]]
If an element of a is divisible by 3, return the corresponding value of b, otherwise return the value of a

array([[ 1.,  0.,  5.],
       [ 7.,  0., 11.]])
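
To make the ternary analogy concrete, the sketch below compares np.where with an explicit Python conditional expression applied element by element. The nested list comprehension is only there for comparison; the np.where form is the one you would normally use.

```python
import numpy as np

a = np.array([2 * i + 1 for i in range(6)]).reshape(2, 3)

# Vectorized form
vectorized = np.where(a > 6, a, 0)

# Same logic written with Python's ternary expression, element by element
looped = np.array([[x if x > 6 else 0 for x in row] for row in a])

print(vectorized)
print(np.array_equal(vectorized, looped))
```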

Fundamental constants

Base of the natural logarithm

np.e
2.718281828459045

Pi

np.pi
3.141592653589793

Basic arithmetic operations

np.add(x,y)

Element-wise addition. This is a general vector addition method.

a = np.array([1.,2.])
b = np.array([4.,3.])
np.add(a,b)
array([5., 5.])

np.reciprocal(x)

The per-element reciprocal.

b = np.array([4.,3.])
np.reciprocal(b)
array([0.25      , 0.33333333])

One interesting thing I noticed about this function: in Python 3, dividing integers with / computes the fractional part, while in Python 2 only the integer part is returned. When np.reciprocal is given an integer, however, it returns only the integer part. If you specify a floating-point data type, it computes the fractional part as well.

# print(1/8) # => returns 0.125 @python3
# print(1/8) # => returns 0 @python2
print(np.reciprocal(8))
print(np.reciprocal(8, dtype='float16'))
print(np.reciprocal(8.))
0
0.125
0.125

np.multiply(x,y)

Element-wise multiplication. This is called the Hadamard product. It is different from the inner product of vectors.

a = np.array([1.,2.])
b = np.array([4.,3.])
np.multiply(a,b)
array([4., 6.])
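
The difference from the inner product can be seen in a short sketch: np.multiply returns a vector of per-element products, while np.dot sums those products into a single number. The same a and b as above are used.

```python
import numpy as np

a = np.array([1., 2.])
b = np.array([4., 3.])

hadamard = np.multiply(a, b)  # element-wise products, still a vector
inner = np.dot(a, b)          # sum of the element-wise products, a scalar

print('Hadamard product :', hadamard)
print('inner product    :', inner)
print('sum of Hadamard == inner :', hadamard.sum() == inner)
```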

np.divide(x,y)

Find the quotient of per-element division.

a = np.array([1.,2.])
b = np.array([4.,3.])
np.divide(b,a)
array([4. , 1.5])

np.mod(x,y)

Find the remainder of per-element division.

a = np.array([3.,2.])
b = np.array([11.,3.])
print(np.mod(b,a))
[2. 1.]

np.divmod(x,y)

Find the quotient and remainder of per-element division simultaneously.

a = np.array([3.,2.])
b = np.array([11.,3.])
print(np.divmod(b,a))
(array([3., 1.]), array([2., 1.]))
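
As a quick check of what np.divmod returns, the sketch below compares it with np.floor_divide and np.mod computed separately, and then reconstructs the original values from the quotient and remainder.

```python
import numpy as np

a = np.array([3., 2.])
b = np.array([11., 3.])

q, r = np.divmod(b, a)

# divmod is equivalent to computing the two parts separately
print(np.array_equal(q, np.floor_divide(b, a)))
print(np.array_equal(r, np.mod(b, a)))

# quotient * divisor + remainder gives back the dividend
rebuilt = q * a + r
print(rebuilt)
```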

np.power(x,y)

This computes powers. If you pass arrays, each element of the first array is raised to the power given by the corresponding element of the second array.

$2^3=8$

np.power(2,3)
8

$4^1$ and $3^2$

a = np.array([1.,2.])
b = np.array([4.,3.])
np.power(b,a)
array([4., 9.])

np.subtract(x,y)

Element-wise subtraction.

a = np.array([1.,2.])
b = np.array([4.,3.])
np.subtract(b,a)
array([3., 1.])
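
For ndarrays, the functions in this section give the same results as the ordinary arithmetic operators. The sketch below checks this for the same a and b used above. The checks dictionary is just an illustrative way to collect the comparisons.

```python
import numpy as np

a = np.array([1., 2.])
b = np.array([4., 3.])

# Compare each function with its operator form
checks = {
    'add':      np.array_equal(np.add(a, b), a + b),
    'subtract': np.array_equal(np.subtract(b, a), b - a),
    'multiply': np.array_equal(np.multiply(a, b), a * b),
    'divide':   np.array_equal(np.divide(b, a), b / a),
    'mod':      np.array_equal(np.mod(b, a), b % a),
    'power':    np.array_equal(np.power(b, a), b ** a),
}

for name, ok in checks.items():
    print(name, ':', ok)
```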