[numpy] 1. basic operations

Numpy personal tips

numpy is one of the essential tools for data analysis and numerical computation, and it is almost always needed when implementing machine learning and similar tasks. I’ll leave this memo as a personal reminder. For details, please refer to the official page below.

Official page

github

The file in jupyter notebook format on github is here.

Author’s environment

The author’s environment and import method are as follows.

!sw_vers

ProductName: Mac OS X
ProductVersion: 10.14.6
BuildVersion: 18G95

!python -V

Python 3.5.5 :: Anaconda, Inc.

import numpy as np

np.__version__

'1.18.1'

scalar, vector, matrix, tensor

Scalar : 0th-order tensor
Vector : 1st-order tensor
Matrix : 2nd-order tensor

Getting the information

Information about an ndarray can be retrieved with the following built-in function and attributes.

len()
- Length of the first dimension (axis 0)
shape
- Size of each dimension
ndim
- Number of dimensions
size
- Total number of elements
itemsize
- Size of one element in bytes
nbytes
- Total size of all elements in bytes
dtype
- Data type of the elements
data
- Buffer object pointing to the start of the array's data
flags
- Information about the memory layout

Example usage is as follows

a = np.array([i for i in range(2)])
b = np.array([i for i in range(4)]).reshape(-1,2)
c = np.array([i for i in range(12)]).reshape(-1,2,2)

print('a : ', a)
print('len(a) : ', len(a))
print('a.shape : ', a.shape)
print('a.ndim : ', a.ndim)
print('a.size : ', a.size)
print('a.itemsize : ', a.itemsize)
print('a.nbytes : ', a.nbytes)
print('a.dtype : ', a.dtype)
print('a.data : ', a.data)
print('a.flags : \n{}'.format(a.flags))
print()
print('b : \n{}'.format(b))
print('len(b) : ', len(b))
print('b.shape : ', b.shape)
print('b.ndim : ', b.ndim)
print('b.size : ', b.size)
print('b.itemsize : ', b.itemsize)
print('b.nbytes : ', b.nbytes)
print('b.dtype : ', b.dtype)
print('b.data : ', b.data)
print('b.flags : \n{}'.format(b.flags))
print()
print('c : \n{}'.format(c))
print('len(c) : ', len(c))
print('c.shape : ', c.shape)
print('c.ndim : ', c.ndim)
print('c.size : ', c.size)
print('c.itemsize : ', c.itemsize)
print('c.nbytes : ', c.nbytes)
print('c.dtype : ', c.dtype)
print('c.data : ', c.data)
print('c.flags : \n{}'.format(c.flags))

a : [0 1]
len(a) : 2
a.shape : (2,)
a.ndim : 1
a.size : 2
a.itemsize : 8
a.nbytes : 16
a.dtype : int64
a.data : <memory at 0x10a6c7d08>
a.flags :
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False


b :
[[0 1]
 [2 3]]
len(b) : 2
b.shape : (2, 2)
b.ndim : 2
b.size : 4
b.itemsize : 8
b.nbytes : 32
b.dtype : int64
b.data : <memory at 0x10a6f73a8>
b.flags :
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False


c :
[[[ 0  1]
  [ 2  3]]

 [[ 4  5]
  [ 6  7]]

 [[ 8  9]
  [10 11]]]
len(c) : 3
c.shape : (3, 2, 2)
c.ndim : 3
c.size : 12
c.itemsize : 8
c.nbytes : 96
c.dtype : int64
c.data : <memory at 0x109a18138>
c.flags :
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
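
The attributes listed above are related to one another: size is the product of the entries of shape, and nbytes is size times itemsize. The following is a minimal sketch that checks these relations on an array shaped like c. The dtype is pinned to int64 here as an assumption, so that itemsize is 8 regardless of platform.

```python
import numpy as np

# Same shape as `c` above; dtype is pinned so itemsize is 8 on every platform.
c = np.arange(12, dtype=np.int64).reshape(-1, 2, 2)

print(len(c) == c.shape[0])             # len() is the length of the first axis
print(c.ndim == len(c.shape))           # ndim is the number of axes
print(c.size == int(np.prod(c.shape)))  # size is the product of the shape
print(c.nbytes == c.size * c.itemsize)  # nbytes = size * itemsize
```

All four lines should print True.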

About flags

The flags attribute reports several properties of an array's memory. This section describes how the elements of an array are laid out in memory.

[https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flags.html](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flags.html)

As you can see from the official page linked above, there are two ways to lay out an array in memory. One is C_CONTIGUOUS, and the other is F_CONTIGUOUS. C_ means the C language method, and F_ means the FORTRAN method.

Consider the matrix

$$ \left( \begin{array}{cc} a & b \\ c & d \end{array} \right) $$

In the C language method (row-major), its elements are stored in memory in the order

$$ a,b,c,d $$

In the FORTRAN method (column-major), they are stored in the order

$$ a,c,b,d $$

We don’t usually think about this, but I will write it down as a reminder.
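
The difference between the two layouts can be observed directly. The sketch below is only an illustration (the variable names are my own): it builds the same 2x2 matrix in both layouts and reads the elements back in memory order with ravel(order='K'). The C layout yields the elements row by row, the FORTRAN layout column by column.

```python
import numpy as np

# 2x2 matrix standing in for (a b; c d), with a=1, b=2, c=3, d=4.
m = np.array([[1, 2], [3, 4]], dtype=np.int64)

c_arr = np.ascontiguousarray(m)  # C layout (row-major)
f_arr = np.asfortranarray(m)     # FORTRAN layout (column-major)

print(c_arr.flags['C_CONTIGUOUS'], f_arr.flags['F_CONTIGUOUS'])  # True True

# order='K' walks the elements in the order they sit in memory.
print(c_arr.ravel(order='K'))  # [1 2 3 4] -> a, b, c, d
print(f_arr.ravel(order='K'))  # [1 3 2 4] -> a, c, b, d
```

Both arrays still compare equal element by element; only the order in memory differs.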

numpy data types

The numerical core of numpy is implemented in C. Therefore, when defining data, we can specify its data type, which lets us control how much memory is allocated. The larger the numerical computation, the more important this becomes.

The official documentation defines many data types, but only a few of them are used in practice.

notation1	notation2	notation3	data type	explanation
np.bool	-	?	bool	boolean
np.int8	int8	i1	int8	8-bit signed integer
np.int16	int16	i2	int16	16-bit signed integer
np.int32	int32	i4	int32	32-bit signed integer
np.int64	int64	i8	int64	64-bit signed integer
np.uint8	uint8	u1	uint8	8-bit unsigned integer
np.uint16	uint16	u2	uint16	16-bit unsigned integer
np.uint32	uint32	u4	uint32	32-bit unsigned integer
np.uint64	uint64	u8	uint64	64-bit unsigned integer
np.float16	float16	f2	float16	half precision floating point type
np.float32	float32	f4	float32	single precision floating point type
np.float64	float64	f8	float64	double precision floating point type
np.float128	float128	f16	float128	extended precision floating point type (128-bit storage; availability is platform dependent)

Notations 1, 2, and 3 are equivalent ways of specifying the same data type.

a = np.array([i for i in range(5)], dtype=np.int8)
b = np.array([i for i in range(5)], dtype='int8')
c = np.array([i for i in range(5)], dtype='i1')

print(a.dtype)
print(b.dtype)
print(c.dtype)

d = np.array(True, dtype='?')
e = np.array(True, dtype=np.bool)

print(d.dtype)
print(e.dtype)

int8
int8
int8
bool
bool
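
To make the point about memory concrete, here is a small sketch comparing the memory used by the same number of elements stored as int8 and as int64. The array length of 1000 is an arbitrary choice for illustration.

```python
import numpy as np

n = 1000  # arbitrary length, chosen only for illustration
small = np.zeros(n, dtype=np.int8)
large = np.zeros(n, dtype=np.int64)

print(small.nbytes)  # 1000
print(large.nbytes)  # 8000

# The price of a small type is a small range; np.iinfo reports the limits.
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)  # -128 127
```

The smaller type uses one eighth of the memory, but it can only hold values from -128 to 127.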

axis

numpy can handle higher-order tensors, and when calculating statistics such as means and sums, you can specify the direction along which to calculate. The direction is specified with the axis option.

axis direction

Example

In this example, we use the axis option to specify the direction of the calculation.

a = np.arange(10)

print('\n####### for vectors #######')
print('\na : ')
print(a)
print('\nnp.mean(a) : ')
print(np.mean(a))
print('\nnp.mean(a, axis=0) : ')
print(np.mean(a, axis=0))

print('\n####### for matrices #######')
a = np.arange(10).reshape(2,5)
print('\na : ')
print(a)
print('\nnp.mean(a) : ')
print(np.mean(a))
print('\nnp.mean(a, axis=0) : ')
print(np.mean(a, axis=0))
print('\nnp.mean(a, axis=1) : ')
print(np.mean(a, axis=1))

print('\n####### for a third-order tensor #######')
a = np.arange(24).reshape(2,3,4)
print('\na : ')
print(a)
print('\nnp.mean(a) : ')
print(np.mean(a))
print('\nnp.mean(a, axis=0) : ')
print(np.mean(a, axis=0))
print('\nnp.mean(a, axis=1) : ')
print(np.mean(a, axis=1))
print('\nnp.mean(a, axis=2) : ')
print(np.mean(a, axis=2))

####### for vectors #######

a :
[0 1 2 3 4 5 6 7 8 9]

np.mean(a) :
4.5

np.mean(a, axis=0) :
4.5

####### for matrices #######

a :
[[0 1 2 3 4]
 [5 6 7 8 9]]

np.mean(a) :
4.5

np.mean(a, axis=0) :
[2.5 3.5 4.5 5.5 6.5]

np.mean(a, axis=1) :
[2. 7.]

####### for a third-order tensor #######

a :
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

np.mean(a) :
11.5

np.mean(a, axis=0) :
[[ 6.  7.  8.  9.]
 [10. 11. 12. 13.]
 [14. 15. 16. 17.]]

np.mean(a, axis=1) :
[[ 4.  5.  6.  7.]
 [16. 17. 18. 19.]]

np.mean(a, axis=2) :
[[ 1.5  5.5  9.5]
 [13.5 17.5 21.5]]
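
One way to read the results above is through shapes: reducing along an axis removes that axis from the shape. The following sketch checks this for the same (2, 3, 4) array. The keepdims option shown at the end is an addition of mine, not part of the example above.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# Taking the mean along axis k removes the k-th entry of the shape.
for axis in range(a.ndim):
    print(axis, np.mean(a, axis=axis).shape)
# 0 (3, 4)
# 1 (2, 4)
# 2 (2, 3)

# keepdims=True keeps the reduced axis with length 1.
print(np.mean(a, axis=1, keepdims=True).shape)  # (2, 1, 4)
```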

Broadcast

In numpy, when an operation is performed between a matrix or vector and a scalar, the scalar is applied to every element of the matrix or vector. This is easy to misread if you are not used to it, so keep it in mind. In the example below, the scalar $a$ is applied to every component of the vector $b$.

a = 10
b = np.array([1, 2])

print('a : ',a)
print('b : ',b)
print('a + b : ',a + b)
print('a * b : ',a * b)
print('b / a : ',b / a)

a : 10
b : [1 2]
a + b : [11 12]
a * b : [10 20]
b / a : [0.1 0.2]
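
Conceptually, the scalar is stretched to the shape of the vector before the operation is carried out. The sketch below makes that step explicit with np.broadcast_to. This is only an illustration of the idea, not a description of how numpy computes the result internally.

```python
import numpy as np

a = 10
b = np.array([1, 2])

# Stretch the scalar to the shape of b, as broadcasting does conceptually.
a_stretched = np.broadcast_to(a, b.shape)
print(a_stretched)                             # [10 10]

# Adding the stretched array gives the same result as adding the scalar.
print(np.array_equal(a + b, a_stretched + b))  # True
```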

Slicing

Slicing is a way to extract specific elements or sub-arrays from an ndarray. It is very useful and well worth learning.

a = np.arange(12).reshape(-1,3)

print('a : \n{}'.format(a))
print()
print('a.shape : ',a.shape)
print()
print('a[0,1] : ', a[0,1], '## elements with row=0, col=1')
print()
print('a[2,2] : ', a[2,2], '## elements with row=2, col=2')
print()
print('a[1] : ', a[1], '## element with row=1')
print()
print('a[-1] : ', a[-1], '## element of last row')
print()
print('Elements from row 2 to row 3, column 1 to column 2')
print('a[1:3,0:2] : \n{}'.format(a[1:3,0:2]))
print()
print('All rows, every other column starting from the first column')
print('a[:,::2] : \n{}'.format(a[:,::2]))
print()
print('Every other row, starting from the first row')
print('a[::2] : \n{}'.format(a[::2]))
print()
print('Every other row, starting from the second row')
print('a[1::2] : \n{}'.format(a[1::2]))
print()

a :
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]

a.shape : (4, 3)

a[0,1] : 1 ## elements with row=0, col=1

a[2,2] : 8 ## elements with row=2, col=2

a[1] : [3 4 5] ## element with row=1

a[-1] : [ 9 10 11] ## element of last row

Elements from row 2 to row 3, column 1 to column 2
a[1:3,0:2] :
[[3 4]
 [6 7]]

All rows, every other column starting from the first column
a[:,::2] :
[[ 0  2]
 [ 3  5]
 [ 6  8]
 [ 9 11]]

Every other row, starting from the first row
a[::2] :
[[0 1 2]
 [6 7 8]]

Every other row, starting from the second row
a[1::2] :
[[ 3  4  5]
 [ 9 10 11]]
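
The start:stop:step notation used above follows the same rules as Python's built-in slice objects: the range is half-open (stop is excluded) and step may be negative. A short sketch, under the assumption that the same array a is used:

```python
import numpy as np

a = np.arange(12).reshape(-1, 3)

# start:stop is half-open: rows 1 and 2 are taken, row 3 is not.
print(a[1:3].shape)  # (2, 3)

# a[start:stop] is shorthand for indexing with slice objects.
print(np.array_equal(a[1:3, 0:2], a[slice(1, 3), slice(0, 2)]))  # True

# A negative step walks the rows in reverse, so the last row comes first.
print(a[::-1][0])    # [ 9 10 11]
```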

all, any, where

all : returns True if all of the elements are true
any : returns True if at least one of the elements is true

a = np.array([[0,1],[1,1]])

print(a.all())
print(a.any())

False
True

where : returns the indices of the elements that satisfy the condition.

a = np.array([[0,2],[1,1]])

print(np.where(a>1)) ## Return (0,1), the index of 2 greater than 1.

(array([0]), array([1]))

(0,1) is the index that fits the where condition.
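
When more than one element satisfies the condition, np.where returns one index array per dimension, not a list of (row, column) pairs. The sketch below uses a slightly different array (an assumption of mine, chosen so that two elements match) to show how to read the result.

```python
import numpy as np

a = np.array([[0, 2], [3, 1]])

# One array of row indices and one array of column indices.
rows, cols = np.where(a > 1)
print(rows)  # [0 1]
print(cols)  # [1 0]

# Pair them up to get (row, col) coordinates.
print(list(zip(rows.tolist(), cols.tolist())))  # [(0, 1), (1, 0)]

# The index arrays can be used directly to pick out the matching elements.
print(a[rows, cols])  # [2 3]
```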

Ternary operators in where

where can also be used like a ternary operator. Where the condition in the first argument is satisfied, the element is taken from the second argument; otherwise it is taken from the third argument. This form of where is used frequently.

a = np.array([2 *i +1 for i in range(6)]).reshape(2,3)
print('a : ', a)
print('Keep elements greater than 6 and set them to 0 if they are smaller')
np.where(a>6,a,0)

a : [[ 1  3  5]
 [ 7  9 11]]
Keep elements greater than 6 and set them to 0 if they are smaller

array([[ 0,  0,  0],
       [ 7,  9, 11]])

a = np.array([2 *i +1 for i in range(6)]).reshape(2,3)
b = np.zeros((2,3))

print(a)
print(b)
print('If an element of a is divisible by 3, return the corresponding value of b, otherwise return the value of a')
np.where(a%3==0, b, a)

[[ 1  3  5]
 [ 7  9 11]]
[[0. 0. 0.]
 [0. 0. 0.]]
If an element of a is divisible by 3, return the corresponding value of b, otherwise return the value of a

array([[ 1.,  0.,  5.],
       [ 7.,  0., 11.]])
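
To see why this is called a ternary form, the sketch below compares np.where with the same rule written element by element using Python's conditional expression (x if condition else y). The nested list comprehension is only for illustration; np.where is the idiomatic way.

```python
import numpy as np

a = np.array([2 * i + 1 for i in range(6)]).reshape(2, 3)

result = np.where(a > 6, a, 0)

# The same rule, applied one element at a time.
expected = np.array([[x if x > 6 else 0 for x in row] for row in a])

print(np.array_equal(result, expected))  # True
```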

Fundamental constants

Base of the natural logarithm

np.e

2.718281828459045

Pi

np.pi

3.141592653589793

Basic arithmetic operations

np.add(x,y)

Element-wise addition. This is ordinary vector addition.

a = np.array([1.,2.])
b = np.array([4.,3.])
np.add(a,b)

array([5., 5.])

np.reciprocal(x)

The per-element reciprocal.

b = np.array([4.,3.])
np.reciprocal(b)

array([0.25      , 0.33333333])

One interesting thing I noticed about this function: in python3, dividing integers with / is computed to the decimal point, whereas in python2 only the integer part is returned. np.reciprocal behaves like the latter for integer input, so the reciprocal of an integer is truncated to its integer part. However, if you specify a floating-point data type, the function calculates to the decimal point.

# print(1/8) # => returns 0.125 @python3
# print(1/8) # => returns 0 @python2
print(np.reciprocal(8))
print(np.reciprocal(8, dtype='float16'))
print(np.reciprocal(8.))

0
0.125
0.125

np.multiply(x,y)

Element-wise multiplication. This is called the Hadamard product. It is different from the inner product of vectors.

a = np.array([1.,2.])
b = np.array([4.,3.])
np.multiply(a,b)

array([4., 6.])
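
To make the contrast with the inner product concrete, the following sketch computes both for the same two vectors. np.dot is used for the inner product here; it is not part of the example above.

```python
import numpy as np

a = np.array([1., 2.])
b = np.array([4., 3.])

# Element-wise product: a vector of the same length.
print(np.multiply(a, b))  # [4. 6.]

# Inner product: a single number, the sum of the element-wise products.
print(np.dot(a, b))       # 10.0

print(np.multiply(a, b).sum() == np.dot(a, b))  # True
```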

np.divide(x,y)

Find the quotient of per-element division.

a = np.array([1.,2.])
b = np.array([4.,3.])
np.divide(b,a)

array([4. , 1.5])

np.mod(x,y)

Find the remainder of per-element division.

a = np.array([3.,2.])
b = np.array([11.,3.])
print(np.mod(b,a))

[2. 1.]

np.divmod(x,y)

Find the quotient and remainder of per-element division simultaneously.

a = np.array([3.,2.])
b = np.array([11.,3.])
print(np.divmod(b,a))

(array([3., 1.]), array([2., 1.]))
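
The two arrays returned by np.divmod satisfy the usual relation dividend = quotient * divisor + remainder. A small sketch to check this with the same inputs:

```python
import numpy as np

a = np.array([3., 2.])
b = np.array([11., 3.])

q, r = np.divmod(b, a)
print(q, r)  # [3. 1.] [2. 1.]

# q matches floor division and r matches np.mod.
print(np.array_equal(q, np.floor_divide(b, a)))  # True
print(np.array_equal(r, np.mod(b, a)))           # True

# Quotient and remainder together reconstruct the dividend.
print(np.array_equal(q * a + r, b))              # True
```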

np.power(x,y)

This computes powers: x raised to the power y. If you pass vectors, the power is computed element by element.

$2^3=8$

np.power(2,3)

$4^1$ and $3^2$

a = np.array([1.,2.])
b = np.array([4.,3.])
np.power(b,a)

array([4., 9.])

np.subtract(x,y)

Element-wise subtraction.

a = np.array([1.,2.])
b = np.array([4.,3.])
np.subtract(b,a)

array([3., 1.])
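
Each of the functions in this section also has a corresponding Python operator that gives the same result on ndarrays. The sketch below compares them for the vectors used above. The dictionary and its keys are only a device of mine for looping over the pairs.

```python
import numpy as np

a = np.array([1., 2.])
b = np.array([4., 3.])

# (result of the function, result of the operator) for each operation.
pairs = {
    'add':      (np.add(b, a),      b + a),
    'subtract': (np.subtract(b, a), b - a),
    'multiply': (np.multiply(b, a), b * a),
    'divide':   (np.divide(b, a),   b / a),
    'mod':      (np.mod(b, a),      b % a),
    'power':    (np.power(b, a),    b ** a),
}

for name, (by_function, by_operator) in pairs.items():
    print(name, np.array_equal(by_function, by_operator))
```

Every line should print True. The function form is mainly useful when you need extra arguments such as dtype, as in the np.reciprocal example.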