Numpy personal tips
numpy is one of the essential tools for data analysis and numerical computation, and it is almost always needed when implementing machine learning. I'll leave this memo as a personal reminder. For details, please refer to the following official page.
Contents
- 1. basic operations <= this page
- 2. trigonometric functions
- 3. exponential and logarithmic
- 4. statistical functions
- 5. linear algebra
- 6. Sampling
- 7. Miscellaneous
github
- The file in jupyter notebook format on github is here .
Author’s environment
The author’s environment and import method are as follows.
!sw_vers
ProductName: Mac OS X
ProductVersion: 10.14.6
BuildVersion: 18G95
!python -V
Python 3.5.5 :: Anaconda, Inc.
import numpy as np
np.__version__
'1.18.1'
scalar, vector, matrix, tensor
- Scalar : 0th-order tensor
- Vectors : first-order tensors
- Matrices : second-order tensors
Getting the information
Information about an ndarray can be retrieved through attributes and built-in functions such as the following.
- len(): length of the first dimension (axis 0)
- shape: size of each dimension
- ndim: number of dimensions
- size: total number of elements
- itemsize: size of one element in bytes
- nbytes: total number of bytes used by the elements (size × itemsize)
- dtype: data type of the elements
- data: buffer object pointing to the start of the array's data
- flags: information about the memory layout
Example usage is as follows
a = np.array([i for i in range(2)])
b = np.array([i for i in range(4)]).reshape(-1,2)
c = np.array([i for i in range(12)]).reshape(-1,2,2)
print('a : ', a)
print('len(a) : ', len(a))
print('a.shape : ', a.shape)
print('a.ndim : ', a.ndim)
print('a.size : ', a.size)
print('a.itemsize : ', a.itemsize)
print('a.nbytes : ', a.nbytes)
print('a.dtype : ', a.dtype)
print('a.data : ', a.data)
print('a.flags : \n{}'.format(a.flags))
print()
print('b : \n{}'.format(b))
print('len(b) : ', len(b))
print('b.shape : ', b.shape)
print('b.ndim : ', b.ndim)
print('b.size : ', b.size)
print('b.itemsize : ', b.itemsize)
print('b.nbytes : ', b.nbytes)
print('b.dtype : ', b.dtype)
print('b.data : ', b.data)
print('b.flags : \n{}'.format(b.flags))
print()
print('c : \n{}'.format(c))
print('len(c) : ', len(c))
print('c.shape : ', c.shape)
print('c.ndim : ', c.ndim)
print('c.size : ', c.size)
print('c.itemsize : ', c.itemsize)
print('c.nbytes : ', c.nbytes)
print('c.dtype : ', c.dtype)
print('c.data : ', c.data)
print('c.flags : \n{}'.format(c.flags))
a : [0 1]
len(a) : 2
a.shape : (2,)
a.ndim : 1
a.size : 2
a.itemsize : 8
a.nbytes : 16
a.dtype : int64
a.data : <memory at 0x10a6c7d08>
a.flags :
C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
b :
[[0 1]
 [2 3]]
len(b) : 2
b.shape : (2, 2)
b.ndim : 2
b.size : 4
b.itemsize : 8
b.nbytes : 32
b.dtype : int64
b.data : <memory at 0x10a6f73a8>
b.flags :
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
c :
[[[ 0  1]
  [ 2  3]]

 [[ 4  5]
  [ 6  7]]

 [[ 8  9]
  [10 11]]]
len(c) : 3
c.shape : (3, 2, 2)
c.ndim : 3
c.size : 12
c.itemsize : 8
c.nbytes : 96
c.dtype : int64
c.data : <memory at 0x109a18138>
c.flags :
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
About flags
The flags attribute reports several properties of an array's memory. This section describes how the elements of an array are laid out in memory.
- [https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flags.html](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flags.html)
As the official page linked above explains, there are two ways to lay out an array in memory: C_CONTIGUOUS and F_CONTIGUOUS. C_ denotes the C language convention, and F_ denotes the FORTRAN convention.
For example, consider the matrix
$$ \left( \begin{array}{cc} a & b \\ c & d \end{array} \right) $$
In the C method (row-major), its elements are stored in memory in the order
$$ a,b,c,d $$
In the FORTRAN method (column-major), they are stored in the order
$$ a,c,b,d $$
We don’t usually think about this, but I will write it down as a reminder.
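The two layouts can be checked directly. The following is a minimal sketch, not taken from the official page: the matrix values and variable names are my own, with 1, 2, 3, 4 standing in for a, b, c, d. `ravel(order='K')` reads the elements back in the order they occur in memory.

```python
import numpy as np

# Hypothetical 2x2 matrix standing in for (a b; c d) with a=1, b=2, c=3, d=4.
m = np.array([[1, 2], [3, 4]], dtype=np.int64)

c_layout = np.ascontiguousarray(m)  # row-major (C) layout
f_layout = np.asfortranarray(m)     # column-major (FORTRAN) layout

# order='K' reads the elements in the order they occur in memory.
c_memory_order = c_layout.ravel(order='K').tolist()
f_memory_order = f_layout.ravel(order='K').tolist()

print(c_layout.flags['C_CONTIGUOUS'], c_memory_order)
print(f_layout.flags['F_CONTIGUOUS'], f_memory_order)

# The strides (bytes to step along each axis) also reflect the layout.
print(c_layout.strides, f_layout.strides)
```

Both arrays compare equal element by element; only the order of the underlying bytes differs.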
numpy data types
The numerical core of numpy is implemented in C. Therefore, when defining data, you can specify its data type, which lets you control how much memory is allocated. The larger the numerical computation, the more important this becomes.
The official documentation defines many data types, but only a few of them are used in practice.
notation1 | notation2 | notation3 | data type | explanation |
---|---|---|---|---|
np.bool_ | bool | ? | bool | boolean |
np.int8 | int8 | i1 | int8 | 8-bit signed integer |
np.int16 | int16 | i2 | int16 | 16-bit signed integer |
np.int32 | int32 | i4 | int32 | 32-bit signed integer |
np.int64 | int64 | i8 | int64 | 64-bit signed integer |
np.uint8 | uint8 | u1 | uint8 | 8-bit unsigned integer |
np.uint16 | uint16 | u2 | uint16 | 16-bit unsigned integer |
np.uint32 | uint32 | u4 | uint32 | 32-bit unsigned integer |
np.uint64 | uint64 | u8 | uint64 | 64-bit unsigned integer |
np.float16 | float16 | f2 | float16 | half precision floating point |
np.float32 | float32 | f4 | float32 | single precision floating point |
np.float64 | float64 | f8 | float64 | double precision floating point |
np.float128 | float128 | f16 | float128 | extended precision floating point, 128 bits wide (not available on all platforms) |
Notations 1, 2, and 3 all specify the same data type.
a = np.array([i for i in range(5)], dtype=np.int8)
b = np.array([i for i in range(5)], dtype='int8')
c = np.array([i for i in range(5)], dtype='i1')
print(a.dtype)
print(b.dtype)
print(c.dtype)
d = np.array(True, dtype='?')
e = np.array(True, dtype=np.bool_)  # np.bool_ exists in every NumPy version; the np.bool alias was removed in NumPy 1.24
print(d.dtype)
print(e.dtype)
int8
int8
int8
bool
bool
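To make the memory argument above concrete, here is a small sketch comparing how many bytes the same number of elements occupies under different data types. The element count `n` is an arbitrary value chosen for illustration.

```python
import numpy as np

n = 1000  # arbitrary element count chosen for illustration

sizes = {}
for name in ('int8', 'int32', 'int64', 'float32', 'float64'):
    arr = np.zeros(n, dtype=name)
    sizes[name] = arr.nbytes          # total bytes = n * itemsize
    print(name, arr.itemsize, arr.nbytes)
```

An int8 array takes one eighth of the memory of an int64 array of the same length, at the cost of a much smaller range of representable values.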
axis
numpy can use higher-order tensors, and when calculating statistics such as averages and sums, you can specify in which direction to calculate. To specify the direction, you can use the option “axis”.
axis direction
Example
In this example, we use the axis option to specify the direction along which the mean is computed.
a = np.arange(10)
print('\n####### for vectors #######')
print('\na : ')
print(a)
print('\nnp.mean(a) : ')
print(np.mean(a))
print('\nnp.mean(a, axis=0) : ')
print(np.mean(a, axis=0))
print('\n####### for matrices #######')
a = np.arange(10).reshape(2,5)
print('\na : ')
print(a)
print('\nnp.mean(a) : ')
print(np.mean(a))
print('\nnp.mean(a, axis=0) : ')
print(np.mean(a, axis=0))
print('\nnp.mean(a, axis=1) : ')
print(np.mean(a, axis=1))
print('\n####### for a third-order tensor #######')
a = np.arange(24).reshape(2,3,4)
print('\na : ')
print(a)
print('\nnp.mean(a) : ')
print(np.mean(a))
print('\nnp.mean(a, axis=0) : ')
print(np.mean(a, axis=0))
print('\nnp.mean(a, axis=1) : ')
print(np.mean(a, axis=1))
print('\nnp.mean(a, axis=2) : ')
print(np.mean(a, axis=2))
####### for vectors #######
a :
[0 1 2 3 4 5 6 7 8 9]
np.mean(a) :
4.5
np.mean(a, axis=0) :
4.5
####### for matrices #######
a :
[[0 1 2 3 4]
 [5 6 7 8 9]]
np.mean(a) :
4.5
np.mean(a, axis=0) :
[2.5 3.5 4.5 5.5 6.5]
np.mean(a, axis=1) :
[2. 7.]
####### for a third-order tensor #######
a :
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
np.mean(a) :
11.5
np.mean(a, axis=0) :
[[ 6.  7.  8.  9.]
 [10. 11. 12. 13.]
 [14. 15. 16. 17.]]
np.mean(a, axis=1) :
[[ 4.  5.  6.  7.]
 [16. 17. 18. 19.]]
np.mean(a, axis=2) :
[[ 1.5  5.5  9.5]
 [13.5 17.5 21.5]]
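One way to read the results above is that the axis you specify is the one that disappears from the shape. The sketch below checks this for the same (2, 3, 4) tensor; the `keepdims=True` option, which keeps the reduced axis with length 1, is an addition of mine that does not appear in the example above.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# Reducing along an axis removes that axis from the shape.
reduced_shapes = [np.mean(a, axis=axis).shape for axis in range(a.ndim)]
print(reduced_shapes)

# keepdims=True keeps the reduced axis as a dimension of length 1.
kept = np.mean(a, axis=1, keepdims=True)
print(kept.shape)
```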
Broadcast
In numpy, when an arithmetic operation is performed between a scalar and a vector or matrix, the scalar is applied to every element of the vector or matrix. This is called broadcasting. It can be confusing until you get used to it, so keep it in mind. In the example below, the scalar $a$ is applied to every component of the vector $b$.
a = 10
b = np.array([1, 2])
print('a : ',a)
print('b : ',b)
print('a + b : ',a + b)
print('a * b : ',a * b)
print('b / a : ',b / a)
a : 10
b : [1 2]
a + b : [11 12]
a * b : [10 20]
b / a : [0.1 0.2]
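Broadcasting is not limited to scalars. As a small sketch going slightly beyond the scalar case above (the arrays here are my own examples), a vector can be applied to every row or every column of a matrix as long as the shapes are compatible.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)      # shape (2, 3): [[0 1 2], [3 4 5]]
row = np.array([10, 20, 30])        # shape (3,)
col = np.array([[100], [200]])      # shape (2, 1)

row_sum = m + row   # row is added to every row of m
col_sum = m + col   # col is added to every column of m

print(row_sum)
print(col_sum)
```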
Slicing
Slicing extracts specific elements or sub-arrays from an ndarray. It is very useful and well worth learning.
a = np.arange(12).reshape(-1,3)
print('a : \n{}'.format(a))
print()
print('a.shape : ',a.shape)
print()
print('a[0,1] : ', a[0,1], '## element with row=0, col=1')
print()
print('a[2,2] : ', a[2,2], '## element with row=2, col=2')
print()
print('a[1] : ', a[1], '## row 1')
print()
print('a[-1] : ', a[-1], '## last row')
print()
print('Rows 1 to 2, columns 0 to 1 (indices are 0-based)')
print('a[1:3,0:2] : \n{}'.format(a[1:3,0:2]))
print()
print('All rows, every other column starting from column 0')
print('a[:,::2] : \n{}'.format(a[:,::2]))
print()
print('Every other row starting from row 0')
print('a[::2] : \n{}'.format(a[::2]))
print()
print('Every other row starting from row 1')
print('a[1::2] : \n{}'.format(a[1::2]))
print()
a :
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
a.shape : (4, 3)
a[0,1] : 1 ## element with row=0, col=1
a[2,2] : 8 ## element with row=2, col=2
a[1] : [3 4 5] ## row 1
a[-1] : [ 9 10 11] ## last row
Rows 1 to 2, columns 0 to 1 (indices are 0-based)
a[1:3,0:2] :
[[3 4]
 [6 7]]
All rows, every other column starting from column 0
a[:,::2] :
[[ 0  2]
 [ 3  5]
 [ 6  8]
 [ 9 11]]
Every other row starting from row 0
a[::2] :
[[0 1 2]
 [6 7 8]]
Every other row starting from row 1
a[1::2] :
[[ 3  4  5]
 [ 9 10 11]]
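One point worth remembering when slicing is that a basic slice is a view of the original array, not a copy. The sketch below uses variable names of my own to show that writing to a slice changes the original, while a slice copied with `.copy()` is independent.

```python
import numpy as np

a = np.arange(12).reshape(-1, 3)

view = a[1:3, 0:2]        # basic slicing returns a view on a's data
view[0, 0] = 100          # this also changes a[1, 0]
print(a[1, 0])

independent = a[1:3, 0:2].copy()   # explicit copy with its own memory
independent[0, 0] = -1             # a is not affected
print(a[1, 0])

print(np.shares_memory(a, view), np.shares_memory(a, independent))
```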
all, any, where
- all: returns True if all of the elements are true
- any: returns True if at least one of the elements is true
a = np.array([[0,1],[1,1]])
print(a.all())
print(a.any())
False
True
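all and any also accept the axis option described earlier. A minimal sketch with the same array (the result variable names are my own):

```python
import numpy as np

a = np.array([[0, 1], [1, 1]])

col_all = a.all(axis=0)   # is every element in each column true?
row_all = a.all(axis=1)   # is every element in each row true?
row_any = a.any(axis=1)   # is at least one element in each row true?

print(col_all, row_all, row_any)
```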
np.where returns the indices of the elements that satisfy the condition.
a = np.array([[0,2],[1,1]])
print(np.where(a>1)) ## Return (0,1), the index of 2 greater than 1.
(array([0]), array([1]))
(0,1) is the index that fits the where condition.
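When several elements match, np.where returns one array of indices per dimension, which can be harder to read. The sketch below uses a different example array of my own and shows how to turn the result into (row, col) pairs and how to pull out the matching values. np.argwhere gives the same pairs directly.

```python
import numpy as np

a = np.array([[0, 2], [3, 1]])   # hypothetical array with two matches

rows, cols = np.where(a > 1)     # one index array per dimension
pairs = list(zip(rows.tolist(), cols.tolist()))
values = a[rows, cols]           # the matching elements themselves

print(pairs)
print(values)
print(np.argwhere(a > 1))        # same indices, one row per match
```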
Ternary operators in where
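where can also be used like a ternary operator. For each element, if the condition given as the first argument is satisfied, the result takes the corresponding element of the second argument; otherwise it takes the corresponding element of the third argument. This form of where is used frequently.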
a = np.array([2 *i +1 for i in range(6)]).reshape(2,3)
print('a : ', a)
print('Keep elements greater than 6 and set the rest to 0')
np.where(a>6,a,0)
a :  [[ 1  3  5]
 [ 7  9 11]]
Keep elements greater than 6 and set the rest to 0
array([[ 0,  0,  0],
       [ 7,  9, 11]])
a = np.array([2 *i +1 for i in range(6)]).reshape(2,3)
b = np.zeros((2,3))
print(a)
print(b)
print('If an element of a is divisible by 3, return the corresponding value of b, otherwise return the value of a')
np.where(a%3==0, b, a)
[[ 1  3  5]
 [ 7  9 11]]
[[0. 0. 0.]
 [0. 0. 0.]]
If an element of a is divisible by 3, return the corresponding value of b, otherwise return the value of a
array([[ 1.,  0.,  5.],
       [ 7.,  0., 11.]])
Fundamental constants
Base of the natural logarithm
np.e
2.718281828459045
Pi
np.pi
3.141592653589793
Basic arithmetic operations
np.add(x,y)
Element-wise addition. This is ordinary vector addition.
a = np.array([1.,2.])
b = np.array([4.,3.])
np.add(a,b)
array([5., 5.])
np.reciprocal(x)
The per-element reciprocal.
b = np.array([4.,3.])
np.reciprocal(b)
array([0.25      , 0.33333333])
One interesting thing I noticed about this function: in python3, dividing two integers with / is computed to the decimal point, whereas in python2 only the integer part is returned. np.reciprocal, however, keeps the data type of its input, so the reciprocal of an integer contains only the integer part. If you specify a floating point data type, the function calculates to the decimal point.
# print(1/8) # => returns 0.125 @python3
# print(1/8) # => returns 0 @python2
print(np.reciprocal(8))
print(np.reciprocal(8, dtype='float16'))
print(np.reciprocal(8.))
0
0.125
0.125
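The same behaviour applies to integer arrays. As a small sketch (the array values are my own), every integer with absolute value greater than 1 has reciprocal 0, and converting to a floating point type first gives the expected result.

```python
import numpy as np

ints = np.array([1, 2, 4])

int_result = np.reciprocal(ints)                       # integer dtype is kept
float_result = np.reciprocal(ints.astype(np.float64))  # convert first

print(int_result)
print(float_result)
```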
np.multiply(x,y)
Element-wise multiplication, also called the Hadamard product. It is different from the inner product of vectors.
a = np.array([1.,2.])
b = np.array([4.,3.])
np.multiply(a,b)
array([4., 6.])
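To make the difference from the inner product concrete, here is a short sketch with the same vectors. np.dot is used for the inner product; it equals the sum of the element-wise products.

```python
import numpy as np

a = np.array([1., 2.])
b = np.array([4., 3.])

hadamard = np.multiply(a, b)   # element-wise product, still a vector
inner = np.dot(a, b)           # inner product, a single number

print(hadamard)
print(inner)
```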
np.divide(x,y)
Find the quotient of per-element division.
a = np.array([1.,2.])
b = np.array([4.,3.])
np.divide(b,a)
array([4. , 1.5])
np.mod(x,y)
Find the remainder of per-element division.
a = np.array([3.,2.])
b = np.array([11.,3.])
print(np.mod(b,a))
[2. 1.]
np.divmod(x,y)
Find the quotient and remainder of per-element division simultaneously.
a = np.array([3.,2.])
b = np.array([11.,3.])
print(np.divmod(b,a))
(array([3., 1.]), array([2., 1.]))
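As a quick check of what np.divmod returns, the sketch below confirms that the quotient and remainder match np.floor_divide and np.mod, and that quotient × divisor + remainder recovers the original values.

```python
import numpy as np

a = np.array([3., 2.])
b = np.array([11., 3.])

quotient, remainder = np.divmod(b, a)

# divmod gives the same results as floor_divide and mod computed separately.
same_quotient = np.array_equal(quotient, np.floor_divide(b, a))
same_remainder = np.array_equal(remainder, np.mod(b, a))

# quotient * divisor + remainder recovers the dividend.
recovered = quotient * a + remainder
print(same_quotient, same_remainder, recovered)
```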
np.power(x,y)
This computes powers. If you pass arrays, each element of x is raised to the power of the corresponding element of y.
$2^3=8$
np.power(2,3)
8
$4^1$ and $3^2$
a = np.array([1.,2.])
b = np.array([4.,3.])
np.power(b,a)
array([4., 9.])
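Two details of np.power are easy to trip over, sketched below with values of my own: a scalar exponent is broadcast over the whole array, and raising an integer array to a negative integer power raises a ValueError, so a floating point base is needed in that case.

```python
import numpy as np

squares = np.power(np.array([1, 2, 3]), 2)   # the exponent 2 is broadcast
half = np.power(2.0, -1)                     # float base: negative exponent is fine

try:
    np.power(np.array([2, 3]), -1)           # integer base, negative integer exponent
    int_negative_ok = True
except ValueError:
    int_negative_ok = False

print(squares, half, int_negative_ok)
```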
np.subtract(x,y)
Element-wise subtraction.
a = np.array([1.,2.])
b = np.array([4.,3.])
np.subtract(b,a)
array([3., 1.])