Notes on how to use matplotlib

This is a memo on how to use matplotlib. I’ll update it as I learn new ways to use it.

github

  • The file in jupyter notebook format is here

google colaboratory

  • If you want to run it in google colaboratory, it is here (matplotlib/mat_nb.ipynb)

Author’s environment

The author’s OS is macOS, so some command options differ from those on Linux and other Unix systems.

! sw_vers
ProductName: Mac OS X
ProductVersion: 10.14.6
BuildVersion: 18G6020
!python -V
Python 3.7.3

Import the basic libraries and check their versions.

%matplotlib inline
%config InlineBackend.figure_format = 'svg'

import matplotlib
import matplotlib.pyplot as plt
import scipy
import numpy as np

print('matplotlib version :', matplotlib.__version__)
print('scipy version :', scipy.__version__)
print('numpy version :', np.__version__)
matplotlib version : 3.0.3
scipy version : 1.4.1
numpy version : 1.16.2

Draw a simple graph

Let’s plot the function $y = \sin x$. matplotlib can be used in many elaborate ways, but in my data analysis work this is the form I use most often. Setting up the grid and labels is important. Whether you can use TeX notation in matplotlib may depend on your environment.

  • plt.grid()
  • plt.title()
  • plt.xlabel()
  • plt.ylabel()
  • plt.xlim()
  • plt.ylim()
  • plt.legend()
x = np.linspace(0,10,100)
y = np.sin(x)

plt.grid()
plt.title("sin function")
plt.xlabel("$x$")
plt.ylabel("$y = \\sin(x)$")
plt.xlim(0,8)
plt.ylim(-1.2,1.2)
plt.plot(x,y,label="$y=\\sin x$")

plt.legend()
<matplotlib.legend.Legend at 0x1175fc4a8>
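
On the TeX point above, here is a minimal sketch of how I understand the labels to work. It assumes the default setting where `text.usetex` is off, so the `$...$` parts are handled by matplotlib's built-in mathtext rather than an external TeX installation. It also shows that a raw string (`r"..."`) is an alternative to doubling the backslashes. The names `raw_label`, `escaped_label` and `labels_match` are made up for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a plain script; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# A raw string (r"...") keeps backslashes literally, so "\sin" needs no doubling.
raw_label = r"$y = \sin x$"
escaped_label = "$y = \\sin x$"
labels_match = raw_label == escaped_label  # both spell the same characters

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
(line,) = ax.plot(x, np.sin(x), label=raw_label)
ax.set_xlabel(r"$x$")
ax.set_ylabel(raw_label)
ax.legend()

# Rendering the figure is the step that parses the $...$ math text.
fig.canvas.draw()
```

Raw strings make it harder to end up with one backslash too many or too few in a label.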

Draw multiple graphs

x = np.linspace(0,10,100)
y1 = np.sin(x)
y2 = 0.8 * np.cos(x)

plt.grid()
plt.title("multi function")
plt.xlabel("$x$")
plt.xlim(0,8)
plt.ylim(-1.2,1.2)
plt.plot(x,y1, label="$y = \\sin x$")
plt.plot(x,y2, label="$y = 0.8 \\times \\cos x$")
plt.legend()
<matplotlib.legend.Legend at 0x1177909b0>

Change graph linetype and color

x = np.linspace(0,10,100)
y1 = np.sin(x)
y2 = 0.8 * np.cos(x)

plt.grid()
plt.title("multi function")
plt.xlabel("$x$")
plt.xlim(0,8)
plt.ylim(-1.2,1.2)
plt.plot(x, y1, "o", color="red", label="$y = \\sin x$")
plt.plot(x, y2, "x", color="blue", label="$y = 0.8 \\times \\cos x$")
plt.legend()
<matplotlib.legend.Legend at 0x117774898>
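
The example above only changes the marker and the color. As a rough sketch of changing the line type as well, the following draws one curve with a format string and the other with keyword arguments. The variable names `sin_line` and `cos_line` and the particular styles are my own choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a plain script; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()

# Format string: "r" means red, "--" means a dashed line.
(sin_line,) = ax.plot(x, np.sin(x), "r--", label=r"$y = \sin x$")

# The same kind of styling spelled out with keyword arguments:
# a dotted blue line with an "x" marker on every 10th point.
(cos_line,) = ax.plot(x, 0.8 * np.cos(x), color="blue", linestyle=":",
                      marker="x", markevery=10,
                      label=r"$y = 0.8 \times \cos x$")

ax.grid()
ax.legend()
```

The format string is shorter, while the keyword form is easier to read back later, so which one to use is mostly a matter of taste.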

Create a histogram

This is a histogram of 10000 samples drawn from a standard normal distribution. Plotting it amounts to visualizing the shape of the probability density.

x = np.random.randn(10000)

plt.hist(x)
print(np.random.rand())
0.3934044138335787
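
To make the link to the density more concrete, here is a small sketch that normalizes the histogram with `density=True` and overlays the standard normal density $\frac{1}{\sqrt{2\pi}} e^{-t^2/2}$. The seed, the bin count and the variable names are assumptions of mine, not part of the original notebook.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a plain script; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # fixed seed only so that the sketch is reproducible
samples = np.random.randn(10000)

fig, ax = plt.subplots()
# density=True rescales the bar heights so that the total bar area is 1.
heights, edges, _ = ax.hist(samples, bins=30, density=True, alpha=0.5,
                            label="samples")

# Standard normal probability density, written out by hand.
t = np.linspace(-4, 4, 200)
pdf = np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)
ax.plot(t, pdf, label="standard normal pdf")
ax.legend()

# Total area of the bars: height times width, summed over all bins.
area = np.sum(heights * np.diff(edges))
```

With this normalization the bars and the curve share the same vertical scale, so they can be compared directly.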

You can also set options such as the number of bins. You’ll pick these up naturally after using them a few times.

x = np.random.randn(10000)
plt.hist(x, bins=20,color="red")
(array([ 14., 23., 63., 155., 267., 480., 677., 981., 1277.,
        1287., 1249., 1162., 912., 645., 421., 222., 92., 42.,
          29., 2.]),
 array([-3.30855493, -2.971971 , -2.63538708, -2.29880315, -1.96221922,
        -1.62563529, -1.28905136, -0.95246744, -0.61588351, -0.27929958,
         0.05728435, 0.39386828, 0.73045221, 1.06703613, 1.40362006,
         1.74020399, 2.07678792, 2.41337185, 2.74995577, 3.0865397 ,
         3.42312363]),
 <a list of 20 Patch objects>)
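
The output above is the return value of `hist`: the counts per bin, the bin edges, and the drawn bars. As a small sketch, assuming a fixed seed and my own variable names, it can be unpacked like this:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a plain script; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # fixed seed only so that the sketch is reproducible
samples = np.random.randn(10000)

fig, ax = plt.subplots()
# hist returns (counts per bin, bin edges, bar objects).
counts, bin_edges, patches = ax.hist(samples, bins=20, color="red")

# 20 bins are delimited by 21 edges, all equally spaced.
bin_width = bin_edges[1] - bin_edges[0]
print("number of bins :", len(counts))
print("number of edges:", len(bin_edges))
print("total count    :", int(counts.sum()))
```

Keeping `counts` and `bin_edges` around is handy when the numbers are needed later, not only the picture.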

Draw a three-dimensional graph

Although it does not come up that often, let’s try to draw a three-dimensional graph. Data analysis often deals with hundreds of dimensions, but the number of dimensions we can grasp with our human senses is barely three. To be honest, I have a hard time even with 3 dimensions.

We will use a module called mplot3d. Also, specific to 3D graphs is the use of a numpy function called meshgrid.

To plot the plane $z = x + y$ in $xyz$ space, you need $N(x) \times N(y)$ grid points, where $N(x)$ and $N(y)$ are the numbers of elements in $x$ and $y$. Normally you would have to build arrays of that size yourself, but meshgrid creates them for you automatically.

Let’s look at an example.

x = np.array([i for i in range(5)])
y = np.array([i for i in range(5)])

print('x :', x)
print('y :', y)
print()
xx, yy = np.array(np.meshgrid(x, y))
print('xx :', xx)
print()
print('yy :', yy)
x : [0 1 2 3 4]
y : [0 1 2 3 4]

xx : [[0 1 2 3 4]
 [0 1 2 3 4]
 [0 1 2 3 4]
 [0 1 2 3 4]
 [0 1 2 3 4]]

yy : [[0 0 0 0 0]
 [1 1 1 1 1]
 [2 2 2 2 2]
 [3 3 3 3 3]
 [4 4 4 4 4]]

Here $xx$ holds the $x$-coordinates of the 25 grid points, and $yy$ holds their $y$-coordinates. Very useful.
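
To connect this back to the plane $z = x + y$ mentioned earlier, the two grids can simply be added element by element. This is only a small sketch, and the name `zz` is my own choice.

```python
import numpy as np

x = np.arange(5)
y = np.arange(5)
xx, yy = np.meshgrid(x, y)

# Element-wise sum: zz[i, j] is the height of the plane at the point (x[j], y[i]).
zz = xx + yy

print(zz.shape)  # one value per grid point
print(zz)
```

Note the index order: the row index follows $y$ and the column index follows $x$.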

from mpl_toolkits.mplot3d import Axes3D

# Create a two-variable function as appropriate
def get_y(x1,x2):
  return x1 + 3 * x2 * x2 + 1

x1 = np.linspace(0,10,20)
x2 = np.linspace(0,10,20)

X1, X2 = np.meshgrid(x1, x2)
Y = get_y(X1, X2)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.set_zlabel("$f(x_1, x_2)$")

ax.plot(np.ravel(X1), np.ravel(X2), np.ravel(Y), "o", color='blue')
plt.show()
fig = plt.figure()
ax1 = fig.add_subplot(111, projection='3d')

ax1.set_xlabel("$x_1$")
ax1.set_ylabel("$x_2$")
ax1.set_zlabel("$f(x_1, x_2)$")

ax1.scatter3D(np.ravel(X1), np.ravel(X2), np.ravel(Y))
plt.show()

For 3D plots, rather than calling pyplot (plt) functions directly, you create a figure object with fig = plt.figure() and draw the graph on an axes added to it.
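
One thing the figure object makes easy is placing several 3D axes side by side. The sketch below reuses the same two-variable function and puts a scatter plot and a wireframe in one figure. The layout, the titles and the names `ax_points` and `ax_wire` are assumptions of mine.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a plain script; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # registers the "3d" projection on older matplotlib versions

def get_y(x1, x2):
    # Same two-variable function as in the example above.
    return x1 + 3 * x2 * x2 + 1

x1 = np.linspace(0, 10, 20)
x2 = np.linspace(0, 10, 20)
X1, X2 = np.meshgrid(x1, x2)
Y = get_y(X1, X2)

fig = plt.figure(figsize=(10, 4))

# Left: the grid points as a 3D scatter plot.
ax_points = fig.add_subplot(1, 2, 1, projection="3d")
ax_points.scatter(np.ravel(X1), np.ravel(X2), np.ravel(Y))
ax_points.set_title("scatter")

# Right: the same data as a wireframe, which takes the 2D grids directly.
ax_wire = fig.add_subplot(1, 2, 2, projection="3d")
ax_wire.plot_wireframe(X1, X2, Y)
ax_wire.set_title("wireframe")

for ax in (ax_points, ax_wire):
    ax.set_xlabel("$x_1$")
    ax.set_ylabel("$x_2$")
    ax.set_zlabel("$f(x_1, x_2)$")
```

The scatter plot needs flattened arrays, while the wireframe works on the 2D arrays returned by meshgrid.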

You can use plot_surface to color the surface according to its value. Let’s plot a multivariate Gaussian distribution.

Let’s compute the probability density of a Gaussian distribution with a negative correlation (covariance $-0.8$).

from scipy.stats import multivariate_normal

# Mean vector and covariance matrix (off-diagonal -0.8: negative correlation)
mu = np.array([0,0])
sigma = np.array([[1,-0.8],[-0.8,1]])

x1 = np.linspace(-3,3,100)
x2 = np.linspace(-3,3,100)

# Flatten the grid into a (10000, 2) array of points, evaluate the density,
# then reshape the result back to the grid shape
X1, X2 = np.meshgrid(x1, x2)
X = np.c_[np.ravel(X1), np.ravel(X2)]
Z = multivariate_normal.pdf(X, mu, sigma).reshape(100, -1)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X1, X2, Z, cmap='bwr', linewidth=0)
fig.show()


    /Users/hiroshi/anaconda3/lib/python3.7/site-packages/matplotlib/figure.py:445: UserWarning: Matplotlib is currently using module://ipykernel.pylab.backend_inline, which is a non-GUI backend, so cannot show the figure.
      % get_backend())
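
For reference, the density plotted above has the closed form $p(\mathbf{x}) = \frac{1}{2\pi\sqrt{|\Sigma|}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$ in two dimensions. The following is a minimal sketch that evaluates this formula directly with NumPy on the same grid. The helper name `gaussian_pdf` is hypothetical and is not part of scipy; the values should agree with `multivariate_normal.pdf` up to floating-point error.

```python
import numpy as np

def gaussian_pdf(points, mu, sigma):
    """Density of a 2D Gaussian at each row of `points` (hypothetical helper)."""
    diff = points - mu
    inv = np.linalg.inv(sigma)
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), one value per point.
    quad = np.einsum("ij,jk,ik->i", diff, inv, diff)
    norm = 2 * np.pi * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, -0.8], [-0.8, 1.0]])

x1 = np.linspace(-3, 3, 100)
x2 = np.linspace(-3, 3, 100)
X1, X2 = np.meshgrid(x1, x2)
points = np.c_[np.ravel(X1), np.ravel(X2)]  # shape (10000, 2)

Z = gaussian_pdf(points, mu, sigma).reshape(100, -1)

# The density is highest at the mean, where the exponent is zero.
peak_at_mean = gaussian_pdf(mu[None, :], mu, sigma)[0]
print("density at the mean:", peak_at_mean)
```

Because the correlation is negative, points such as $(1, -1)$ get a higher density than points such as $(1, 1)$, which is why the ridge of the surface runs along the anti-diagonal.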