[keras] Basics of keras and LSTM

Basics of keras and LSTM, Comparison with RNN

LSTM stands for Long Short-Term Memory and is designed to learn long-term dependencies. LSTM is a type of RNN, so the basic idea is the same. I will not go into the details here, since plenty of explanations are available elsewhere.

Also, here is a comparison between LSTM and RNN.

github

The file in jupyter notebook format is here

google colaboratory

If you want to run it in google colaboratory here

Author’s environment

The author’s OS is macOS, so some command options differ from those on Linux and other Unix systems.

! sw_vers

ProductName: Mac OS X
ProductVersion: 10.14.6
BuildVersion: 18G6020

! python -V

Python 3.7.3

Import the basic libraries and keras and check their versions.

%matplotlib inline
%config InlineBackend.figure_format = 'svg'

import matplotlib
import matplotlib.pyplot as plt
import scipy
import numpy as np

import tensorflow as tf
from tensorflow import keras

print('matplotlib version :', matplotlib.__version__)
print('scipy version :', scipy.__version__)
print('numpy version :', np.__version__)
print('tensorflow version : ', tf.__version__)
print('keras version : ', keras.__version__)

matplotlib version : 3.0.3
scipy version : 1.4.1
numpy version : 1.19.4
tensorflow version : 2.1.0
keras version : 2.2.4-tf

Damped oscillation curve

For the sample data, we will sample from the following equation.

$$ y = \exp\left(-\frac{x}{\tau}\right)\cos(x) $$

This is a common model of natural phenomena: an oscillation whose amplitude gradually decays. To allow comparison with a simple RNN, we will use the same function for the sample data.

x = np.linspace(0, 5 * np.pi, 200)
y = np.exp(-x / 5) * (np.cos(x))

Checking the data

Let’s look at the details of the $x$ and $y$ data.

print('shape : ', x.shape)
print('ndim : ', x.ndim)
print('data : ', x[:10])

shape : (200,)
ndim : 1
data : [0. 0.07893449 0.15786898 0.23680347 0.31573796 0.39467244
 0.47360693 0.55254142 0.63147591 0.7104104 ]

print('shape : ', y.shape)
print('ndim : ', y.ndim)
print('data : ', y[:10])

shape : (200,)
ndim : 1
data : [1. 0.98127212 0.9568705 0.92712705 0.89239742 0.85305798
 0.80950282 0.76214062 0.71139167 0.65768474]

Let’s check the graph.

plt.plot(x,y)
plt.grid()
plt.show()

With $\tau=5$, we get a clean decay curve.
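As a quick check on the formula, the sketch below wraps it in a small helper and confirms that the exponential factor acts as an envelope for the cosine. The helper name `damped_cosine` and the second value `tau=2` are assumptions made for illustration. They do not appear in the original notebook.

```python
import numpy as np

def damped_cosine(x, tau):
    """Return exp(-x / tau) * cos(x) for an array of x values."""
    return np.exp(-x / tau) * np.cos(x)

x = np.linspace(0, 5 * np.pi, 200)
y = damped_cosine(x, tau=5)

# The exponential factor is an envelope: |y| never exceeds exp(-x / tau).
envelope = np.exp(-x / 5)
print(np.all(np.abs(y) <= envelope + 1e-12))

# A smaller tau (hypothetical value, not used in the article) decays faster.
y_fast = damped_cosine(x, tau=2)
print(abs(y[-1]), abs(y_fast[-1]))
```

The last line prints the remaining amplitude at the end of the interval for both values of $\tau$, which shows how $\tau$ controls the decay speed.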

Building the neural net

We will preprocess the data so that it can be fed into keras, and then build the recurrent neural nets.

The specification of compile is as follows.

compile(self, optimizer, loss, metrics=None, sample_weight_mode=None, weighted_metrics=None, target_tensors=None)

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM

NUM_RNN = 20
NUM_MIDDLE = 40

# Preprocess the data
n = len(x) - NUM_RNN
r_x = np.zeros((n, NUM_RNN))
r_y = np.zeros((n, NUM_RNN))
for i in range(0, n):
  r_x[i] = y[i: i + NUM_RNN]  # input window
  r_y[i] = y[i + 1: i + NUM_RNN + 1]  # target window, shifted one step ahead

r_x = r_x.reshape(n, NUM_RNN, 1)
r_y = r_y.reshape(n, NUM_RNN, 1)

# Build an RNN neural net
rnn_model = Sequential()
rnn_model.add(SimpleRNN(NUM_MIDDLE, input_shape=(NUM_RNN, 1), return_sequences=True))
rnn_model.add(Dense(1, activation="linear"))
rnn_model.compile(loss="mean_squared_error", optimizer="sgd")

# Build the LSTM neural net
lstm_model = Sequential()
lstm_model.add(LSTM(NUM_MIDDLE, input_shape=(NUM_RNN, 1), return_sequences=True))
lstm_model.add(Dense(1, activation="linear"))
lstm_model.compile(loss="mean_squared_error", optimizer="sgd")

Check the shapes of the data to be fed to the models, and the model summaries.

print(r_y.shape)
print(r_x.shape)

(180, 20, 1)
(180, 20, 1)
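The shapes above come from the sliding-window preprocessing: each of the 180 samples is a window of 20 consecutive values, and its target is the same window shifted one step ahead. As a minimal sketch, the same arrays can be built without a Python loop by using an index matrix. The names `idx`, `win_x` and `win_y` are introduced here for illustration only.

```python
import numpy as np

NUM_RNN = 20
x = np.linspace(0, 5 * np.pi, 200)
y = np.exp(-x / 5) * np.cos(x)

n = len(y) - NUM_RNN  # number of windows

# Row i of idx holds the indices i, i + 1, ..., i + NUM_RNN - 1.
idx = np.arange(n)[:, None] + np.arange(NUM_RNN)[None, :]

win_x = y[idx].reshape(n, NUM_RNN, 1)      # inputs:  y[i : i + NUM_RNN]
win_y = y[idx + 1].reshape(n, NUM_RNN, 1)  # targets: y[i + 1 : i + NUM_RNN + 1]

print(win_x.shape, win_y.shape)

# Each target window is its input window shifted by one step.
print(np.array_equal(win_x[0, 1:, 0], win_y[0, :-1, 0]))
```

The trailing dimension of size 1 is the number of features per time step, which matches `input_shape=(NUM_RNN, 1)` in the models.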

The summaries show that the LSTM model has more parameters than the RNN model (6,761 versus 1,721), so it takes longer to train.

print(rnn_model.summary())
print(lstm_model.summary())

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn (SimpleRNN) (None, 20, 40) 1680
_________________________________________________________________
dense (Dense) (None, 20, 1) 41
=================================================================
Total params: 1,721
Trainable params: 1,721
Non-trainable params: 0
_________________________________________________________________
None
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 20, 40) 6720
_________________________________________________________________
dense_1 (Dense) (None, 20, 1) 41
=================================================================
Total params: 6,761
Trainable params: 6,761
Non-trainable params: 0
_________________________________________________________________
None
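The parameter counts in the summaries can be reproduced by hand. The sketch below uses the standard counting for these layer types: a SimpleRNN layer has an input kernel, a recurrent kernel and a bias, and an LSTM layer has four such blocks (input gate, forget gate, cell candidate, output gate). The helper function names are hypothetical and are not part of keras.

```python
def simple_rnn_params(units, input_dim):
    # input kernel + recurrent kernel + bias
    return units * input_dim + units * units + units

def lstm_params(units, input_dim):
    # four blocks (three gates and the cell candidate),
    # each the size of one SimpleRNN block
    return 4 * simple_rnn_params(units, input_dim)

def dense_params(units, input_dim):
    # weight matrix + bias
    return input_dim * units + units

NUM_MIDDLE = 40

rnn_total = simple_rnn_params(NUM_MIDDLE, 1) + dense_params(1, NUM_MIDDLE)
lstm_total = lstm_params(NUM_MIDDLE, 1) + dense_params(1, NUM_MIDDLE)

print(simple_rnn_params(NUM_MIDDLE, 1))  # 1680
print(lstm_params(NUM_MIDDLE, 1))        # 6720
print(rnn_total, lstm_total)             # 1721 6761
```

This is why the LSTM layer has exactly four times as many parameters as the SimpleRNN layer with the same number of units.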

Training

We will use the fit method to train both models. The specification of the fit method is as follows.

fit(self, x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None)

batch_size = 10
epochs = 1000

# use validation_split to use the last 10% for validation
rnn_history = rnn_model.fit(r_x, r_y, epochs=epochs, batch_size=batch_size, validation_split=0.1, verbose=0)

# use validation_split to use the last 10% for validation
lstm_history = lstm_model.fit(r_x, r_y, epochs=epochs, batch_size=batch_size, validation_split=0.1, verbose=0)
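To make the effect of `validation_split=0.1` concrete, the sketch below works out how the 180 windows are divided. It assumes the documented keras behaviour that the validation samples are taken from the end of the arrays before any shuffling. The variable names are introduced here for illustration.

```python
import math

n_samples = 180          # number of windows in r_x and r_y
validation_split = 0.1
batch_size = 10

# Samples before split_at are used for training, the rest for validation.
split_at = int(math.floor(n_samples * (1.0 - validation_split)))
n_train = split_at
n_val = n_samples - split_at

# Number of gradient updates per epoch (the last batch may be smaller).
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, steps_per_epoch)  # 162 18 17
```

Because neighbouring windows overlap, the validation windows share most of their values with the last training windows, so the validation loss here is not a measure on fully unseen data.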

Visualization of the loss function

Let’s visualize how the error is reduced by training.

rnn_loss = rnn_history.history['loss'] # Loss on the training data
rnn_val_loss = rnn_history.history['val_loss'] # Loss on the validation data

lstm_loss = lstm_history.history['loss'] # Loss on the training data
lstm_val_loss = lstm_history.history['val_loss'] # Loss on the validation data

plt.plot(np.arange(len(rnn_loss)), rnn_loss, label='rnn_loss')
plt.plot(np.arange(len(rnn_val_loss)), rnn_val_loss, label='rnn_val_loss')
plt.plot(np.arange(len(lstm_loss)), lstm_loss, label='lstm_loss')
plt.plot(np.arange(len(lstm_val_loss)), lstm_val_loss, label='lstm_val_loss')
plt.grid()
plt.legend()
plt.show()
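Alongside the plot, it can help to read off the epoch with the lowest loss. The sketch below does this for a dictionary shaped like `History.history`. The helper `best_epoch` and the numbers in `example_history` are hypothetical and are not results from this notebook.

```python
def best_epoch(history_dict, metric="val_loss"):
    """Return (epoch_index, value) of the smallest entry in history_dict[metric]."""
    values = history_dict[metric]
    idx = min(range(len(values)), key=values.__getitem__)
    return idx, values[idx]

# Hypothetical numbers for illustration only, not results from the article.
example_history = {
    "loss": [0.20, 0.08, 0.03, 0.02, 0.015],
    "val_loss": [0.05, 0.02, 0.01, 0.012, 0.013],
}

print(best_epoch(example_history))          # (2, 0.01)
print(best_epoch(example_history, "loss"))  # (4, 0.015)
```

With the real training runs, the same helper could be called as `best_epoch(rnn_history.history)` or `best_epoch(lstm_history.history)`.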

Check the result

# Initial input values: the first input window, so that predictions line up with y
rnn_res = r_x[0].reshape(-1)
lstm_res = r_x[0].reshape(-1)

for i in range(0, n):
  _rnn_y = rnn_model.predict(rnn_res[- NUM_RNN:].reshape(1, NUM_RNN, 1))
  rnn_res = np.append(rnn_res, _rnn_y[0][NUM_RNN - 1][0])

  _lstm_y = lstm_model.predict(lstm_res[- NUM_RNN:].reshape(1, NUM_RNN, 1))
  lstm_res = np.append(lstm_res, _lstm_y[0][NUM_RNN - 1][0])

plt.plot(np.arange(len(y)), y, label=r"$\exp\left(-\frac{x}{\tau}\right) \cos x$")
plt.plot(np.arange(len(rnn_res)), rnn_res, label="RNN result")
plt.plot(np.arange(len(lstm_res)), lstm_res, label="LSTM result")
plt.legend()
plt.grid()
plt.show()
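The prediction loop above is a closed-loop rollout: each predicted value is appended to the sequence and becomes part of the next input window. The sketch below isolates that pattern without keras. It is seeded with the first `NUM_RNN` true samples and uses a hypothetical stand-in predictor, `stand_in_predict`, in place of `model.predict`. The stand-in uses the fact that uniformly sampled values of $\exp(-x/\tau)\cos(x)$ satisfy a two-term linear recurrence. It is not the trained model.

```python
import numpy as np

NUM_RNN = 20
x = np.linspace(0, 5 * np.pi, 200)
y = np.exp(-x / 5) * np.cos(x)

h = x[1] - x[0]      # sampling step
r = np.exp(-h / 5)   # decay factor per step

def stand_in_predict(window):
    """Hypothetical one-step predictor standing in for model.predict.

    Samples of exp(-x / tau) * cos(x) on a uniform grid satisfy
    y[k + 1] = 2 * r * cos(h) * y[k] - r**2 * y[k - 1].
    """
    return 2 * r * np.cos(h) * window[-1] - r**2 * window[-2]

res = y[:NUM_RNN].copy()  # seed with the first true window
for _ in range(len(y) - NUM_RNN):
    nxt = stand_in_predict(res[-NUM_RNN:])  # predict from the latest window
    res = np.append(res, nxt)               # feed the prediction back in

print(len(res), np.max(np.abs(res - y)))
```

With an exact predictor the rollout reproduces the curve. With a trained network, any one-step error is fed back into later inputs, which is why the rollout is a stricter test than the one-step validation loss.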

In the case of the damped oscillation curve, there seems to be no clear difference between LSTM and RNN with the parameters we have set. In practice, however, LSTM is used more often than a simple RNN, and it tends to give better results. Since this is just an exercise, I will end here.