Basics of keras and RNN

As a review, I will implement a Recurrent Neural Network (RNN) using keras. Any data would do, but I’ll prepare a damped oscillation curve as time series data and train an RNN on it.

github

  • The file in jupyter notebook format is here

google colaboratory

  • If you want to run it in google colaboratory, it’s here

Author’s environment

The author’s OS is macOS, so some of the command-line options differ from those on Linux and other Unix systems.

! sw_vers
ProductName: Mac OS X
ProductVersion: 10.14.6
BuildVersion: 18G6020
!python -V
Python 3.7.3

Import the basic libraries and keras and check their versions.

%matplotlib inline
%config InlineBackend.figure_format = 'svg'

import matplotlib
import matplotlib.pyplot as plt
import scipy
import numpy as np

import tensorflow as tf
from tensorflow import keras

print('matplotlib version :', matplotlib.__version__)
print('scipy version :', scipy.__version__)
print('numpy version :', np.__version__)
print('tensorflow version : ', tf.__version__)
print('keras version : ', keras.__version__)
matplotlib version : 3.0.3
scipy version : 1.4.1
numpy version : 1.19.4
tensorflow version : 2.1.0
keras version : 2.2.4-tf

Damped oscillation curve

For the sample data, we will sample from the following equation.

$$ y = \exp\left(-\frac{x}{\tau}\right)\cos(x) $$

This is a common model of natural phenomena: an oscillation whose amplitude gradually decays toward zero.

x = np.linspace(0, 5 * np.pi, 200)
y = np.exp(-x / 5) * (np.cos(x))

Checking the data

Let’s look at the details of the $x$ and $y$ data.

print('shape : ', x.shape)
print('ndim : ', x.ndim)
print('data : ', x[:10])
shape : (200,)
ndim : 1
data : [0. 0.07893449 0.15786898 0.23680347 0.31573796 0.39467244
 0.47360693 0.55254142 0.63147591 0.7104104 ]
print('shape : ', y.shape)
print('ndim : ', y.ndim)
print('data : ', y[:10])
shape : (200,)
ndim : 1
data : [1. 0.98127212 0.9568705 0.92712705 0.89239742 0.85305798
 0.80950282 0.76214062 0.71139167 0.65768474]

Let’s check the graph.

plt.plot(x,y)
plt.grid()
plt.show()

With $\tau=5$, we get a nicely decaying curve.

Building the neural net

We will preprocess the data so it can be fed into keras, then build the recurrent neural net. The series is split into overlapping windows of NUM_RNN points, and the target for each window is the same window shifted one step into the future.

The specification of compile is as follows.

compile(self, optimizer, loss, metrics=None, sample_weight_mode=None, weighted_metrics=None, target_tensors=None)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN
from tensorflow.keras.layers import Dense

NUM_RNN = 20
NUM_MIDDLE = 40

# Preprocess the data
n = len(x) - NUM_RNN
r_x = np.zeros((n, NUM_RNN))
r_y = np.zeros((n, NUM_RNN))
for i in range(0, n):
  r_x[i] = y[i: i + NUM_RNN]
  r_y[i] = y[i + 1: i + NUM_RNN + 1]

r_x = r_x.reshape(n, NUM_RNN, 1)
r_y = r_y.reshape(n, NUM_RNN, 1)

# Build the neural net
model = Sequential()

model.add(SimpleRNN(NUM_MIDDLE, input_shape=(NUM_RNN, 1), return_sequences=True))
model.add(Dense(1, activation="linear"))
model.compile(loss="mean_squared_error", optimizer="sgd")
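
As a quick sanity check (this snippet is my addition, not in the original notebook), each target window should simply be the corresponding input window shifted one step into the future:

# every r_y window is the matching r_x window shifted by one step
assert np.allclose(r_x[:, 1:, 0], r_y[:, :-1, 0])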

Check the data to be fed and the model overview.

print(r_y.shape)
print(r_x.shape)
print(model.summary())
(180, 20, 1)
(180, 20, 1)
Model: "sequential
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
simple_rnn (SimpleRNN)       (None, 20, 40)            1680
_________________________________________________________________
dense (Dense)                (None, 20, 1)              41
=================================================================
Total params: 1,721
Trainable params: 1,721
Non-trainable params: 0
_________________________________________________________________
None
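
As a quick check on these numbers (my own calculation, not stated in the original post), the parameter counts follow directly from the layer sizes:

$$ \text{SimpleRNN}: 40 \times 40 + 40 \times 1 + 40 = 1680, \qquad \text{Dense}: 40 \times 1 + 1 = 41 $$

where the three SimpleRNN terms are the recurrent weights, the input weights, and the biases.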

Training

We will use the fit method to perform training. The specification of the fit method is as follows. See here.

fit(self, x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None)
batch_size = 10
epochs = 500

# use validation_split to use the last 10% for validation
history = model.fit(r_x, r_y, epochs=epochs, batch_size=batch_size, validation_split=0.1, verbose=0)
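
Since training runs for a fixed 500 epochs, one optional variation (my own suggestion, not used in the original run) is to stop early once the validation loss stops improving, using keras’s EarlyStopping callback:

# optional: stop once val_loss stops improving (not part of the original run)
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=20, restore_best_weights=True)
history = model.fit(r_x, r_y, epochs=epochs, batch_size=batch_size,
                    validation_split=0.1, verbose=0, callbacks=[early_stop])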

Visualization of the loss function

Let’s visualize how the error is reduced by training.

loss = history.history['loss']         # loss on the training data
val_loss = history.history['val_loss'] # loss on the validation data

plt.plot(np.arange(len(loss)), loss, label='loss')
plt.plot(np.arange(len(val_loss)), val_loss, label='val_loss')
plt.grid()
plt.legend()
plt.show()

Check the result

# Initial input values: use the first target sequence as the seed for recursive prediction
res = r_y[0].reshape(-1)

for i in range(0, n):
  # predict from the latest NUM_RNN values and append the newest predicted point
  _y = model.predict(res[-NUM_RNN:].reshape(1, NUM_RNN, 1))
  res = np.append(res, _y[0][NUM_RNN - 1][0])

plt.plot(np.arange(len(y)), y, label=r"$\exp\left(-\frac{x}{\tau}\right) \cos x$")
plt.plot(np.arange(len(res)), res, label="RNN result")
plt.legend()
plt.grid()
plt.show()

With a simple RNN, the discrepancy from the true curve gradually becomes more pronounced. We could probably get better results by tuning the number of epochs and the model, but since this is a review, we’ll stop here. I’m going to try an LSTM next.
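
As a preview of that next step (a minimal sketch of my own, assuming the same data shapes; not the author’s follow-up code), the SimpleRNN layer can simply be swapped for an LSTM layer:

from tensorflow.keras.layers import LSTM

# same input/output shapes as before; only the recurrent layer type changes
lstm_model = Sequential()
lstm_model.add(LSTM(NUM_MIDDLE, input_shape=(NUM_RNN, 1), return_sequences=True))
lstm_model.add(Dense(1, activation="linear"))
lstm_model.compile(loss="mean_squared_error", optimizer="sgd")
lstm_model.summary()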