Stock prediction using RNN, LSTM
RNNs and LSTMs are commonly used to forecast time series data. There are many kinds of time series, such as the temperature at a given location, visitor counts, or the price of a product. Here I use an RNN and an LSTM to predict stock prices, which are among the easiest data to obtain.
However, a neural net can only make predictions within the scope of the data it was trained on, and the model is almost useless when something unexpected happens. For example, a network trained on data from the year before the Corona Shock cannot predict the Corona Shock.
In addition, stock prices are not formed by technical factors alone; fundamentals, real demand, futures, and other factors complicate them, so predicting the future with an LSTM is difficult. Nevertheless, it looks interesting, so I will use the year-end break to get used to LSTMs.
This is just practice for getting used to RNNs and LSTMs, so please don't conclude from these results that stock prices can be predicted.
github
- The file in jupyter notebook format is here
google colaboratory
- To run it in google colaboratory here
Author’s environment
The author's OS is macOS, so some command options differ from those on Linux and other Unix systems.
!sw_vers
ProductName: Mac OS X
ProductVersion: 10.14.6
BuildVersion: 18G6032
!python -V
Python 3.8.5
Import the basic libraries and keras and check their versions.
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib
import matplotlib.pyplot as plt
import scipy
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
print('matplotlib version :', matplotlib.__version__)
print('scipy version :', scipy.__version__)
print('numpy version :', np.__version__)
print('tensorflow version : ', tf.__version__)
print('keras version : ', keras.__version__)
matplotlib version : 3.3.2
scipy version : 1.5.2
numpy version : 1.18.5
tensorflow version : 2.3.1
keras version : 2.4.0
Getting the data
In this example, we forecast the Nikkei 225 and the S&P 500, the US stock index. The data was downloaded from the following sites.
Nikkei 225 data
S&P 500 data
Forecast of Nikkei 225
Checking the data
First of all, let’s take a look at the Nikkei data.
!ls
files_bk lstm_nb.md lstm_nb.txt nikkei.csv sp500_2019.csv sp500_2019_utf8.csv sp500_2020_utf8.csv
lstm_nb.ipynb lstm_nb.py lstm_nb_files nikkei_utf8.csv sp500_2019_utf8.csv sp500_2020.csv sp500_2020_utf8.csv
%%bash
head nikkei.csv
�f�[�^���t,�I�l,�n�l,���l,���l
"2017/01/04", "19594.16", "19298.68", "19594.16", "19277.93"
"2017/01/05", "19520.69", "19602.10", "19615.40", "19473.28"
"2017/01/06", "19454.33", "19393.55", "19472.37", "19354.44"
"2017/01/10", "19301.44", "19414.83", "19484.90", "19255.35"
"2017/01/11", "19364.67", "19358.64", "19402.17", "19325.46"
"2017/01/12", "19134.70", "19300.19", "19300.19", "19069.02"
"2017/01/13", "19287.28", "19174.97", "19299.36", "19156.93"
"2017/01/16", "19095.24", "19219.13", "19255.41", "19061.27"
"2017/01/17", "18813.53", "19038.45", "19043.91", "18812.86"
The header line is garbled because the file is encoded in Shift_JIS, so convert it to UTF-8.
%%bash
nkf --guess nikkei.csv
Shift_JIS (LF)
%%bash
nkf -w nikkei.csv > nikkei_utf8.csv
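If nkf is not available, the same conversion can be sketched with Python's standard library. This is only a minimal sketch and not what the notebook actually ran: the helper name convert_to_utf8 is hypothetical, and it assumes the source file really is Shift_JIS, as nkf --guess reported.

```python
from pathlib import Path


def convert_to_utf8(src, dst, src_encoding="shift_jis"):
    """Re-encode a text file as UTF-8 (hypothetical helper, not from the notebook)."""
    # Decode with the source encoding, then write the same text back as UTF-8.
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding="utf-8")


# Usage with the file names from the nkf call:
# convert_to_utf8("nikkei.csv", "nikkei_utf8.csv")
```

A wrong guess for src_encoding usually shows up as a UnicodeDecodeError or as garbled text, so it is worth checking the first lines of the output file afterwards.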
%%bash
head nikkei_utf8.csv
data date, closing price, opening price, high price, low price
"2017/01/04", "19594.16", "19298.68", "19594.16", "19277.93"
"2017/01/05", "19520.69", "19602.10", "19615.40", "19473.28"
"2017/01/06", "19454.33", "19393.55", "19472.37", "19354.44"
"2017/01/10", "19301.44", "19414.83", "19484.90", "19255.35"
"2017/01/11", "19364.67", "19358.64", "19402.17", "19325.46"
"2017/01/12", "19134.70", "19300.19", "19300.19", "19069.02"
"2017/01/13", "19287.28", "19174.97", "19299.36", "19156.93"
"2017/01/16", "19095.24", "19219.13", "19255.41", "19061.27"
"2017/01/17", "18813.53", "19038.45", "19043.91", "18812.86"
This looks fine, so load it in pandas.
df = pd.read_csv('nikkei_utf8.csv')
df.head()
| | data date | closing price | opening price | high price | low price |
---|---|---|---|---|---|
0 | 2017/01/04 | 19594.16 | 19298.68 | 19594.16 | 19277.93 |
1 | 2017/01/05 | 19520.69 | 19602.10 | 19615.40 | 19473.28 |
2 | 2017/01/06 | 19454.33 | 19393.55 | 19472.37 | 19354.44 |
3 | 2017/01/10 | 19301.44 | 19414.83 | 19484.90 | 19255.35 |
4 | 2017/01/11 | 19364.67 | 19358.64 | 19402.17 | 19325.46 |
df.tail()
| | data date | closing price | opening price | high price | low price |
---|---|---|---|---|---|
971 | 2020/12/24 | 26668.35 | 26635.11 | 26764.53 | 26605.26 |
972 | 2020/12/25 | 26656.61 | 26708.10 | 26716.61 | 26638.28 |
973 | 2020/12/28 | 26854.03 | 26691.29 | 26854.03 | 26664.60 |
974 | 2020/12/29 | 27568.15 | 26936.38 | 27602.52 | 26921.14 |
975 | This material is the copyrighted work of Nikkei, and no part of this material may be reproduced, in whole or in part, in any form without the permission of ... | NaN | NaN | NaN | NaN |
Drop the last row, which contains only the copyright notice. The data itself is not copied or redistributed here.
df.drop(index=975, inplace=True)
df.tail()
| | data date | closing price | opening price | high price | low price |
---|---|---|---|---|---|
970 | 2020/12/23 | 26524.79 | 26580.43 | 26585.21 | 26414.74 |
971 | 2020/12/24 | 26668.35 | 26635.11 | 26764.53 | 26605.26 |
972 | 2020/12/25 | 26656.61 | 26708.10 | 26716.61 | 26638.28 |
973 | 2020/12/28 | 26854.03 | 26691.29 | 26854.03 | 26664.60 |
974 | 2020/12/29 | 27568.15 | 26936.38 | 27602.52 | 26921.14 |
Let's visualize the data. The Corona Shock left a deep dent, but by the end of 2020 prices had risen significantly on the back of monetary easing.
Data shaping
Using the first data point as a baseline, we calculate the relative change of every value from that baseline and train on the resulting list.
def shape_data(data_list):
    # Relative change from the first value (0.01 means +1%)
    return [d / data_list[0] - 1 for d in data_list]
df['data_list'] = shape_data(df['closing price'])
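As a quick check of this transformation, the sketch below applies the same function to the first three closing prices from the table (the function is repeated so that the snippet runs on its own). The results agree with the first values of data_list printed later in the notebook (0, -0.00374959, -0.00713631).

```python
def shape_data(data_list):
    # Relative change from the first value, as in the notebook
    base = data_list[0]
    return [d / base - 1 for d in data_list]


# First three closing prices of the Nikkei data
closing = [19594.16, 19520.69, 19454.33]
ratios = shape_data(closing)
print([round(r, 8) for r in ratios])
```

Because every value is divided by the same baseline, the curve keeps its shape; only the scale changes, which keeps the inputs to the network close to zero.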
ticks = 10
xticks = ticks * 5
plt.plot(df['data date'][::ticks], df['closing price'][::ticks], label='nikkei stock')
plt.grid()
plt.legend()
plt.xticks(df['data date'][::xticks], rotation=60)
plt.show()
The same graph, redrawn as the relative change:
plt.plot(df.index.values[::ticks], df['data_list'][::ticks], label='nikkei stock')
plt.grid()
plt.legend()
plt.show()
Prepare constants
# We have about four years of data; split it at these row indices into
# 7 segments of 120 rows each and make predictions in each segment.
TERM_PART_LIST = [0, 120, 240, 360, 480, 600, 720, 840]
# Number of data points used as input for each prediction.
# Predict the next 30 points from 90 points
NUM_LSTM = 90
# Number of units in the hidden (recurrent) layer
NUM_MIDDLE = 200
# Constants for the neural network model
batch_size = 100
epochs = 2000
validation_split = 0.25
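To make the segmentation concrete, the following sketch pairs up the boundaries in TERM_PART_LIST and counts how many training windows each segment yields. It only restates the constants defined here; the names segments and windows_per_segment are introduced for illustration.

```python
TERM_PART_LIST = [0, 120, 240, 360, 480, 600, 720, 840]
NUM_LSTM = 90

# Pair each boundary with the next one to get (start, end) row ranges.
segments = list(zip(TERM_PART_LIST[:-1], TERM_PART_LIST[1:]))
print(len(segments))  # number of segments
print(segments[0])    # row range of the first segment

# Each window needs NUM_LSTM input points plus one more point for the
# shifted target, so a segment of length L yields L - NUM_LSTM windows.
windows_per_segment = [end - start - NUM_LSTM for start, end in segments]
print(windows_per_segment)
```

With 120 rows per segment and a window of 90, every segment provides only 30 training samples, which is worth keeping in mind when reading the results.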
Prepare the data
Prepare the data in the shape that Keras expects as input.
def get_x_y_lx_ly(term_part):
    # Slice out one segment of the data
    start, end = TERM_PART_LIST[term_part], TERM_PART_LIST[term_part + 1]
    date = np.array(df['data date'][start:end])
    x = np.array(df.index.values[start:end])
    y = np.array(df['data_list'][start:end])
    n = len(y) - NUM_LSTM
    l_x = np.zeros((n, NUM_LSTM))
    l_y = np.zeros((n, NUM_LSTM))
    for i in range(0, n):
        # The target is the input window shifted one step ahead
        l_x[i] = y[i: i + NUM_LSTM]
        l_y[i] = y[i + 1: i + NUM_LSTM + 1]
    # Keras recurrent layers expect (samples, time steps, features)
    l_x = l_x.reshape(n, NUM_LSTM, 1)
    l_y = l_y.reshape(n, NUM_LSTM, 1)
    return n, date, x, y, l_x, l_y
n, date, x, y, l_x, l_y = get_x_y_lx_ly(0)
print('shape : ', x.shape)
print('ndim : ', x.ndim)
print('data : ', x[:10])
shape : (120,)
ndim : 1
data : [0 1 2 3 4 5 6 7 8 9]
print('shape : ', y.shape)
print('ndim : ', y.ndim)
print('data : ', y[:10])
shape : (120,)
ndim : 1
data : [ 0. -0.00374959 -0.00713631 -0.01493915 -0.01171216 -0.02344882
-0.01566181 -0.02546269 -0.03983993 -0.03571421]
print(l_y.shape)
print(l_x.shape)
(30, 90, 1)
(30, 90, 1)
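The relationship between l_x and l_y is easier to see on a tiny series. The sketch below builds the same kind of input/target pairs with plain lists; make_windows is a hypothetical helper that mirrors the loop in get_x_y_lx_ly, with a window of 3 instead of 90.

```python
def make_windows(series, window):
    """Return (inputs, targets); each target is its input shifted one step ahead."""
    n = len(series) - window
    inputs = [series[i: i + window] for i in range(n)]
    targets = [series[i + 1: i + window + 1] for i in range(n)]
    return inputs, targets


series = [0, 1, 2, 3, 4, 5]
inputs, targets = make_windows(series, window=3)
for x_row, y_row in zip(inputs, targets):
    print(x_row, '->', y_row)
```

Only the last element of each target is new information to the network; the other elements repeat values already present in the input window.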
Model building
This function defines the construction of the model. The default is RNN.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import SimpleRNN
from tensorflow.keras.layers import GRU
def build_model(model_name='RNN'):
    # Build a recurrent neural net
    model = Sequential()
    # Make it possible to choose between RNN, LSTM and GRU
    if model_name == 'RNN':
        model.add(SimpleRNN(NUM_MIDDLE, input_shape=(NUM_LSTM, 1), return_sequences=True))
    if model_name == 'LSTM':
        model.add(LSTM(NUM_MIDDLE, input_shape=(NUM_LSTM, 1), return_sequences=True))
    if model_name == 'GRU':
        model.add(GRU(NUM_MIDDLE, input_shape=(NUM_LSTM, 1), return_sequences=True))
    model.add(Dense(1, activation="linear"))
    model.compile(loss="mean_squared_error", optimizer="sgd")
    return model

# Deepen the neural net (not used in this case)
def build_model_02():
    NUM_MIDDLE_01 = 100
    NUM_MIDDLE_02 = 120
    # Build the LSTM neural net
    model = Sequential()
    model.add(LSTM(NUM_MIDDLE_01, input_shape=(NUM_LSTM, 1), return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(NUM_MIDDLE_02, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(Dense(1))
    model.add(Activation("linear"))
    model.compile(loss="mean_squared_error", optimizer="sgd")
    # model.compile(loss="mse", optimizer='rmsprop')
    return model
model = build_model('RNN')
Model Details
print(model.summary())
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn (SimpleRNN) (None, 90, 200) 40400
_________________________________________________________________
dense (Dense) (None, 90, 1) 201
=================================================================
Total params: 40,601
Trainable params: 40,601
Non-trainable params: 0
_________________________________________________________________
None
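The parameter counts in the summary can be reproduced by hand. The sketch below uses the standard formulas for a simple recurrent layer and a dense layer, and adds the LSTM count for comparison; the function names are introduced here for illustration only.

```python
def simple_rnn_params(units, input_dim):
    # input weights + recurrent weights + one bias per unit
    return units * (input_dim + units + 1)


def lstm_params(units, input_dim):
    # Four weight sets of the same shape: input, forget and output gates
    # plus the cell candidate
    return 4 * simple_rnn_params(units, input_dim)


def dense_params(inputs, outputs):
    # one weight per input plus a bias, for each output
    return outputs * (inputs + 1)


print(simple_rnn_params(200, 1))  # SimpleRNN layer in the summary
print(dense_params(200, 1))       # Dense layer in the summary
print(lstm_params(200, 1))        # what build_model('LSTM') would use
```

The first two values match the 40,400 and 201 shown in the summary. The LSTM variant has four times as many recurrent parameters, which is a lot to fit from a handful of windows per segment.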
# use validation_split to hold out the last 25% of the samples for validation
history = model.fit(l_x, l_y, epochs=epochs, batch_size=batch_size, validation_split=validation_split, verbose=0)
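It helps to work out what these settings mean for a segment with only 30 windows. The arithmetic below is a sketch: Keras documents that the validation samples are taken from the end of the data before shuffling, while the exact rounding (a floor on the training count) is an assumption made here.

```python
import math

n_samples = 30           # windows per segment
validation_split = 0.25
batch_size = 100

# Assumption: the first floor(n * (1 - split)) samples are used for training
# and the remaining ones, taken from the end, are held out for validation.
n_train = math.floor(n_samples * (1 - validation_split))
n_val = n_samples - n_train

# With batch_size larger than the training set, each epoch is one update.
steps_per_epoch = math.ceil(n_train / batch_size)
print(n_train, n_val, steps_per_epoch)
```

Under these assumptions each epoch performs a single gradient step on about 22 windows, so the 2000 epochs correspond to roughly 2000 parameter updates.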
Visualization of the loss function
Let’s visualize how the error is reduced by learning. It looks like it is converging at the current number of epochs.
loss = history.history['loss']
val_loss = history.history['val_loss']
plt.plot(np.arange(len(loss)), loss, label='loss')
plt.plot(np.arange(len(val_loss)), val_loss, label='val_loss')
plt.grid()
plt.legend()
plt.show()
Checking the results with RNN
The region shaded in light orange covers the first 90 points, which serve as the initial input for prediction. Beyond it, each predicted value is fed back as input for the next prediction, and the forecast stays broadly consistent with the actual trend. The solid orange line is the actual stock price, and the blue line is the forecast.
def plot_result():
    # Initial input values: the first window's first input followed by its targets
    res = []
    res = np.append(res, l_x[0][0][0])
    res = np.append(res, l_y[0].reshape(-1))
    for i in range(0, n):
        _y = model.predict(res[-NUM_LSTM:].reshape(1, NUM_LSTM, 1))
        # Use the predicted data as input data for the next prediction
        res = np.append(res, _y[0][NUM_LSTM - 1][0])
    # Drop the final prediction so that res has the same length as date
    res = np.delete(res, -1)
    plt.plot(date, y, label="stock price", color='coral')
    plt.plot(date, res, label="prediction result", color='blue')
    plt.xticks(date[::12], rotation=60)
    plt.legend()
    plt.grid()
    plt.axvspan(0, NUM_LSTM, color="coral", alpha=0.2)
    plt.show()
print('{} - {} results'.format(date[0], date[NUM_LSTM - 1]))
plot_result()
2017/01/04 - 2017/05/16 results
What do you think of the results? Well, I guess I didn’t miss the trend too much, lol.
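The feedback loop inside plot_result can be isolated as a small sketch. Here rollout and linear_extrapolation are hypothetical names; the latter is only a stand-in for model.predict so that the snippet runs without a trained network.

```python
def rollout(seed, predict_next, steps, window):
    """Extend `seed` by `steps` values, feeding each prediction back as input."""
    history = list(seed)
    for _ in range(steps):
        # Only the most recent `window` values are shown to the predictor
        history.append(predict_next(history[-window:]))
    return history


def linear_extrapolation(window_values):
    # Stand-in predictor (purely illustrative): repeat the last observed step
    return window_values[-1] + (window_values[-1] - window_values[-2])


result = rollout([1, 2, 3], linear_extrapolation, steps=3, window=3)
print(result)
```

Because every prediction becomes part of the next input, errors accumulate over the 30 steps, which is one reason the forecast can drift away from the actual prices.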
Forecasting for other periods
Let’s try forecasting for other periods using the previous functions.
for term in [1, 2, 3, 4, 5, 6]:
    n, date, x, y, l_x, l_y = get_x_y_lx_ly(term)
    model = build_model('RNN')
    history = model.fit(l_x, l_y, epochs=epochs, batch_size=batch_size, validation_split=validation_split, verbose=0)
    print('Prediction period : {} - {} results'.format(date[0], date[NUM_LSTM - 1]))
    plot_result()
Prediction period : 2017/06/28 - 2017/11/07 results
Prediction period : 2017/12/21 - 2018/05/08 results
Prediction period : 2018/06/20 - 2018/10/29 results
Prediction period : 2018/12/12 - 2019/04/26 results
Prediction period : 2019/06/18 - 2019/10/29 results
Prediction period : 2019/12/12 - 2020/04/27 results
Prediction with LSTM
for term in [0, 1]:
    n, date, x, y, l_x, l_y = get_x_y_lx_ly(term)
    model = build_model('LSTM')
    history = model.fit(l_x, l_y, epochs=epochs, batch_size=batch_size, validation_split=validation_split, verbose=0)
    print('Prediction period : {} - {} results'.format(date[0], date[NUM_LSTM - 1]))
    plot_result()
Prediction period : 2017/01/04 - 2017/05/16 results
Prediction period : 2017/06/28 - 2017/11/07 results
With this simple model, the LSTM predicted poorly, so only two graphs are shown. The result deserves more discussion, but that is not the purpose of this article, so I will leave it at that.
Prediction with GRU
for term in [0, 1]:
    n, date, x, y, l_x, l_y = get_x_y_lx_ly(term)
    model = build_model('GRU')
    history = model.fit(l_x, l_y, epochs=epochs, batch_size=batch_size, validation_split=validation_split, verbose=0)
    print('Prediction period : {} - {} results'.format(date[0], date[NUM_LSTM - 1]))
    plot_result()
Prediction period : 2017/01/04 - 2017/05/16 results
Prediction period : 2017/06/28 - 2017/11/07 results
The GRU did not give meaningful results either.
S&P 500 predictions
2019
In the same way, I will make a prediction for the S&P 500, the leading stock index in the US. The file can be downloaded from the above website.
!ls
files_bk lstm_nb.md lstm_nb.txt nikkei.csv sp500_2019.csv sp500_2019_utf8.csv sp500_2020_utf8.csv
lstm_nb.ipynb lstm_nb.py lstm_nb_files nikkei_utf8.csv sp500_2019_utf8.csv sp500_2020.csv sp500_2020_utf8.csv
Here’s a quick look at the contents of the file.
%%bash
head sp500_2019.csv
1557 ����ETF SPDR S&P500 ETF�iETF�j,,,,,
���t,�n�l,���l,���l,�I�l,�o����,�I�l�����l
"2019-01-04", "26620", "26830", "26310", "26780", "7665", "26780"
"2019-01-07", "27710", "27790", "27450", "27520", "1568", "27520"
"2019-01-08", "27800", "28020", "27760", "27910", "2051", "27910"
"2019-01-09", "27960", "28300", "27960", "28210", "2557", "28210"
"2019-01-10", "28050", "28050", "27600", "27830", "7270", "27830"
"2019-01-11", "28300", "28300", "27950", "28150", "1584", "28150"
"2019-01-15", "28100", "28300", "28080", "28210", "7142", "28210"
"2019-01-16", "28430", "28430", "28260", "28300", "936", "28300"
The encoding again appears to be Shift_JIS, so we convert the file to UTF-8.
%%bash
nkf -w sp500_2019.csv > sp500_2019_utf8.csv
Looking more closely, the first line is a title rather than the header row, so it must be removed before loading the file into pandas.
%%bash
head sp500_2019_utf8.csv
1557 TSE ETF SPDR S&P500 ETF (ETF) ,,,,,
date, opening price, high price, low price, closing price, volume, adjusted closing price
"2019-01-04", "26620", "26830", "26310", "26780", "7665", "26780"
"2019-01-07", "27710", "27790", "27450", "27520", "1568", "27520"
"2019-01-08", "27800", "28020", "27760", "27910", "2051", "27910"
"2019-01-09", "27960", "28300", "27960", "28210", "2557", "28210"
"2019-01-10", "28050", "28050", "27600", "27830", "7270", "27830"
"2019-01-11", "28300", "28300", "27950", "28150", "1584", "28150"
"2019-01-15", "28100", "28300", "28080", "28210", "7142", "28210"
"2019-01-16", "28430", "28430", "28260", "28300", "936", "28300"
%%bash
sed -ie '1d' sp500_2019_utf8.csv
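The title line can also be skipped at read time instead of editing the file. The sketch below uses only the standard csv module; read_rows_skipping_title is a hypothetical helper, and it assumes the file has already been converted to UTF-8.

```python
import csv


def read_rows_skipping_title(path, encoding="utf-8"):
    """Read a CSV whose first line is a title rather than the header row."""
    with open(path, newline="", encoding=encoding) as f:
        next(f)  # discard the title line
        # skipinitialspace tolerates a blank after each comma
        reader = csv.reader(f, skipinitialspace=True)
        header = next(reader)
        rows = list(reader)
    return header, rows


# Usage with the file from this section:
# header, rows = read_rows_skipping_title("sp500_2019_utf8.csv")
```

pandas.read_csv also accepts a skiprows argument for the same purpose, which avoids modifying the downloaded file at all.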
%%bash
head sp500_2019_utf8.csv
date, opening price, high price, low price, closing price, volume, adjusted closing price
"2019-01-04", "26620", "26830", "26310", "26780", "7665", "26780"
"2019-01-07", "27710", "27790", "27450", "27520", "1568", "27520"
"2019-01-08", "27800", "28020", "27760", "27910", "2051", "27910"
"2019-01-09", "27960", "28300", "27960", "28210", "2557", "28210"
"2019-01-10", "28050", "28050", "27600", "27830", "7270", "27830"
"2019-01-11", "28300", "28300", "27950", "28150", "1584", "28150"
"2019-01-15", "28100", "28300", "28080", "28210", "7142", "28210"
"2019-01-16", "28430", "28430", "28260", "28300", "936", "28300"
"2019-01-17", "28500", "28900", "28420", "28420", "966", "28420"
Now that we’re ready, we can put it into pandas.
df = pd.read_csv('sp500_2019_utf8.csv')
df.head()
| | date | opening price | high price | low price | closing price | volume | adjusted closing price |
---|---|---|---|---|---|---|---|
0 | 2019-01-04 | 26620 | 26830 | 26310 | 26780 | 7665 | 26780 |
1 | 2019-01-07 | 27710 | 27790 | 27450 | 27520 | 1568 | 27520 |
2 | 2019-01-08 | 27800 | 28020 | 27760 | 27910 | 2051 | 27910 |
3 | 2019-01-09 | 27960 | 28300 | 27960 | 28210 | 2557 | 28210 |
4 | 2019-01-10 | 28050 | 28050 | 27600 | 27830 | 7270 | 27830 |
df.tail()
| | date | opening price | high price | low price | closing price | volume | adjusted closing price |
---|---|---|---|---|---|---|---|
236 | 2019-12-24 | 35200 | 35200 | 35150 | 35150 | 2432 | 35150 |
237 | 2019-12-25 | 35150 | 35200 | 35050 | 35050 | 2052 | 35050 |
238 | 2019-12-26 | 35150 | 35250 | 35150 | 35200 | 2276 | 35200 |
239 | 2019-12-27 | 35450 | 35500 | 35350 | 35500 | 2787 | 35500 |
240 | 2019-12-30 | 35400 | 35450 | 35250 | 35250 | 3542 | 35250 |
As with the Nikkei 225, we convert the closing price to the rate of change. We will use the same function.
df['data_list'] = shape_data(df['closing price'])
We also want to reuse the earlier functions, so we rename the column date to data date.
df = df.rename(columns={'date':'data date'})
df.head()
| | data date | opening price | high price | low price | closing price | volume | adjusted closing price | data_list |
---|---|---|---|---|---|---|---|---|
0 | 2019-01-04 | 26620 | 26830 | 26310 | 26780 | 7665 | 26780 | 0.000000 |
1 | 2019-01-07 | 27710 | 27790 | 27450 | 27520 | 1568 | 27520 | 0.027633 |
2 | 2019-01-08 | 27800 | 28020 | 27760 | 27910 | 2051 | 27910 | 0.042196 |
3 | 2019-01-09 | 27960 | 28300 | 27960 | 28210 | 2557 | 28210 | 0.053398 |
4 | 2019-01-10 | 28050 | 28050 | 27600 | 27830 | 7270 | 27830 | 0.039208 |
df.tail()
| | data date | opening price | high price | low price | closing price | volume | adjusted closing price | data_list |
---|---|---|---|---|---|---|---|---|
236 | 2019-12-24 | 35200 | 35200 | 35150 | 35150 | 2432 | 35150 | 0.312547 |
237 | 2019-12-25 | 35150 | 35200 | 35050 | 35050 | 2052 | 35050 | 0.308813 |
238 | 2019-12-26 | 35150 | 35250 | 35150 | 35200 | 2276 | 35200 | 0.314414 |
239 | 2019-12-27 | 35450 | 35500 | 35350 | 35500 | 2787 | 35500 | 0.325616 |
240 | 2019-12-30 | 35400 | 35450 | 35250 | 35250 | 3542 | 35250 | 0.316281 |
This is a bird’s eye view of the entire graph.
plt.plot(df['data date'][::ticks], df['closing price'][::ticks], label='sp500 2019')
plt.grid()
plt.legend()
plt.xticks(df['data date'][::xticks], rotation=60)
plt.show()
Let's make a prediction and plot the result.
for term in [0, 1]:
    n, date, x, y, l_x, l_y = get_x_y_lx_ly(term)
    model = build_model('RNN')
    history = model.fit(l_x, l_y, epochs=epochs, batch_size=batch_size, validation_split=validation_split, verbose=0)
    print('Prediction period : {} - {} results'.format(date[0], date[NUM_LSTM - 1]))
    plot_result()
Prediction period : 2019-01-04 - 2019-05-22 results
Prediction period : 2019-07-04 - 2019-11-15 results
As with the Nikkei 225, the forecast follows the trend, so it might at least help avoid betting against the trend, lol.
2020
Next, let's try to predict the stock price in 2020. The preprocessing is the same as for 2019, so the explanation is omitted.
%%bash
nkf -w sp500_2020.csv > sp500_2020_utf8.csv
sed -ie '1d' sp500_2020_utf8.csv
head sp500_2020_utf8.csv
date, opening price, high price, low price, closing price, volume, adjusted closing price
"2020-01-06", "34800", "34850", "34700", "34750", "7632", "34750"
"2020-01-07", "35050", "35200", "35050", "35200", "3487", "35200"
"2020-01-08", "34550", "34900", "34200", "34850", "11349", "34850"
"2020-01-09", "35450", "35600", "35450", "35600", "6255", "35600"
"2020-01-10", "35850", "35900", "35800", "35900", "3461", "35900"
"2020-01-14", "36200", "36250", "36100", "36150", "4379", "36150"
"2020-01-15", "35950", "36050", "35900", "35950", "4270", "35950"
"2020-01-16", "36150", "36250", "36100", "36250", "2707", "36250"
"2020-01-17", "36500", "36550", "36450", "36450", "9618", "36450"
df = pd.read_csv('sp500_2020_utf8.csv')
df.head()
| | date | opening price | high price | low price | closing price | volume | adjusted closing price |
---|---|---|---|---|---|---|---|
0 | 2020-01-06 | 34800 | 34850 | 34700 | 34750 | 7632 | 34750 |
1 | 2020-01-07 | 35050 | 35200 | 35050 | 35200 | 3487 | 35200 |
2 | 2020-01-08 | 34550 | 34900 | 34200 | 34850 | 11349 | 34850 |
3 | 2020-01-09 | 35450 | 35600 | 35450 | 35600 | 6255 | 35600 |
4 | 2020-01-10 | 35850 | 35900 | 35800 | 35900 | 3461 | 35900 |
df['data_list'] = shape_data(df['closing price'])
df = df.rename(columns={'date':'data date'})
df.head()
| | data date | opening price | high price | low price | closing price | volume | adjusted closing price | data_list |
---|---|---|---|---|---|---|---|---|
0 | 2020-01-06 | 34800 | 34850 | 34700 | 34750 | 7632 | 34750 | 0.000000 |
1 | 2020-01-07 | 35050 | 35200 | 35050 | 35200 | 3487 | 35200 | 0.012950 |
2 | 2020-01-08 | 34550 | 34900 | 34200 | 34850 | 11349 | 34850 | 0.002878 |
3 | 2020-01-09 | 35450 | 35600 | 35450 | 35600 | 6255 | 35600 | 0.024460 |
4 | 2020-01-10 | 35850 | 35900 | 35800 | 35900 | 3461 | 35900 | 0.033094 |
df.tail()
| | data date | opening price | high price | low price | closing price | volume | adjusted closing price | data_list |
---|---|---|---|---|---|---|---|---|
234 | 2020-12-21 | 38250 | 38300 | 38100 | 38300 | 6596 | 38300 | 0.102158 |
235 | 2020-12-22 | 38000 | 38100 | 37800 | 37900 | 6080 | 37900 | 0.090647 |
236 | 2020-12-24 | 38050 | 38200 | 38050 | 38100 | 2621 | 38100 | 0.096403 |
237 | 2020-12-25 | 38300 | 38300 | 38100 | 38200 | 1945 | 38200 | 0.099281 |
238 | 2020-12-28 | 38250 | 38450 | 38200 | 38400 | 4734 | 38400 | 0.105036 |
plt.plot(df['data date'][::ticks], df['closing price'][::ticks], label='sp500 2020')
plt.grid()
plt.legend()
plt.xticks(df['data date'][::xticks], rotation=60)
plt.show()
for term in [0, 1]:
    n, date, x, y, l_x, l_y = get_x_y_lx_ly(term)
    model = build_model('RNN')
    history = model.fit(l_x, l_y, epochs=epochs, batch_size=batch_size, validation_split=validation_split, verbose=0)
    print('Prediction period : {} - {} results'.format(date[0], date[NUM_LSTM - 1]))
    plot_result()
Prediction period : 2020-01-06 - 2020-05-20 results
Prediction period : 2020-07-02 - 2020-11-13 results
Summary
Much more could be done, such as feature extraction, model exploration, and hyperparameter tuning, but the goal here was to get used to Keras, and since no service is planned, I will stop here. Many factors determine stock prices, and I believe predicting them with a simple NN is quite difficult.