10 min read · Jun 13, 2022

STOCK PRICE PREDICTION AND FORECASTING VIA AN OBJECT ORIENTED PROGRAMMING

Time series analysis has a wide range of application areas, among which the finance, health, and insurance sectors stand out. In finance, predicting and forecasting stock prices has been at the center of time series applications, because a properly modeled stock price can be used as an input for risk management, pricing, and so on. To accomplish this task, alongside traditional time series models, deep learning algorithms have been developed to further improve the accuracy of time series modeling.

Thus, the aim of this post is twofold:

Predicting and forecasting stock prices via selected time series models: After training each model, I run predictions to see how well it works. Finally, multi-step stock price forecasting is done to see the future pattern of the stock price.
Using object oriented programming to select the best time series model based on a selected performance metric. This approach enables a layman to run a time series application smoothly. More specifically, without any prior programming skill, one can run the algorithm simply by deciding on the inputs (parameters) and interpret the result accordingly.

The models that I employ here are:

* ARMA
* SARIMA
* LSTM

Let's briefly talk about the time series models without going deep into the math. The first model is the Autoregressive Moving Average model, known as ARMA.

ARMA has two parts: Autoregressive and Moving Average. As its name suggests, in the autoregressive part, the time series is regressed on its own lagged values.
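
In standard textbook notation, an autoregressive model of order p can be written as shown here. The lag order p and the subscripts on θ are assumed notation; the symbols a, θ, and 𝜖 follow the definitions given in the text.

```latex
X_t = a + \sum_{i=1}^{p} \theta_i X_{t-i} + \epsilon_t
```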

where a, θ, and 𝜖 represent the constant term, the slope coefficient, and the error term, respectively. The implicit assumption here is that lagged values have an impact on the current value of the time series.

The moving average part is basically a weighted average of the past error terms of the time series. Mathematically speaking:
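
A standard way to write a moving average model of order q is sketched here. The weights α and the order q are assumed notation, since the text does not name them; a and 𝜖 are the constant and error terms defined for the autoregressive part.

```latex
X_t = a + \epsilon_t + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}
```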

So, this part explains the current value through the errors made in earlier periods.

The Seasonal Autoregressive Integrated Moving Average model, SARIMA for short, additionally tries to capture the seasonal component of the time series. So, you can think of it as an extension of the ARMA model.

where 𝜇 is the drift term.

The tricky part in these models is finding the hyperparameters that give the best-fitting model. By convention, the hyperparameters of ARMA are p (the autoregressive order) and q (the moving average order). SARIMA adds d, the order of differencing, together with a seasonal counterpart of each order and a seasonal period s. Thus, the general forms are ARMA(p, q) and SARIMA(p, d, q)(P, D, Q)s.
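
The search over hyperparameters can be sketched as a plain grid of candidate orders. The ranges below mirror the ones passed to the functions later in this post, and the seasonal period of 12 is the value hard-coded there; treat both as illustrative choices rather than recommendations.

```python
import itertools

# Candidate values for each order (same ranges as the calls later in the post)
p = range(0, 2)  # autoregressive order: 0 or 1
d = range(0, 2)  # differencing order: 0 or 1
q = range(0, 1)  # moving average order: 0 only

# Every (p, q) pair to try for ARMA
pq = list(itertools.product(p, q))

# Every (p, d, q) triple for the non-seasonal part of SARIMA
pdq = list(itertools.product(p, d, q))

# Seasonal orders (P, D, Q, s) built from the same grid, with period s = 12
seasonal_pdq = [(P, D, Q, 12) for P, D, Q in pdq]

print(pq)                            # [(0, 0), (1, 0)]
print(len(pdq) * len(seasonal_pdq))  # 16 SARIMA fits per stock
```

Even this small grid means 16 SARIMA fits per stock, so widening the ranges quickly becomes expensive.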

Long Short-Term Memory, abbreviated as LSTM, is a deep learning method based on a recurrent neural network. It processes data sequentially, passing information on as it propagates forward. What sets it apart from a plain recurrent network is the set of operations within the LSTM's cells. It is considered a non-parametric method.
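
To make "the operations within the LSTM's cells" concrete, here is a minimal NumPy sketch of one forward step of a standard LSTM cell. It is not how Keras implements the layer internally; the function name `lstm_step`, the stacked weight layout, and the gate order are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    W: (4*hidden, n_inputs), U: (4*hidden, hidden), b: (4*hidden,)
    Stacked gate order (a convention chosen here): input, forget, output, candidate.
    """
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:hidden])                # input gate: how much new information to store
    f = sigmoid(z[hidden:2 * hidden])       # forget gate: how much old memory to keep
    o = sigmoid(z[2 * hidden:3 * hidden])   # output gate: how much memory to expose
    g = np.tanh(z[3 * hidden:])             # candidate values for the cell memory
    c_t = f * c_prev + i * g                # updated cell state (long-term memory)
    h_t = o * np.tanh(c_t)                  # updated hidden state (passed forward)
    return h_t, c_t

# Feed three made-up scaled prices through a randomly initialized cell
rng = np.random.default_rng(0)
n_inputs, hidden = 1, 4
W = rng.normal(scale=0.1, size=(4 * hidden, n_inputs))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for price in [0.1, 0.2, 0.3]:
    h, c = lstm_step(np.array([price]), h, c, W, U, b)
print(h.shape)  # (4,)
```

The cell state `c` is what lets information persist across many steps, while the gates decide what to keep, add, and output at each step.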

Combining these methods gives us a chance to compare the performance of classical time series modeling with that of deep learning.

ARMA, SARIMA, and LSTM Implementation for Stock Price Prediction and Forecasting

Let's start running estimation, prediction, and forecasting with each of these models separately. The libraries that we are going to use throughout this post are as follows:

import datetime
import itertools
import math
import warnings
from datetime import timedelta

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm
import yfinance as yf
from keras.layers import LSTM, Dense, Dropout, Flatten  # Dropout is available directly from keras.layers
from keras.models import Sequential
from sklearn.metrics import mean_squared_error

try:  # legacy classes of older statsmodels releases; removed in 0.13
    from statsmodels.tsa.arima_model import ARIMA, ARMA
except ImportError:
    from statsmodels.tsa.arima.model import ARIMA  # ARMA(p, q) is ARIMA with order (p, 0, q)

warnings.filterwarnings("ignore")

I am going to use the Yahoo Finance API, through the yfinance package, to extract stock prices. To do that, we need to find the ticker of each company of interest and determine the time period. The ten stocks with the highest market value listed in the S&P 500 are selected for this analysis. These stocks are:

Apple
Microsoft
Google
Amazon
Facebook
JP Morgan
Visa
Johnson&Johnson
Wal-Mart
Bank of America

To extract the stock prices, I need the tickers, which are provided below.

# Facebook has since been renamed Meta; if 'FB' returns no data, try 'META'
stocks = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'FB', 'JPM', 'V', 'JNJ', 'WMT', 'BAC']
start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2019, 1, 9)
stock_prices = yf.download(stocks, start=start, end=end, interval='1d')

The closing price is selected; the first 70% of the data serves as the training set and the remaining 30% is kept as the test set.

stock_prices = stock_prices['Close']
df = stock_prices
split = int(df.shape[0] * 0.7)  # row position where the test set begins
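
As a minimal illustration of this chronological split, the sketch below applies the same 70/30 rule to a synthetic series. The series `prices` and its name `DEMO` are made up for the example and stand in for one column of the downloaded data.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices standing in for one stock (illustrative data only)
dates = pd.date_range("2015-01-01", periods=100, freq="D")
prices = pd.Series(np.linspace(100.0, 120.0, 100), index=dates, name="DEMO")

# The first 70% of the rows form the training set, the rest the test set
split = int(prices.shape[0] * 0.7)
train, test = prices.iloc[:split], prices.iloc[split:]

print(len(train), len(test))            # 70 30
print(train.index[-1] < test.index[0])  # True: no shuffling, the test set lies strictly later
```

Because the order of observations carries the information in a time series, the split is by position rather than by random sampling.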

To start with, the ARMA model is employed. The output is a graph showing both the prediction and the forecast. The parameters are:

nstep, the number of forecast steps, is 30
p takes the values 0 and 1, i.e. range(0, 2)
q takes only the value 0, i.e. range(0, 1)
plot_result is 1 if you want the plot as an output.

def arma(df, nstep, p, q, plot_result):
    """Autoregressive Moving Average: grid-search (p, q) and keep the lowest-RMSE fit."""
    df_truth = df[split:]  # test portion; `split` is the global computed above
    best = None
    for order_p, order_q in itertools.product(p, q):
        # ARMA(p, q) equals ARIMA(p, 0, q); statsmodels 0.13 removed the separate ARMA class
        results = sm.tsa.ARIMA(df, order=(order_p, 0, order_q)).fit()
        # One-step-ahead predictions over the test portion (the model is fitted on the full series)
        pred = np.asarray(results.predict(start=split, dynamic=False))
        rmse = math.sqrt(((pred - df_truth.values) ** 2).mean())
        if best is None or rmse < best['rmse']:
            best = {'param': (order_p, order_q), 'AIC': results.aic, 'pred': pred, 'rmse': rmse,
                    'forecast': np.asarray(results.forecast(steps=nstep))}
    # nstep calendar days starting at the last observed date
    date_rng = pd.date_range(start=df_truth.index[-1], periods=nstep, freq='D')
    if plot_result:
        plt.figure(figsize=(5, 6))
        plt.plot(date_rng, best['forecast'], label='%s ARMA Forecast' % df.name)
        plt.plot(df_truth.index, best['pred'], label='%s ARMA Prediction' % df.name)
        plt.plot(df_truth.index, df_truth.values, label='%s ARMA Actual' % df.name)
        plt.legend()
        plt.show()
    # Report the RMSE of the selected model, not of the last one fitted
    print("ARMA RMSE: %.4f" % best['rmse'])

To call the function, we run the following code:

for i in df.columns:
    arma(df[str(i)], 30, p=range(0, 2), q=range(0, 1), plot_result=1)

ARMA: Wal-Mart Stock Price Prediction/Forecasting

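To save space, I include only one visualization, the one for Wal-Mart. The blue line denotes the forecast, and the orange line represents the prediction. Both eyeballing and the RMSE suggest that ARMA tracks the stock price closely here. Keep in mind that these are one-step-ahead predictions from a model fitted on the whole sample, which is an easier task than a genuine out-of-sample forecast.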
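
An RMSE value is easier to judge next to a simple reference point. The sketch below computes RMSE as it is used in this post and compares a set of predictions with a naive "yesterday's price" baseline. All numbers are made up for illustration, and the baseline is an addition here, not part of the original analysis.

```python
import math
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return math.sqrt(((predicted - actual) ** 2).mean())

# Made-up prices and model predictions, for illustration only
actual = np.array([100.0, 101.0, 103.0, 102.0, 104.0])
model_pred = np.array([100.5, 100.5, 102.0, 102.5, 103.0])

# Naive baseline: predict each day with the previous day's price
naive_pred = actual[:-1]

model_rmse = rmse(actual[1:], model_pred[1:])
naive_rmse = rmse(actual[1:], naive_pred)
print("model RMSE: %.4f, naive RMSE: %.4f" % (model_rmse, naive_rmse))
```

If a model cannot beat this baseline on the test set, a low RMSE in absolute terms says little about its usefulness.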

SARIMA is the second model that I use as a time series forecasting tool. The parameters are:

nstep, the number of forecast steps, is 30
p takes the values 0 and 1, i.e. range(0, 2)
q takes only the value 0, i.e. range(0, 1)
d takes the values 0 and 1, i.e. range(0, 2)
plot_result is 1 if you want the plot as an output.

def sarima(df, nstep, p, d, q, plot_result):
    """Seasonal Autoregressive Integrated Moving Average: grid-search the orders by RMSE."""
    pdq = list(itertools.product(p, d, q))
    # Seasonal (P, D, Q, s) orders reuse the same grid with a seasonal period of 12
    seasonal_pdq = [(x[0], x[1], x[2], 12) for x in pdq]
    df_truth = df[split:]  # test portion; `split` is the global computed above
    best = None
    for param in pdq:
        for param_seasonal in seasonal_pdq:
            mod = sm.tsa.statespace.SARIMAX(df,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)
            results = mod.fit(disp=False)
            # One-step-ahead predictions over the test portion
            pred = results.get_prediction(start=split, dynamic=False)
            prediction = np.asarray(pred.predicted_mean)
            fore = results.get_forecast(steps=nstep)
            forecast = np.asarray(fore.predicted_mean)
            rmse = math.sqrt(((prediction - df_truth.values) ** 2).mean())
            if best is None or rmse < best['rmse']:
                best = {'param': (param, param_seasonal), 'AIC': results.aic,
                        'pred': prediction, 'rmse': rmse, 'forecast': forecast}
    # nstep calendar days starting at the last observed date
    date_rng = pd.date_range(start=df_truth.index[-1], periods=nstep, freq='D')
    if plot_result:
        plt.figure(figsize=(5, 6))
        plt.plot(date_rng, best['forecast'], label='%s SARIMA Forecast' % df.name)
        plt.plot(df_truth.index, best['pred'], label='%s SARIMA Prediction' % df.name)
        plt.plot(df_truth.index, df_truth.values, label='%s SARIMA Actual' % df.name)
        plt.xticks(fontsize=7)
        plt.legend()
        plt.savefig('sarima.png')
        plt.show()
    # Report the RMSE of the selected model, not of the last one fitted
    print("SARIMA RMSE: %.4f" % best['rmse'])

To run the function above, use the following code:

for i in df.columns:
    sarima(df[str(i)], 30, p=range(0, 2), q=range(0, 1), d=range(0, 2), plot_result=1)

SARIMA: Wal-Mart Stock Price Prediction/Forecasting

LSTM is the last model in this post. Being a more complex model, LSTM has many more parameters than ARMA and SARIMA. The parameters are:

hidden_neurons=64,
dropout_parameter=0.20
epoch=400,
batch_size=100,
plot_result=1

def my_LSTM(yt, hidden_neurons, dropout_parameter, epoch, batch_size, plot_result):
    """Long Short-Term Memory: predict the next price from the previous 30 prices."""
    look_back = 30  # number of past observations in each input window
    name, dates = yt.name, yt.index  # keep the label and dates before converting to an array
    split = int(yt.shape[0] * 0.7)
    yt = np.reshape(np.array(yt.astype('float32')), (-1, 1))
    train, test = yt[:split], yt[split:]

    def prior_steps(data, look_back=30):
        # Build (window of look_back past values, next value) pairs
        X, Y = [], []
        for i in range(len(data) - look_back - 1):
            X.append(data[i:(i + look_back), 0])
            Y.append(data[i + look_back, 0])
        return np.array(X), np.array(Y)

    X_train, Y_train = prior_steps(train, look_back)
    X_test, Y_test = prior_steps(test, look_back)
    # Keras expects (samples, timesteps, features): one timestep with look_back features
    X_train = np.reshape(X_train, (X_train.shape[0], 1, look_back))
    X_test = np.reshape(X_test, (X_test.shape[0], 1, look_back))
    model = Sequential()
    model.add(LSTM(hidden_neurons, input_shape=(1, look_back), activation='relu', return_sequences=True))
    model.add(Dropout(dropout_parameter))
    model.add(Flatten())
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    # The arguments are now used; the values were previously hard-coded inside the function
    model.fit(X_train, Y_train, epochs=epoch, batch_size=batch_size,
              validation_data=(X_test, Y_test), verbose=0, shuffle=False)
    test_pred = model.predict(X_test, verbose=0)
    # Recursive forecast: start from the last look_back observations and feed each prediction back in
    x_input = test[-look_back:, 0]
    forecast = []
    for _ in range(30):
        yhat = model.predict(x_input.reshape((1, 1, look_back)), verbose=0)
        x_input = np.append(x_input, yhat)[1:]
        forecast.append(float(yhat[0, 0]))
    date_rng = pd.date_range(start=dates[-1], periods=30, freq='D')
    if plot_result:
        plt.figure(figsize=(5, 6))
        plt.plot(dates[split:], test, label='%s test' % name)
        # Each prediction targets the observation right after its window
        plt.plot(dates[split + look_back:-1], test_pred, label='%s Prediction' % name)
        plt.plot(date_rng, forecast, label='%s Forecast' % name)
        plt.xticks(fontsize=7)
        plt.legend()
        plt.savefig('lstm.png')
        plt.show()
    # Compare the predictions with their own targets (Y_test), not with the start of the test set
    print("LSTM RMSE: %.4f" % math.sqrt(mean_squared_error(Y_test, test_pred)))

Whether you have already found the optimal parameters or are still searching for them, use the following code to call the LSTM:

for i in df.columns:
    my_LSTM(df[str(i)], hidden_neurons=64, dropout_parameter=0.20, epoch=400, batch_size=100, plot_result=1)

LSTM: Wal-Mart Stock Price Prediction/Forecasting
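
Two ideas carry the LSTM function: turning a series into (window, next value) pairs, and producing a multi-step forecast by feeding each prediction back into the window. The sketch below shows both on a toy series. The helper names `make_windows` and `recursive_forecast` and the toy one-step model `linear_step` are hypothetical, and unlike `prior_steps` this version also keeps the final window.

```python
import numpy as np

def make_windows(series, look_back):
    """Return (X, y): each row of X holds look_back consecutive values, y the value that follows."""
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])
        y.append(series[i + look_back])
    return np.array(X), np.array(y)

def recursive_forecast(predict_one, last_window, n_steps):
    """Forecast n_steps ahead by repeatedly predicting one step and sliding the window."""
    window = np.array(last_window, dtype=float)
    forecasts = []
    for _ in range(n_steps):
        yhat = float(predict_one(window))
        forecasts.append(yhat)
        window = np.append(window[1:], yhat)  # drop the oldest value, add the prediction
    return np.array(forecasts)

series = np.arange(10, dtype=float)  # 0, 1, ..., 9
X, y = make_windows(series, look_back=3)
print(X.shape, y.shape)  # (7, 3) (7,)
print(X[0], y[0])        # [0. 1. 2.] 3.0

# Hypothetical one-step model: continue the last observed increment
def linear_step(window):
    return window[-1] + (window[-1] - window[-2])

forecast = recursive_forecast(linear_step, series[-3:], n_steps=3)
print(forecast)          # [10. 11. 12.]
```

Because every forecast step builds on earlier predictions rather than on observed prices, errors can accumulate as the horizon grows.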

Object Oriented Programming for Stock Price Prediction and Forecasting

Now, it is time to combine all the models we have gone over so far. Python enables us to run all three models at once using Object Oriented Programming (OOP). OOP means building applications using objects: we write classes and derive objects from them. This is what it looks like:

class Model_Selection(object):
    def __init__(self, df, p, d, q, split, nstep, hidden_neurons, dropout_parameters,
                 epoch, batch_size, plot_result):
        self.p = p
        self.d = d
        self.q = q
        self.df = df  # price series of a single stock, e.g. df['WMT']
        self.split = split
        self.nstep = nstep
        self.hidden_neurons = hidden_neurons
        self.dropout_parameters = dropout_parameters
        self.epoch = epoch
        self.batch_size = batch_size
        self.plot_result = plot_result

    def _plot(self, label, dates, actual, pred_dates, pred, forecast):
        """Plot the actual test prices, the predictions and the forecast."""
        # Forecast dates: calendar days starting at the last observed date
        date_rng = pd.date_range(start=dates[-1], periods=len(forecast), freq='D')
        plt.plot(date_rng, forecast, label='%s Forecast' % label)
        plt.plot(pred_dates, pred, label='%s Prediction' % label)
        plt.plot(dates, actual, label='%s Actual' % label)
        plt.legend()
        plt.show()

    def _arma(self, df, nstep, p, q, plot_result):
        """Autoregressive Moving Average"""
        df_truth = df[self.split:]
        best = None
        for order_p, order_q in itertools.product(p, q):
            # ARMA(p, q) equals ARIMA(p, 0, q)
            results = sm.tsa.ARIMA(df, order=(order_p, 0, order_q)).fit()
            pred = np.asarray(results.predict(start=self.split, dynamic=False))
            rmse = math.sqrt(((pred - df_truth.values) ** 2).mean())
            if best is None or rmse < best['rmse']:
                best = {'pred': pred, 'rmse': rmse,
                        'forecast': np.asarray(results.forecast(steps=nstep))}
        if plot_result:
            self._plot('%s ARMA' % df.name, df_truth.index, df_truth.values,
                       df_truth.index, best['pred'], best['forecast'])
        print("ARMA RMSE: %.4f" % best['rmse'])

    def _sarima(self, df, nstep, p, d, q, plot_result):
        """Seasonal Autoregressive Integrated Moving Average"""
        # All combinations of (p, d, q), reused as seasonal (P, D, Q, 12) orders
        pdq = list(itertools.product(p, d, q))
        seasonal_pdq = [(x[0], x[1], x[2], 12) for x in pdq]
        df_truth = df[self.split:]
        best = None
        for param in pdq:
            for param_seasonal in seasonal_pdq:
                mod = sm.tsa.statespace.SARIMAX(df, order=param, seasonal_order=param_seasonal,
                                                enforce_stationarity=False,
                                                enforce_invertibility=False)
                results = mod.fit(disp=False)
                pred = np.asarray(results.get_prediction(start=self.split, dynamic=False).predicted_mean)
                rmse = math.sqrt(((pred - df_truth.values) ** 2).mean())
                if best is None or rmse < best['rmse']:
                    best = {'pred': pred, 'rmse': rmse,
                            'forecast': np.asarray(results.get_forecast(steps=nstep).predicted_mean)}
        if plot_result:
            self._plot('%s SARIMA' % df.name, df_truth.index, df_truth.values,
                       df_truth.index, best['pred'], best['forecast'])
        print("SARIMA RMSE: %.4f" % best['rmse'])

    def my_LSTM(self, yt, nstep, hidden_neurons, dropout_parameter, epoch, batch_size, plot_result):
        """Long Short Term Memory"""
        look_back = 30  # number of past observations in each input window
        name, dates = yt.name, yt.index  # keep the label and dates before converting to an array
        yt = np.reshape(np.array(yt.astype('float32')), (-1, 1))
        train, test = yt[:self.split], yt[self.split:]

        def prior_steps(data, look_back=30):
            # Build (window of look_back past values, next value) pairs
            X, Y = [], []
            for i in range(len(data) - look_back - 1):
                X.append(data[i:(i + look_back), 0])
                Y.append(data[i + look_back, 0])
            return np.array(X), np.array(Y)

        X_train, Y_train = prior_steps(train, look_back)
        X_test, Y_test = prior_steps(test, look_back)
        # Keras expects (samples, timesteps, features): one timestep with look_back features
        X_train = np.reshape(X_train, (X_train.shape[0], 1, look_back))
        X_test = np.reshape(X_test, (X_test.shape[0], 1, look_back))

        model = Sequential()
        model.add(LSTM(hidden_neurons, input_shape=(1, look_back), activation='relu', return_sequences=True))
        model.add(Dropout(dropout_parameter))
        model.add(Flatten())
        model.add(Dense(1))
        model.compile(loss='mean_squared_error', optimizer='adam')
        model.fit(X_train, Y_train, epochs=epoch, batch_size=batch_size,
                  validation_data=(X_test, Y_test), verbose=0, shuffle=False)
        test_pred = model.predict(X_test, verbose=0)

        # Recursive forecast: feed each prediction back into the input window
        x_input = test[-look_back:, 0]
        forecast = []
        for _ in range(nstep):
            yhat = model.predict(x_input.reshape((1, 1, look_back)), verbose=0)
            x_input = np.append(x_input, yhat)[1:]
            forecast.append(float(yhat[0, 0]))

        if plot_result:
            # Each prediction targets the observation right after its window
            self._plot('%s LSTM' % name, dates[self.split:], test,
                       dates[self.split + look_back:-1], test_pred, forecast)
        # Compare the predictions with their own targets (Y_test)
        print("LSTM RMSE: %.4f" % math.sqrt(mean_squared_error(Y_test, test_pred)))

    def arma(self):
        return self._arma(self.df, self.nstep, self.p, self.q, self.plot_result)

    def sarima(self):
        return self._sarima(self.df, self.nstep, self.p, self.d, self.q, self.plot_result)

    def LSTM(self):
        return self.my_LSTM(self.df, self.nstep, self.hidden_neurons,
                            self.dropout_parameters,
                            self.epoch,
                            self.batch_size,
                            self.plot_result)

    def testAllModels(self):
        self.arma()
        self.sarima()
        self.LSTM()

Calling testAllModels() on an object created for each of the ten stocks runs all three models within the class environment and produces 10x3 stock price prediction and forecasting plots.
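
The step from running every model to picking the best one by a performance metric can be sketched as follows. To keep the example self-contained and fast, two toy forecasters stand in for ARMA, SARIMA, and LSTM; the classes `LastValue`, `MovingAverage`, and `ModelSelector` are hypothetical and not part of the class above.

```python
import math
import numpy as np

class LastValue:
    """Hypothetical toy model: predict each day with the previous day's value."""
    name = "last_value"

    def predict(self, series, split):
        return series[split - 1:-1]

class MovingAverage:
    """Hypothetical toy model: predict each day with the mean of the previous `window` values."""
    name = "moving_average"

    def __init__(self, window=3):
        self.window = window

    def predict(self, series, split):
        return np.array([series[t - self.window:t].mean() for t in range(split, len(series))])

class ModelSelector:
    """Score every model on the test portion and report the one with the lowest RMSE."""

    def __init__(self, series, split, models):
        self.series = np.asarray(series, dtype=float)
        self.split = split
        self.models = models

    def rmse(self, model):
        actual = self.series[self.split:]
        predicted = model.predict(self.series, self.split)
        return math.sqrt(((predicted - actual) ** 2).mean())

    def best(self):
        scores = {model.name: self.rmse(model) for model in self.models}
        return min(scores, key=scores.get), scores

series = np.arange(1.0, 11.0)  # 1, 2, ..., 10
selector = ModelSelector(series, split=7, models=[LastValue(), MovingAverage(window=3)])
best_name, scores = selector.best()
print(best_name, scores)
```

Any object with a `name` and a `predict(series, split)` method can be added to the list, which is the main convenience an object oriented design offers here.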

Wrap-Up

In this post, I introduce ARMA, SARIMA, and LSTM along with corresponding Python applications. The post addresses both experts in the field and a lay audience: if you do not want to know how the time series models work, you can simply focus on the results. So, even a non-expert can run the code above and replicate the results. As for data scientists, it shows how time series models are applied and which model performs best.

As a final word, those who have little or no prior programming experience can simply change the following set of parameters, along with the dataset (labeled df here), and reproduce the results:

p,
d,
q,
split,
nstep,
hidden_neurons,
dropout_parameters,
epoch,
batch_size,
plot_result

Abdullah Karasan

Written by Magnimind
