Building the regression model

Now that we have a baseline to compare against, let's build our first regression model. We'll start with a very basic model that uses only the stock's prior closing values to predict the next day's close, and we'll build it using support vector regression. With that, let's set up our model:

  1. The first step is to set up a DataFrame that contains a price history for each day. We're going to include the past 20 closes in our model:
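# Add one column per lag: 'Close Minus i' holds the close from i days earlier
for i in range(1, 21):
    sp.loc[:, 'Close Minus ' + str(i)] = sp['Close'].shift(i)
# Keep 'Close' and its lags, and drop the first 20 rows, which lack a full history
sp20 = sp[[x for x in sp.columns if 'Close Minus' in x or x == 'Close']].iloc[20:]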
  1. This code gives us each day's closing price, along with the previous 20 closes, all in the same row. The result of our code is seen in the following output:

  1. This will form the basis of the X array we will feed our model. But before we're ready for that, there are a few additional steps.
  2. First, we'll reverse our columns so that time runs from left to right:
sp20 = sp20.iloc[:,::-1] 

This generates the following output:

  1. Now, let's import our support vector machine and set up our training and test matrices and vectors:
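from sklearn.svm import SVR
clf = SVR(kernel='linear')
# Hold out the last 2,000 rows for testing; train on everything before them
X_train = sp20[:-2000]
y_train = sp20['Close'].shift(-1)[:-2000]  # target is the next day's close
X_test = sp20[-2000:]
# The final target is NaN because the last row has no next-day close
y_test = sp20['Close'].shift(-1)[-2000:]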
  1. We had just 5,000 data points to work with, so I chose to use the last 2,000 for testing. Let's now fit our model and use it to check out-of-sample data:
model = clf.fit(X_train, y_train)
preds = model.predict(X_test) 
  1. Now that we have our predictions, let's compare them to our actual data:
tf = pd.DataFrame(list(zip(y_test, preds)), columns=['Next Day Close', 'Predicted Next Close'], index=y_test.index) 

The preceding code generates the following output:

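To see how the steps in this section fit together, here is a minimal end-to-end sketch. It is not the exact code from this chapter: it assumes a synthetic random-walk series in place of the real `sp` data, much smaller sizes (`n_days`, `n_lags`, and `n_test` are illustrative names and values), and it builds the lag columns with `pd.concat` rather than a loop. It also drops the final row before comparing, since that row has no next-day close to compare against.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVR

# Assumption: a synthetic random walk stands in for the real closing prices.
rng = np.random.default_rng(0)
n_days, n_lags, n_test = 300, 20, 100  # illustrative sizes, not the chapter's
close = 100 + np.cumsum(rng.normal(0, 1, n_days))
sp = pd.DataFrame({'Close': close},
                  index=pd.bdate_range('2020-01-01', periods=n_days))

# One column per lag, then drop the first n_lags rows, which lack a full history.
lags = {f'Close Minus {i}': sp['Close'].shift(i) for i in range(1, n_lags + 1)}
sp20 = pd.concat([sp[['Close']], pd.DataFrame(lags)], axis=1).iloc[n_lags:]

# Reverse the columns so time runs left to right (oldest lag first, 'Close' last).
sp20 = sp20.iloc[:, ::-1]

# Target: the next day's close. The last row has no next day, so its target is NaN.
target = sp20['Close'].shift(-1)

# Chronological split: train on the early rows, test on the last n_test rows.
X_train, y_train = sp20.iloc[:-n_test], target.iloc[:-n_test]
X_test, y_test = sp20.iloc[-n_test:], target.iloc[-n_test:]

model = SVR(kernel='linear').fit(X_train, y_train)
preds = model.predict(X_test)

# Line up actual and predicted values, then drop the row with the unknown target.
tf = pd.DataFrame({'Next Day Close': y_test, 'Predicted Next Close': preds},
                  index=y_test.index).dropna()
print(tf.head())
```

Splitting by position rather than at random keeps the test period strictly after the training period, which matters for time series because a random split would let the model train on days that come after the ones it is tested on.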