Stochastic gradient descent with sklearn

The sklearn library includes an SGDRegressor model in its linear_model module. Because the gradient is sensitive to the scale of the inputs, we first standardize the data. StandardScaler() computes the mean and standard deviation of each input variable during the fit step, then subtracts the mean and divides by the standard deviation during the transform step; both steps can be combined in a single fit_transform() call:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_ = scaler.fit_transform(X)
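To see what fit_transform() does, a quick check on toy data (the array values here are hypothetical) confirms that each column ends up with zero mean and unit standard deviation:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

scaler = StandardScaler()
X_ = scaler.fit_transform(X)  # subtract column means, divide by column stds

print(X_.mean(axis=0))  # each column now has mean ~0
print(X_.std(axis=0))   # ... and standard deviation ~1
```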

Then we instantiate the SGDRegressor with its default values, spelled out explicitly below, changing only the random_state setting to facilitate replication:

from sklearn.linear_model import SGDRegressor

sgd = SGDRegressor(loss='squared_error',      # 'squared_loss' in sklearn < 1.2
                   fit_intercept=True,
                   shuffle=True,              # shuffle training data for better gradient estimates
                   random_state=42,
                   learning_rate='invscaling',  # reduce the learning rate over time
                   eta0=0.01,                 # initial learning rate
                   power_t=0.25)              # exponent of the inverse-scaling schedule
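With learning_rate='invscaling', the learning rate at update step t follows the schedule eta = eta0 / t**power_t. A short sketch shows how it decays under the parameters chosen above:

```python
# Inverse-scaling learning-rate schedule: eta = eta0 / t**power_t
eta0, power_t = 0.01, 0.25

etas = {t: eta0 / t ** power_t for t in (1, 10, 100, 1000)}
for t, eta in etas.items():
    print(f't={t:>5}  eta={eta:.5f}')
# The rate starts at eta0 and decays slowly (as the fourth root of t),
# so early updates take large steps and later updates refine the solution.
```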

Now we can fit the sgd model, create the in-sample predictions for both the OLS and the sgd models, and compute the root mean squared error for each:

sgd.fit(X=X_, y=y)
resids = pd.DataFrame({'sgd': y - sgd.predict(X_),
                       'ols': y - model.predict(sm.add_constant(X))})
resids.pow(2).sum().div(len(y)).pow(.5)

ols 50.06
sgd 50.06
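Note that because sgd was fit on standardized inputs, its coefficients live on the standardized scale; to compare them with coefficients estimated on the original data, they can be mapped back by dividing by the feature scales and adjusting the intercept. A minimal, self-contained sketch on synthetic data (the variable names and true coefficients here are hypothetical, not from the text):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic data: true model y = 3 + 2*x1 - 0.5*x2, features on different scales
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 2)) * [1, 50]
y = 3 + X @ np.array([2.0, -0.5]) + rng.normal(scale=0.1, size=1000)

scaler = StandardScaler()
X_ = scaler.fit_transform(X)
sgd = SGDRegressor(random_state=42).fit(X_, y)

# Map coefficients from the standardized scale back to original units:
# y = b0 + sum(c_j * (x_j - m_j) / s_j)  =>  slope_j = c_j / s_j
coef_orig = sgd.coef_ / scaler.scale_
intercept_orig = sgd.intercept_[0] - (sgd.coef_ * scaler.mean_ / scaler.scale_).sum()
print(coef_orig, intercept_orig)  # close to [2.0, -0.5] and 3.0
```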

As expected, both models yield the same result. We will now take on a more ambitious project using linear regression to estimate a multi-factor asset pricing model.
