Best practice 16 - save, load, and reuse models

When the machine learning system is deployed, new data should go through the same data preprocessing procedures (scaling, feature engineering, feature selection, dimensionality reduction, and so on) as in the previous stages. The preprocessed data is then fed into the trained model. We simply cannot rerun the entire process and retrain the model every time new data comes in. Instead, we should save the fitted preprocessing models and the trained prediction models once their corresponding stages are complete. In deployment mode, these models are loaded in advance and used to produce predictions for the new data.

We illustrate this with the diabetes example, where we standardize the data and employ an SVR model:

>>> from sklearn import datasets
>>> dataset = datasets.load_diabetes()
>>> X, y = dataset.data, dataset.target 
>>> num_new = 30 # the last 30 samples as new data set 
>>> X_train = X[:-num_new, :] 
>>> y_train = y[:-num_new] 
>>> X_new = X[-num_new:, :] 
>>> y_new = y[-num_new:]

Preprocess the training data with scaling:

>>> from sklearn.preprocessing import StandardScaler 
>>> scaler = StandardScaler()
>>> scaler.fit(X_train)

Now save the fitted standardizer, the scaler object, with pickle:

>>> import pickle
>>> pickle.dump(scaler, open("scaler.p", "wb" ))

This generates the scaler.p file. Move on to training an SVR model on the scaled data:

>>> X_scaled_train = scaler.transform(X_train) 
>>> from sklearn.svm import SVR 
>>> regressor = SVR(C=20) 
>>> regressor.fit(X_scaled_train, y_train)

Save the trained model, the regressor object, with pickle:

>>> pickle.dump(regressor, open("regressor.p", "wb"))

This generates the regressor.p file. In the deployment stage, we first load in the saved standardizer and regressor from the two preceding files:

>>> my_scaler = pickle.load(open("scaler.p", "rb" )) 
>>> my_regressor = pickle.load(open("regressor.p", "rb"))

Then preprocess the new data using the standardizer and make a prediction with the regressor just loaded:

>>> X_scaled_new = my_scaler.transform(X_new) 
>>> predictions = my_regressor.predict(X_scaled_new)
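
As a variation on the steps above, the scaler and the regressor can be chained into a single scikit-learn pipeline so that only one file needs to be saved and loaded. The following is a minimal sketch rather than the approach used in this section: the file name pipeline.p, the temporary directory, and the variable names pipeline, my_pipeline, and same_as_before_saving are assumptions made for illustration.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Same split as in the example: the last 30 samples act as new data
dataset = datasets.load_diabetes()
X, y = dataset.data, dataset.target
num_new = 30
X_train, y_train = X[:-num_new, :], y[:-num_new]
X_new = X[-num_new:, :]

# Chain the scaler and the regressor so they are fitted and saved together
pipeline = make_pipeline(StandardScaler(), SVR(C=20))
pipeline.fit(X_train, y_train)

# Hypothetical file name, written to a temporary directory for this sketch
model_path = os.path.join(tempfile.mkdtemp(), "pipeline.p")
with open(model_path, "wb") as f:
    pickle.dump(pipeline, f)

# Deployment stage: load the single file and predict on raw new data;
# the pipeline applies the stored scaling before calling the regressor
with open(model_path, "rb") as f:
    my_pipeline = pickle.load(f)

predictions = my_pipeline.predict(X_new)

# Sanity check: the reloaded pipeline should reproduce the in-memory one
same_as_before_saving = np.allclose(predictions, pipeline.predict(X_new))
```

Opening the files in with blocks closes them as soon as the dump or load finishes, and keeping both steps in one object removes the risk of pairing a regressor with a scaler that was fitted on different data.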
