Best practice 19 – saving, loading, and reusing models

When a machine learning system is deployed, new data must go through the same data preprocessing procedures (scaling, feature engineering, feature selection, dimensionality reduction, and so on) as the training data did. The preprocessed data is then fed into the trained model. We simply cannot rerun the entire pipeline and retrain the model every time new data comes in. Instead, we should save the established preprocessing models and the trained prediction models once the corresponding stages are complete. In deployment mode, these models are loaded in advance and used to produce predictions on the new data.
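As an aside, scikit-learn's Pipeline can bundle the preprocessing steps and the final estimator into a single object, so only one artifact needs to be pickled and loaded. The following is a minimal sketch, assuming X_train and y_train hold the training split (as created in the example that follows), with pipeline.p as an example file name:

>>> import pickle
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.svm import SVR
>>> # Chain the scaler and the regressor into one estimator
>>> pipeline = Pipeline([('scaler', StandardScaler()),
...                      ('svr', SVR(C=20))])
>>> pipeline.fit(X_train, y_train)
>>> # One pickle file now captures both preprocessing and model
>>> pickle.dump(pipeline, open("pipeline.p", "wb"))

Loading this single file at deployment time restores scaling and prediction together, which removes the risk of pairing a model with the wrong preprocessor. That said, we walk through the two-object approach below to make each stage explicit.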

We illustrate this with the diabetes example, where we standardize the data and employ an SVR model, as follows:

>>> from sklearn import datasets
>>> dataset = datasets.load_diabetes()
>>> X, y = dataset.data, dataset.target
>>> num_new = 30 # use the last 30 samples as the new data set
>>> X_train = X[:-num_new, :]
>>> y_train = y[:-num_new]
>>> X_new = X[-num_new:, :]
>>> y_new = y[-num_new:]

Preprocess the training data with scaling, as shown in the following commands:

>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler()
>>> scaler.fit(X_train)

Now save the established standardizer, the scaler object, with pickle, as follows:

>>> import pickle
>>> pickle.dump(scaler, open("scaler.p", "wb"))

This generates the scaler.p file.
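As a side note, the joblib library (which scikit-learn itself relies on) is often preferred over pickle for objects that hold large NumPy arrays. A minimal sketch, with scaler.joblib as an example file name:

>>> import joblib
>>> # Persist and reload the fitted scaler with joblib instead of pickle
>>> joblib.dump(scaler, "scaler.joblib")
>>> my_scaler = joblib.load("scaler.joblib")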

Move on to training an SVR model on the scaled data, as follows:

>>> X_scaled_train = scaler.transform(X_train)
>>> from sklearn.svm import SVR
>>> regressor = SVR(C=20)
>>> regressor.fit(X_scaled_train, y_train)

Save the trained regressor object with pickle, as follows:

>>> pickle.dump(regressor, open("regressor.p", "wb"))

This generates the regressor.p file.

In the deployment stage, we first load the saved standardizer and regressor object from the preceding two files, as follows:

>>> my_scaler = pickle.load(open("scaler.p", "rb"))
>>> my_regressor = pickle.load(open("regressor.p", "rb"))

Then, preprocess the new data with the standardizer and make predictions with the regressor object we just loaded, as follows:

>>> X_scaled_new = my_scaler.transform(X_new)
>>> predictions = my_regressor.predict(X_scaled_new)
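Since we held out the true targets, y_new, we can sanity-check the loaded objects end to end. Here is a quick check using the R² metric (the exact score will depend on your scikit-learn version):

>>> from sklearn.metrics import r2_score
>>> # Compare predictions on the new data against the held-out targets
>>> print('R^2 on new data: {0:.3f}'.format(r2_score(y_new, predictions)))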

As a bonus section, we also demonstrate how to save and restore models in TensorFlow. As an example, we train a simple logistic regression model on the breast cancer dataset, as follows:

>>> import tensorflow as tf
>>> from sklearn import datasets
>>> cancer_data = datasets.load_breast_cancer()
>>> X = cancer_data.data
>>> Y = cancer_data.target
>>> n_features = int(X.shape[1])
>>> learning_rate = 0.005
>>> n_iter = 200
>>> x = tf.placeholder(tf.float32, shape=[None, n_features])
>>> y = tf.placeholder(tf.float32, shape=[None])
>>> W = tf.Variable(tf.zeros([n_features, 1]), name='W')
>>> b = tf.Variable(tf.zeros([1]), name='b')
>>> logits = tf.add(tf.matmul(x, W), b)[:, 0]
>>> cost = tf.reduce_mean(
...     tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
>>> optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
>>> sess = tf.Session()
>>> sess.run(tf.global_variables_initializer())
>>> for i in range(1, n_iter+1):
...     _, c = sess.run([optimizer, cost], feed_dict={x: X, y: Y})
...     if i % 10 == 0:
...         print('Iteration %i, training loss: %f' % (i, c))
Iteration 10, training loss: 0.744104
Iteration 20, training loss: 0.299996
Iteration 30, training loss: 0.278439
...

Iteration 180, training loss: 0.189589
Iteration 190, training loss: 0.186912
Iteration 200, training loss: 0.184381

Hopefully, these all look familiar to you. If not, feel free to review our TensorFlow implementation of logistic regression in Chapter 7, Predicting Online Ads Click-through with Logistic Regression. Now here comes the model saving part. Let's see how it is done by performing the following steps:

  1. First we create a saver object in TensorFlow, as follows:
>>> saver = tf.train.Saver()
  2. Save the model (or, more specifically, the weight and bias variables) in a local file, as follows:
>>> file_path = './model_tf'
>>> saved_path = saver.save(sess, file_path)
>>> print('model saved in path: {}'.format(saved_path))
model saved in path: ./model_tf
  3. Then, we can restore the saved model. Before that, let's delete the current graph so that it is clearer that we are actually loading the model from a file, as follows:
>>> tf.reset_default_graph()
  4. Now, we import the saved graph, as follows:
>>> imported_graph = tf.train.import_meta_graph(file_path+'.meta')
  5. Finally, run a session and restore the model, as follows:
>>> with tf.Session() as sess:
...     imported_graph.restore(sess, file_path)
...     W_loaded, b_loaded = sess.run(['W:0', 'b:0'])
...     print('Saved W = ', W_loaded)
...     print('Saved b = ', b_loaded)
Saved W = [[ 7.76923299e-02]
[ 1.78780090e-02]
[ 6.56032786e-02]
[ 1.02017745e-02]

...
[-2.42149338e-01]
[ 1.18054114e-02]
[-1.14070164e-04]]
Saved b = [0.13216525]

We print out the weights and the bias of the trained and saved model.
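To confirm that the restored parameters are usable even without rebuilding the TensorFlow graph, we can compute predictions directly with NumPy. The following is a minimal sketch that applies the logistic function to the restored W_loaded and b_loaded:

>>> import numpy as np
>>> # Logits from the restored weights, then the sigmoid for probabilities
>>> logits_loaded = X.dot(W_loaded)[:, 0] + b_loaded
>>> probs = 1 / (1 + np.exp(-logits_loaded))
>>> preds = (probs >= 0.5).astype(int)
>>> print('Training accuracy with restored model: {0:.3f}'.format(
...     (preds == Y).mean()))

This should reproduce the accuracy of the model we trained in the session above, since the restored weights and bias are exactly the ones we saved.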
