© Pramod Singh, Avinash Manure 2020
P. Singh, A. Manure, Learn TensorFlow 2.0, https://doi.org/10.1007/978-1-4842-5558-2_6

6. TensorFlow Models in Production

Pramod Singh1  and Avinash Manure2
(1)
Bangalore, Karnataka, India
(2)
Bangalore, India
 

In this final chapter of the book, you will apply what you’ve learned in previous chapters and deploy models built in TensorFlow 2.0 in a production environment. We believe that there are two principal aspects to using machine learning. The first is building a machine learning model without integrating it into any application (a standalone model). The other, more impactful aspect involves taking the trained machine learning model and embedding it within an application. The second approach is where things can become complicated, compared to the first, as we have to expose the trained model’s endpoints, in order for applications to consume its predictions. This chapter introduces some of the techniques by which we can deploy a machine learning model. We are not going to build a full-blown TensorFlow-based application. Rather, we will go over different frameworks to save a model, reload it for prediction, and deploy it. In the first part of the chapter, we review the internals of model deployment and its challenges. The second part demonstrates how to deploy a Python-based machine learning model, using Flask (a lightweight web framework). In the chapter’s final section, we discuss the process of building and deploying a TensorFlow 2.0–based model.

Model Deployment

The sad reality: the most common way Machine Learning gets deployed today is PowerPoint slides.

“Deploying Machine Learning at Scale,” Algorithmia, https://info.algorithmia.com/deploying-machine-learning-at-scale-1, May 29, 2018.

According to a survey, less than 5% of commercial data science projects make it to production. For readers who have never undertaken any sort of software or machine learning deployment before, let us explain a few fundamental features of model deployment. Deployment is primarily about the scalability of an application, so that it can serve a larger number of requests. For example, almost anyone can cook at home for themselves or family members. On the other hand, it takes a different set of requirements, skills, and resources to successfully cook for a restaurant or online food service. The former can be done easily enough, whereas the latter requires a lot of planning, implementation, and testing before operating smoothly. Model deployment is similar. In scenarios in which a machine learning model has to be deployed within an application system, integration and maintenance become critical components. The successful deployment of a model takes a lot of planning and testing before an application platform matures to a level of self-sustaining prediction.

There is little doubt or argument regarding the fact that the true value of machine learning can only be unlocked or gained when it’s deployed in an application or system. Without deployment, machine learning offers limited success and impact in today’s business world. Deployment provides an exciting dimension to machine learning capability. Assuming that we have a fair understanding of a machine learning model, we can safely move on to its deployment aspect. To set the right expectations, let us make a bold statement at the outset. Machine learning is relatively easy compared to its deployment. The reason is that deployment brings a set of other parameters that must be taken into account, in order to build an end-to-end machine learning–based application, which is not always easy to carry out. Therefore, let’s go over some of the challenges one might face while deploying a model into an application or system.

Isolation

Machine learning models can be built in isolation. In fact, all that we require to build a machine learning model is reasonably sized training data. However, a deployed machine learning model doesn’t work in isolation. Figure 6-1 (adapted from Sculley et al., “Hidden Technical Debt in Machine Learning Systems,” 2015) depicts the challenges that come with machine learning model deployment. In reality, the machine learning code is a very small component of the overall setup. It is the rest of the elements that demand consistent engagement and communication with the machine learning model.
../images/489297_1_En_6_Chapter/489297_1_En_6_Fig1_HTML.jpg
Figure 6-1

Application management

Collaboration

Most of us are aware that it’s teams who build products or execute projects. Therefore, it takes a lot of collaboration and engagement to build or deploy a successful product. It’s no different in the machine learning world, where application developers might have to coordinate with data scientists, to deploy a model in a system. Issues arise, for example, when the model is built in one language, and DevOps or applications folks are using some other language.

Model Updates

Things around us hardly remain the same; user behavior, in particular, changes so rapidly that technology struggles to keep pace with it. Similarly, machine learning models must be regularly updated, in order to remain relevant and effective. This is easy to ensure with a standalone model, but updating a model live in production requires many additional steps.

Model Performance

The whole idea of using machine learning in applications is to generalize well and help customers make suitable choices. This all depends on the underlying model’s performance. Therefore, tracking and monitoring models in production become a critical part of the overall application.

Load Balancer

The final challenge in model deployment is the ability to handle requests at scale. Every application or platform should be designed in such a way that it can work seamlessly in high-traffic situations.

Now that we have reviewed the challenges faced in model deployment, we can go over some of the basic-to-intermediate steps to deploy Python-based models. Again, the focus of this chapter is to expose some of the available tools and techniques to deploy machine learning models, instead of building a full application.

Python-Based Model Deployment

There are multiple ways in which a machine learning model can be deployed in production. It all depends on the requirements and the load the model is expected to serve. In this section, we will go over a couple of approaches, to see how we can create, save, and restore a Python-based machine learning model for making predictions. We then move on to deploying TensorFlow-based models in production, in the last section.

Saving and Restoring a Machine Learning Model

At the end of the day, a machine learning model is simply a set of learned parameters, one for each input feature used while training, that together describe the relationship between the given inputs and the output in the best possible way. The ability to save any machine learning model (irrespective of whether it was built in Python, R, or TensorFlow) allows us to use it later, at any point in time, for making predictions on new data, as well as to share it with other users. Saving a model is also known as serialization. This can be done in different ways: Python has its own way of persisting objects, known as pickle. Pickle can be used to serialize machine learning models, as well as any other transformer. The other approach is joblib, which integrates well with sklearn and allows saving and restoring of Python-based machine learning models. In this section, we will focus on using joblib to save and persist sklearn models. Once the model is saved on disk or at any other location, we can reload or restore it, for making predictions on new data.
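As a quick illustration of the pickle route, here is a minimal sketch that serializes and restores a trained estimator (lr stands for the fitted model we build below; the file name my_model.pkl is our own choice):
[In]: import pickle
[In]: # serialize the trained estimator to disk
[In]: with open('my_model.pkl','wb') as f:
          pickle.dump(lr, f)
[In]: # restore it later for prediction
[In]: with open('my_model.pkl','rb') as f:
          restored_model = pickle.load(f)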

In the example below, we consider the standard data set for building a linear regression model. The input data has five input columns and one output column. All the variables are numeric in nature, so little feature engineering is required. Nevertheless, the idea here is not to focus on building a perfect model but to build a baseline model, save it, and then restore it. In the first step, we load the data and create input and output feature variables (X,y).
[In]: import pandas as pd
[In]: import numpy as np
[In]: from sklearn.linear_model import LinearRegression
[In]: df=pd.read_csv('Linear_regression_dataset.csv',header='infer')
[In]: df
[Out]:
../images/489297_1_En_6_Chapter/489297_1_En_6_Figa_HTML.jpg
[In]: X=df.loc[:,df.columns !='output']
[In]: y=df['output']
The next step is to split the data into train and test sets. Then we build the linear regression model on the training data and access the coefficient values for all the input variables.
[In]: from sklearn.model_selection import train_test_split
[In]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
[In]: lr = LinearRegression().fit(X_train, y_train)
[In]: lr.coef_
[Out]: array([[ 3.40323422e-04,  5.78491342e-05,  2.24450972e-04,
       -6.65195539e-01,  5.01534474e-01]])
The performance of this baseline model seems reasonable, with an R-squared value of 87% on the training set and 85% on the test set.
[In]: lr.score(X_train,y_train)
[Out]: 0.8735114024937244
[In]: lr.score(X_test,y_test)
[Out]: 0.8551517840207584
Now that we have the trained model available, we can save it at any location or disk, using joblib or pickle. We name the exported model linear_regression_model.pkl.
[In]: import joblib
[In]: joblib.dump(lr,'linear_regression_model.pkl')
Now, we create a random input feature set and predict the output, using the trained model that we just saved.
[In]: test_data=[600,588,90,0.358,0.333]
[In]: pred_arr=np.array(test_data)
[In]: print(pred_arr)
[Out]: [6.00e+02 5.88e+02 9.00e+01 3.58e-01 3.33e-01]
[In]: preds=pred_arr.reshape(1,-1)
[In]: print(preds)
[Out]: [[6.00e+02 5.88e+02 9.00e+01 3.58e-01 3.33e-01]]
In order to predict the output with the same model, we first must load the saved model, using joblib.load, which accepts the file path directly. Once the model is loaded, we can simply use the predict function, to make the prediction on a new data point.
[In]: lr_model=joblib.load('linear_regression_model.pkl')
[In]: model_prediction=lr_model.predict(preds)
[In]: print(model_prediction)
[Out]: [0.36901871]
This was clearly done from a local disk space and not any cloud location, but to an extent, this approach would still work in production, as the pickled file of the model can be saved at a location in the production environment. For a couple of reasons, this is not the ideal way to deploy your model in production.
  1. Limited access: Only users who have access to the production environment can use the machine learning model, as it is restricted to that particular environment.

  2. Scalability: Having just a single instance of model prediction can result in serious challenges, once the load or demand for the output increases.

Deploying a Machine Learning Model As a REST Service

To overcome the limitations mentioned previously, we can deploy the model as a REST (representational state transfer) service, in order to expose it to external users. This allows them to use the model output or prediction without having to access the underlying model. In this section, we will make use of Flask to deploy the model as a REST service. Flask is a lightweight web framework, built in Python, to deploy applications on a server. This book does not cover Flask in detail, but for those readers who have never used it, the following code snippet offers a brief introduction.

We create a simple .py file and write the following lines of code, in order to run a simple Flask-based app. We first install Flask, import it, and create the Flask app. Then we decorate our main function, which simply returns Hello World!, with app.route, which gives the path for accessing the app (a simple /, in this case). The last step is to run the app, by calling the main file.
[In]: pip install Flask
[In]: from flask import Flask
[In]: app = Flask(__name__)
[In]: @app.route("/")
[In]: def hello():
      return "Hello World!"
[In]: if __name__ == '__main__':
        app.run(debug=True)

We can now go to localhost:5000 and witness the Flask server running and showing “Hello World!”
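We can also hit the endpoint programmatically. Here is a quick check with the requests library (assuming it is installed and the Flask development server is running on its default port, 5000):
[In]: import requests
[In]: requests.get('http://localhost:5000/').text
[Out]: 'Hello World!'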

Next, we are going to use the model that we built earlier and deploy it, using the Flask server. In order to do this, we must create a new folder (web_app) and save the linear_regression_model.pkl file there. We are going to use the same model that we built in the preceding section. We can either move the file manually to the web_app folder or re-save the model to the new location, using the earlier script, as shown following:
[In]: joblib.dump(lr,'web_app/linear_regression_model.pkl')
Let’s begin to create the main app.py file, which will spin up the Flask server to run the app.
[In]: import numpy as np
[In]: import joblib
[In]: from flask import Flask,render_template,request
[In]: app=Flask(__name__)
[In]: # load the trained model once, at startup, instead of on every request
[In]: lr_model=joblib.load('linear_regression_model.pkl')
[In]: @app.route('/')
[In]: def home():
             return render_template('home.html')
[In]: @app.route('/predict',methods=['GET','POST'])
[In]: def predict():
      if request.method =='POST':
             try:
                   var_1=float(request.form['var_1'])
                   var_2=float(request.form['var_2'])
                   var_3=float(request.form['var_3'])
                   var_4=float(request.form['var_4'])
                   var_5=float(request.form['var_5'])
                   pred_args=[var_1,var_2,var_3,var_4,var_5]
                   preds=np.array(pred_args).reshape(1,-1)
                   model_prediction=round(float(lr_model.predict(preds)),2)
             except ValueError:
                   return "Please enter valid values"
             return render_template('predict.html',prediction=model_prediction)
      # on a GET request, simply show the input form again
      return render_template('home.html')
[In]: if __name__=='__main__':
             app.run(host='0.0.0.0')
Let’s go over the steps, in order to understand the details of the app.py file. First, we import the required libraries and load the trained model (linear_regression_model.pkl) once, at startup, so that it isn’t reloaded on every request. (Note that sklearn must still be installed in the environment for joblib to unpickle the model.) Next, we create our first function, which is the home page that renders an HTML template allowing users to fill in input values. The next function publishes the model’s predictions on the input values provided by the user. We save the five input values coming from the user into separate variables, create a list (pred_args), convert it into a numpy array, and reshape it into the form the model expects. The next step is to make the prediction and save the final output into a variable (model_prediction). We then publish this result via another HTML template (predict.html). If we run the main file (app.py) now in the terminal, we will see the page shown in Figure 6-2, asking the user to fill in the values. The output is shown in Figure 6-3.
../images/489297_1_En_6_Chapter/489297_1_En_6_Fig2_HTML.jpg
Figure 6-2

Inputs to the model

../images/489297_1_En_6_Chapter/489297_1_En_6_Fig3_HTML.jpg
Figure 6-3

Prediction output
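Besides the browser form, the /predict endpoint can also be exercised programmatically. Here is a minimal sketch with the requests library, using the same sample values as before (the response body is the rendered predict.html page):
[In]: import requests
[In]: payload = {'var_1':'600','var_2':'588','var_3':'90','var_4':'0.358','var_5':'0.333'}
[In]: response = requests.post('http://localhost:5000/predict', data=payload)
[In]: print(response.status_code)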

Templates

There are two web pages that we have to design, in order to post requests to the server and receive in return the response message, which is the prediction made by the machine learning model for that particular request. Because this book doesn’t focus on HTML, you can simply use these files as they are, without making any changes to them. For curious readers: we are creating a form that requests five values for five different variables, using a standard CSS template with very basic fields (Figure 6-4). Users with prior knowledge of HTML can feel free to redesign the home page per their requirements (Figure 6-5). A minimal sketch of such a template follows the figures below.
../images/489297_1_En_6_Chapter/489297_1_En_6_Fig4_HTML.jpg
Figure 6-4

User input’s HTML

../images/489297_1_En_6_Chapter/489297_1_En_6_Fig5_HTML.jpg
Figure 6-5

Input web page
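As a reference point only, here is a minimal, hypothetical version of templates/home.html, exposing the five input fields that app.py expects (the book’s actual template, with its CSS, is shown in Figure 6-4):
<html>
  <body>
    <h3>Linear Regression Model Prediction</h3>
    <!-- posts the five inputs to the /predict route defined in app.py -->
    <form action="/predict" method="POST">
      <input type="text" name="var_1" placeholder="var_1">
      <input type="text" name="var_2" placeholder="var_2">
      <input type="text" name="var_3" placeholder="var_3">
      <input type="text" name="var_4" placeholder="var_4">
      <input type="text" name="var_5" placeholder="var_5">
      <input type="submit" value="Predict">
    </form>
  </body>
</html>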

The next template publishes the model prediction back to the user (Figure 6-6). It is less complicated, compared to the first template, as there is just one value that we have to post back to the user (Figure 6-7); again, a minimal sketch follows the figures.
../images/489297_1_En_6_Chapter/489297_1_En_6_Fig6_HTML.jpg
Figure 6-6

Model’s output HTML

../images/489297_1_En_6_Chapter/489297_1_En_6_Fig7_HTML.jpg
Figure 6-7

Model’s output
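Similarly, a minimal, hypothetical templates/predict.html simply displays the prediction variable passed in by render_template:
<html>
  <body>
    <h3>Model Prediction</h3>
    <!-- Jinja2 placeholder filled by render_template('predict.html', prediction=...) -->
    <p>The predicted output is: {{ prediction }}</p>
  </body>
</html>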

Now that we have seen how to deploy a model, using a web framework, we can move on to the last section of this chapter, which focuses on deploying a TensorFlow 2.0 model. There are two parts to this section. In the first, we will build a standard deep learning network, using tf.keras, to classify images. Once the neural network is trained, we will save it and load it back, to make predictions on test data. In the second part, we will go over the process of deploying the model in the cloud, using Kubeflow on Google Cloud Platform.

Challenges of Using Flask

Although Flask is fine for deploying models as a service, it hits a roadblock when an application has numerous users. For a small-scale application, Flask can do a good job and manage the load. The alternative to Flask is to use containers, such as Docker. For readers who have never used Docker, it is simply a technique to containerize an application, so that it runs consistently, irrespective of the platform. It resolves application dependency issues and makes deployment much faster and easier, compared to a manual approach. Today, the common process to deploy any application in production is to containerize it, using Docker, and run it as a service on top of Kubernetes or another cloud platform. One of the challenges is handling the number of requests made of the application. Kubernetes can handle increased request volumes via a built-in load balancer, reducing the number of containers when requests are few and running additional instances of the application when load increases. In the next section, we are going to see how we can build a TensorFlow model and reload it for prediction.
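To make the idea concrete, here is a minimal, hypothetical Dockerfile for the Flask app built earlier. It assumes a requirements.txt file (our own addition) listing flask, numpy, scikit-learn, and joblib sits next to app.py, the templates folder, and the pickled model:
FROM python:3.7
WORKDIR /app
# install the app's Python dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt
# copy the app code, templates, and the pickled model
COPY . .
EXPOSE 5000
CMD ["python", "app.py"]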

Building a Keras TensorFlow-Based Model

The data set that we are going to use to build this deep neural network is the standard Fashion-MNIST set we used previously. We start by importing the required libraries and ensuring that we have the latest version of TensorFlow.
[In]: import tensorflow as tf
[In]: tf.__version__
[Out]: '2.0.0-rc0'
[In]: from tensorflow import keras
[In]: import matplotlib.pyplot as plt
[In]: import numpy as np
[In]: from tensorflow.keras.preprocessing import image
The next step is to load the data set and divide it into training and test sets. We have 60,000 images in the training set on which we are going to train the network. Before training the model, we must execute a couple of steps.
  1. Define labels for the target classes, so as to recognize the predicted images better.

  2. Normalize the pixel values of each image (scaling them to between 0 and 1).

We also hold out the last 10,000 training images and labels as a validation set. Note that the validation slices must be taken before truncating X_train and y_train; otherwise, they end up empty.
[In]: df = keras.datasets.fashion_mnist
[In]: (X_train, y_train), (X_test, y_test) = df.load_data()
[In]: X_train.shape
[Out]: (60000, 28, 28)
[In]: y_train.shape
[Out]: (60000,)
[In]: labels=['T-shirt/top','Trouser','Pullover','Dress','Coat','Sandal','Shirt','Sneaker','Bag','Ankle boot']
[In]: X_val=X_train[50000:]
[In]: X_train=X_train[:50000]
[In]: y_val=y_train[50000:]
[In]: y_train=y_train[:50000]
[In]: X_train=X_train/255
[In]: X_val=X_val/255
To see a sample image, we can use the imshow function and pass a particular image, as shown in a couple of examples following:
[In]: plt.imshow(X_train[100])
[Out]:
../images/489297_1_En_6_Chapter/489297_1_En_6_Figb_HTML.jpg
[In]: print(labels[y_train[100]])
[Out]: Bag
[In]: plt.imshow(X_train[1055])
[Out]:
../images/489297_1_En_6_Chapter/489297_1_En_6_Figc_HTML.jpg
[In]: print(labels[y_train[1055]])
[Out]: Sneaker
The next step is to define and build the model. We use a conventional sequential model that first flattens each 28 × 28 image, followed by three dense layers: the first containing 200 units, the second 100, and the last being the prediction layer, with 10 units (one per class).
[In]: keras_model = keras.models.Sequential()
[In]: keras_model.add(keras.layers.Flatten(input_shape=[28, 28]))
[In]: keras_model.add(keras.layers.Dense(200, activation="relu"))
[In]: keras_model.add(keras.layers.Dense(100, activation="relu"))
[In]: keras_model.add(keras.layers.Dense(10, activation="softmax"))
[In]: keras_model.compile(optimizer="sgd",loss=keras.losses.sparse_categorical_crossentropy,metrics=["accuracy"])
We now train the model on the training set, setting the number of epochs to 10 and tracking performance on the held-out validation set.
[In]: history = keras_model.fit(X_train, y_train, epochs=10, validation_data=(X_val, y_val))
[Out]:
../images/489297_1_En_6_Chapter/489297_1_En_6_Figd_HTML.jpg
Once the model is trained, we can test its accuracy on the test data. It appears to be close to 85%. We can definitely improve the model, by making changes in the network or using a CNN (convolutional neural network) that is more suitable for image classification, but the idea of this exercise is to save a model and call it later for predictions.
[In]: X_test=X_test/255
[In]: test_accuracy=keras_model.evaluate(X_test,y_test)
[Out]: 0.8498
Now, we save the model in Keras’s HDF5 format (.h5) and load it back, using load_model, for prediction.
[In]: keras_model.save("keras_model.h5")
[In]: loaded_model = keras.models.load_model("keras_model.h5")
In the following example, we load a test image (100), which is a dress, and then we will use our saved model to make a prediction about this image.
[In]: plt.imshow(X_test[100])
[In]: print(labels[y_test[100]])
[Out]:
../images/489297_1_En_6_Chapter/489297_1_En_6_Fige_HTML.jpg
We create a new variable (new_image) and reshape it into the (1, 28, 28) form the model expects for prediction. The model correctly classifies the image as “Dress.”
[In]: new_image= X_test[100]
[In]: new_image = image.img_to_array(new_image)
[In]: new_image = np.expand_dims(new_image, axis=0)
[In]: new_image = new_image.reshape(1,28,28)
[In]: prediction=labels[loaded_model.predict_classes(new_image)[0]]
[In]: print(prediction)
[Out]: Dress
One more example: We can select another image (500) and make a prediction using the saved model.
[In]: plt.imshow(X_test[500])
[In]: print(labels[y_test[500]])
[Out]:
../images/489297_1_En_6_Chapter/489297_1_En_6_Figf_HTML.jpg
[In]: new_image= X_test[500]
[In]: new_image = image.img_to_array(new_image)
[In]: new_image = np.expand_dims(new_image, axis=0)
[In]: new_image = new_image.reshape(1,28,28)
[In]: prediction=labels[loaded_model.predict_classes(new_image)[0]]
[In]: print(prediction)
[Out]: Pullover
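A note on compatibility: predict_classes is available on Sequential models in TensorFlow 2.0 but was removed in later releases (2.6 and above). An equivalent, version-independent way to obtain the predicted class is to take the argmax over the predicted probabilities:
[In]: pred_probs = loaded_model.predict(new_image)
[In]: print(labels[np.argmax(pred_probs, axis=1)[0]])
[Out]: Pullover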

Deploying the Model Using Kubeflow

Another way of productionizing the machine learning model is to use the Kubeflow platform. Kubeflow is a native tool for managing and deploying machine learning models on Kubernetes. Because Kubernetes is beyond the scope of this book, we will not delve too deeply into its details. However, Kubernetes can be defined as a container orchestration platform that allows for the running, deployment, and management of containerized applications (machine learning models, in our case).

In this section, we will replicate the same model that we built previously and run it in the cloud (via Google Cloud Platform), using Kubeflow. We will also use the Kubeflow UI, to navigate and run Jupyter Notebook in the cloud. Because we are going to use Google Cloud Platform (GCP), we must have a Google account, so that we can avail ourselves of the free credits provided by Google for the use of GCP components. Go to https://console.cloud.google.com/ and create a Google user account, if you do not have one already. You will be required to provide a few additional details, along with credit card information, as shown in Figure 6-8.
../images/489297_1_En_6_Chapter/489297_1_En_6_Fig8_HTML.jpg
Figure 6-8

Google user account

Once we log in to the Google console, there are many options to explore, but first, we must enable the free credits provided by Google, in order to access the cloud services for free (up to $300). Next, we must create a new project or select one of the existing projects, for users already in possession of a Google account, as shown in Figure 6-9.
../images/489297_1_En_6_Chapter/489297_1_En_6_Fig9_HTML.jpg
Figure 6-9

Google project

To use Kubeflow, the final step is to enable Kubernetes Engine APIs. In order to enable Kubernetes Engine APIs, we must go to the APIs & Services dashboard (Figure 6-10) and search for Kubernetes Engine API. Once this shows up in the library, we must enable it, as shown in Figure 6-11.
../images/489297_1_En_6_Chapter/489297_1_En_6_Fig10_HTML.jpg
Figure 6-10

APIs dashboard

../images/489297_1_En_6_Chapter/489297_1_En_6_Fig11_HTML.jpg
Figure 6-11

Enabling Kubernetes APIs

The next step is to deploy the Kubernetes cluster on GCP, using Kubeflow. There are multiple ways of doing this, but we are going to deploy the cluster by using a UI. Go to https://deploy.kubeflow.cloud/#/ and provide the required details, as shown in Figure 6-12.
../images/489297_1_En_6_Chapter/489297_1_En_6_Fig12_HTML.jpg
Figure 6-12

Kubeflow deployment

We must enter the project ID (visible under the Project tab on the GCP console), a deployment name of our choice, and select the option to log in with a username and password, to keep things simple. Next, we enter a username and password of our choice (we will need them again to log in to the Kubeflow UI). We can select the Google Kubernetes Engine zone, depending on what zone is available, and choose Kubeflow version 0.6.2. Clicking Create Deployment kicks off the provisioning; all required resources should be up and running in about 30 minutes. We can also check whether the Kubernetes cluster is up by going back to the Google console dashboard and selecting the Kubernetes Engine and Clusters option. It might take a few minutes before we can see the cluster running. Once the Kubeflow deployment is set up, we can simply click the Kubeflow Service Endpoint button, and a new UI page will open. We must use the same username and password that we provided during the deployment phase, as shown in Figure 6-13.
../images/489297_1_En_6_Chapter/489297_1_En_6_Fig13_HTML.jpg
Figure 6-13

Kubeflow login

Once we log in to the Kubeflow UI, we can see the Kubeflow dashboard, with its multiple options, such as Pipelines, Notebook Servers, etc., as shown in Figure 6-14.
../images/489297_1_En_6_Chapter/489297_1_En_6_Fig14_HTML.jpg
Figure 6-14

Kubeflow dashboard

We must select Notebook Servers, to start a new notebook server. For a new notebook server, we must provide a few details regarding the desired configuration, as shown in Figure 6-15.
../images/489297_1_En_6_Chapter/489297_1_En_6_Fig15_HTML.jpg
Figure 6-15

Kubeflow Notebook Servers

Now we must provide a few configuration details to spin up the server, such as base image (with pre-installed libraries and dependencies), the size of CPU/GPUs, and total memory (5 CPUs and 5GB memory suffices for our model). We can select the image with TensorFlow version 2.0, because we are building the model with that version. We must also add GCP credentials, in case we want to save the model to GCP’s storage bucket and use it for serving purposes. After a while, the notebook server will be up and running, and we can click Connect, to open the Jupyter Notebook running on the Kubeflow server, as shown in Figure 6-16.
../images/489297_1_En_6_Chapter/489297_1_En_6_Fig16_HTML.jpg
Figure 6-16

Opening the Jupyter Notebook server from Notebook Servers

Once Jupyter Notebook is up, we can select the option to create a new Python 3 notebook or simply go to its terminal and clone the required repo from Git, to download all the model files to this notebook. In our case, because we are building the model from scratch, we will create a new Python 3 notebook and replicate the same model built earlier in the chapter. It should work exactly as before, the only difference being that we are now using Kubeflow to build and serve the model. In case any library is not available, we can simply pip3 install the library and use it in this notebook.
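For instance, once the model has been rebuilt inside the Kubeflow notebook, a minimal sketch for exporting it to a GCS bucket in TensorFlow’s SavedModel format (the format serving tools consume) might look as follows. This assumes the notebook has the GCP credentials mentioned previously, and your-bucket-name is a placeholder for your own bucket.
[In]: import tensorflow as tf
[In]: # export in the SavedModel format; the trailing /1 is a version directory
[In]: export_path = 'gs://your-bucket-name/fashion_mnist/1'
[In]: tf.saved_model.save(keras_model, export_path)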

Once the model is built and we have used the services of Kubeflow, we must terminate and delete all the resources, in order to avoid any extra cost. We must go back to the Google console and, under the Kubernetes clusters list, delete the Kubeflow server.

Conclusion

In this chapter, we explored the common challenges faced when taking machine learning models into production and how to overcome them. We also reviewed the process for saving a machine learning model (Python- and TensorFlow-based) and deploying it into production, using different frameworks.
