In this chapter, we’ll tie a lot of things together that we have covered in the book. The end goal of an AI application should always be to bring business value. That means all the great models that are created by data scientists should be put into production. It’s quite difficult to productionize and maintain an application that contains one or more machine learning models. In this chapter, we’ll discuss three major options for productionizing an AI application, so you’ll be well equipped to pick a method for your situation.
By the end of this chapter, you will be able to describe the steps it takes to run a machine learning model in production, list a few common options to run models in production, and design and implement a continuous delivery pipeline for models.
In the previous chapters, you have learned ways to set up a data storage environment for AI. In this chapter, we will explore the final step: taking machine learning models into production, so that they can be used in live business applications. There are several methods for productionizing models, and we will elaborate on a few common ones.
Data scientists are trained to wrangle data, pick a machine learning algorithm, do feature engineering, and optimize the models they create. But even an excellent model has no value if it only runs in a machine learning environment or on the laptop of the data scientist; it has to be deployed in a production application. Furthermore, models have to be regularly updated to reflect the latest feedback from customers. Ideally, a model is continuously and automatically refreshed in a feedback loop; we call that reinforcement learning. An example of a system that uses reinforcement learning is a recommendation engine on a video website. Every time the algorithm makes a recommendation to a customer to view a movie, it tracks whether the recommendation is followed. If that is the case, the connection between the customer profile (the features of the algorithm) and the recommendation becomes stronger (reinforced), making a similar recommendation more likely in the future.
At this moment, it's important to realize that models don't have to run in an API, a data stream, or in an interactive way at all. It's perfectly acceptable to run a model periodically (say, once per day or even once per month) on a dataset, and write the outcomes to a lookup table. In this way, data is pre-scored for use by a live application, and the production system only has to find the right records with a simple query on the lookup table. However, more and more systems rely on model execution on demand, simply because the results have to be updated in real time.
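To make this concrete, the following minimal sketch (with hypothetical file and column names) shows what such a periodic batch-scoring job could look like in Python: the model is loaded once, applied to the full dataset, and the predictions are written to a lookup table that a live application can query:
import pickle
import pandas as pd

# load a previously trained model (hypothetical file name)
with open('model.pkl', 'rb') as file:
    model = pickle.load(file)

# score the full customer dataset in one batch (hypothetical dataset and columns)
customers = pd.read_csv('customers.csv')
predictions = customers[['customer_id']].copy()
predictions['score'] = model.predict(customers.drop(['customer_id'], axis=1))

# write the outcomes to a lookup table; the live application only needs
# a simple query on customer_id to fetch the precomputed score
predictions.to_csv('score_lookup.csv', index=False)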
In this chapter, we will describe a few options for running models in production. First, we’ll look at ways to export a model from a machine learning environment to run it in production. A common approach is to serialize a model into an intermediate format such as pickle and load the resulting files in an API. In this way, your model functions as the core functionality in a microservice.
Next, we'll look at the most popular framework for running containers: Docker. Building on this framework, all public cloud providers offer scalable services that allow customers to run their containers with ease. Kubernetes, OpenShift, and Docker Swarm are examples of platforms that orchestrate containers built from Docker images.
Finally, we’ll describe a method to run models in a streaming data environment. Since performance is an important requirement for streaming data engines, we have to keep the latency low by making our model execution as effective as possible. This can be done by loading the models into an in-memory cache with an intermediate format such as PMML.
Let’s start with a basic form of serving models in production: creating an API. We will use the popular frameworks pickle and Flask for this.
A machine learning model can “live” in (be part of) many different environments. The choice of environment should depend on the type of application that is being developed, the performance requirements, and the expected frequency of updates. For example, a model that has to predict the weather once per day for a weather analyst has different requirements than a model that makes friend suggestions for millions of people on a social network.
For extreme cases, there are specialized techniques such as streaming models; we'll have a look at them later in this chapter. For now, we'll focus on a method that works for most use cases: running a model as part of an API. In doing so, our model can be part of a microservices architecture, which gives a lot of flexibility and scalability. To build such an API, pickle and joblib are two popular libraries for working with Python models. They can capture a dataset or a model that was trained in memory, preparing it for transport to a different environment. As such, they are a good way to share the same model across the development, testing, and production environments. There are also some disadvantages. pickle and joblib are not language-neutral; they can only be used in Python environments. That restricts both data scientists and data engineers, who would otherwise have a wider set of technologies to choose from. If you require a cross-platform setup, you could be better off with (for example) Express.js, Spring Boot, or FastAPI for API development and PMML or PFA for model serialization. However, if you're sure that both the machine learning environment and the production environment are running Python, pickle and joblib are good options for serializing your models. They are well documented, easy to use, performant, and have a large user base and community behind them.
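To give a first impression of how simple this is, the following sketch trains a tiny scikit-learn model in memory and serializes it with joblib (it assumes nothing beyond scikit-learn and joblib being installed):
from sklearn.linear_model import LogisticRegression
from joblib import dump, load

# train a small model in memory on illustrative data
X = [[0, 1], [1, 0], [1, 1], [0, 0]]
y = [1, 0, 1, 0]
model = LogisticRegression().fit(X, y)

# serialize the fitted model to disk
dump(model, 'model.joblib')

# later, for example in the production environment, restore and use it
restored = load('model.joblib')
print(restored.predict([[1, 1]]))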
We will focus in the first exercise of this chapter on the pickle framework. You will learn how to serialize (pickle) a simple model, and how to deserialize (unpickle) it. Serializing is the process of exporting a model that was built in a notebook and lives in the memory of a server to a file format such as JSON, XML, or a binary format. A serialized model can be treated like any other asset in an application; this is needed to transport the model, create versions of it, and deploy it. Then, you will use the model in a Flask API to execute it and get predictions from an input dataset. Let's implement this in the next exercise.
In this exercise, we’re going to train a simple model and expose it as an API. This exercise aims to create a working API that can be called to get a prediction from a machine learning model. We’ll use a dataset of Titanic passengers to build a model that predicts whether a person could have survived the disaster of 15 April 1912. We’ll use the pickle framework to serialize and deserialize the model, and Flask to expose the API.
pickle is part of the standard Python 3 library, so no installation is needed if you have Python 3 running. For Flask, we’ll install it first with pip within the exercise.
We will be using a sample dataset that is based on the Titanic dataset. This famous dataset lists all passengers of the Titanic's first and only voyage. It includes details about the persons, such as their family situation during the voyage, the price they paid for a ticket, and whether they survived the disaster. It can be used to make predictions such as who was most likely to survive. The dataset can be found in our GitHub repository at the following location:
You need to copy the Titanic folder from the GitHub repository.
We’ll do this exercise in two parts. The first part consists of building a model and exporting it. In the second part, we’ll load the model into an API to get predictions from it.
Perform the following steps to complete the exercise:
!pip install pandas
!pip install scikit-learn
It should give the following output:
It will download the libraries and install them within your active Anaconda environment. There is a good chance that both frameworks are already available in your system, as part of Anaconda or from previous installations. If they are already installed, you will have the following output:
import pickle
import pandas as pd
from sklearn.linear_model import LogisticRegression
# load the training dataset
train = pd.read_csv('../../Datasets/Titanic/train.csv')
What you can see in this output is that the train object now holds a dataset of 891 rows, with 12 columns. The names and datatypes of the columns are specified; we can see, for example, that the first column is called PassengerId and is of type int64. Take note of the columns that are of type object. These are difficult to work with in a machine learning model; we should convert them into a numerical datatype.
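One common way to handle such a column, shown here purely as an illustration (we will simply drop the non-numerical columns later in this exercise), is one-hot encoding with pandas:
# one-hot encode the Embarked column into separate 0/1 columns
embarked_dummies = pd.get_dummies(train['Embarked'], prefix='Embarked')
train_encoded = pd.concat([train, embarked_dummies], axis=1)
train_encoded.head()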
# select the passengers who did not survive
train[train['Survived'] == 0]
This will produce the following output, showing that there were 549 passengers who did not survive:
# load the testing dataset
test = pd.read_csv('../../Datasets/Titanic/test.csv')
The output shows that there are 418 records in our testing dataset, with the same columns as in the training dataset.
# prepare the dataset
train.Sex = train.Sex.map({'male':0, 'female':1})
We have now transformed the values in the Sex column to either 0 (for male) or 1 (for female).
# use the values in the Survived column as output targets
y = train.Survived.copy()
X = train.drop(['Survived'], axis=1)
Since the Survived column contains our output value on which we have to train our model (the target values), we have to extract that from the dataset. We create a new dataset for it called y and then remove the column from the training dataset. We call the new training set X.
Now, let's do some feature engineering. We can be quite certain that a lot of the columns will not hold any predictive value as to whether a person survived. For example, a person's name and passenger ID are essentially arbitrary identifiers and will not contribute much to the predictive power of the machine learning model.
X.drop(['Name'], axis=1, inplace=True)
X.drop(['Embarked'], axis=1, inplace=True)
X.drop(['PassengerId'], axis=1, inplace=True)
X.drop(['Cabin'], axis=1, inplace=True)
X.drop(['Ticket'], axis=1, inplace=True)
We have removed the Name, Embarked, PassengerId, Cabin, and Ticket columns. There is one more thing to do: the Age column contains some empty (null) values. These could get in the way when training the model.
X.Age.fillna(X.Age.mean(), inplace=True)
X.info()
X.head()
This should give the following output:
As becomes clear from this output, the training dataset still holds all 891 rows, but there are fewer columns. Moreover, all columns are of a numerical type – either int64 or float64. To make even better models, there is a lot that data scientists can do. For example, it's a good option to normalize the columns (get them within the same range) and to do more feature engineering, such as natural language processing, on the non-numerical columns in the source dataset.
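As an optional illustration (not required for the rest of this exercise), scaling the numerical columns with scikit-learn could look like this:
# optional: bring all feature columns into a comparable range
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # NumPy array with mean 0 and unit variance per column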
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X, y)
The model.fit method takes the training dataset, X, and the target values, y, and will perform its calculations to make the best fit. This will produce a model, as can be seen in the following output:
# evaluate the model
model.score(X, y)
You’ll get the following output:
0.8002244668911336
Note
The preceding output will vary slightly. For full reproducibility, you can set a random seed.
A score of 0.8 (and a bit more) means that the model classifies about 80% of the cases correctly. Note that this score is computed on the same data the model was trained on, so it is an optimistic estimate; a held-out test set gives a more realistic picture.
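If you want a reproducible and more realistic estimate, one option (a small sketch, not part of the original exercise) is to hold out part of the data and fix the random seeds:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# hold out 20% of the data and fix the seeds for reproducibility
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
eval_model = LogisticRegression(random_state=42, max_iter=1000)
eval_model.fit(X_train, y_train)
print(eval_model.score(X_val, y_val))  # accuracy on data the model has not seen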
We can also get some more understanding of how the model works. Enter the following command:
train.corr()
The resulting correlation graph indicates columns that are closely related:
A correlation value of 0 indicates no relationship. The further away from 0 a value is, toward a minimum of -1 or a maximum of 1, the stronger the relationship between the columns. In the table, you can see that the data in the Fare and Pclass (passenger class) columns is related, since the correlation between them is -0.549500. In contrast, the Age and PassengerId columns are not related to each other, as expected; since the passenger ID is just an arbitrary number, it would be strange if it depended on the age of a person. The column that is most indicative of the survival of a person is Sex; the value of 0.543351 indicates that once you know whether a person is male or female, you can predict whether they survived the disaster with reasonable accuracy. Therefore, the Sex column is a good feature for the model.
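If you only want to see how strongly each numerical column relates to survival, you can also sort the correlations directly (a small optional sketch that selects the numerical columns first):
# correlation of every numerical column with the Survived target, strongest first
train.select_dtypes('number').corr()['Survived'].sort_values(ascending=False)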
file = open('model.pkl', 'wb')
pickle.dump(model, file)
file.close()
The pickle.dump method serializes the model to a model.pkl file in the exercise directory. The file has been opened as wb, which means that it will write bytes to disk. You can check that the file exists in the Exercise12.01 folder.
The second part of this exercise is to load the model from disk in an API and expose the API.
Note
The production.ipynb file can be found here: https://packt.live/2ZtjzH7
!pip install flask
You’ll get the following output:
If Flask is already installed, you will get the following output:
from flask import Flask, jsonify, request
import pickle
file = open('model.pkl', 'rb') # read bytes
model = pickle.load(file)
file.close()
We now have the same model running in our production environment (the production Jupyter notebook) as in our model training environment (the development notebook from part 1 of this exercise). To test the model, we can make a few predictions.
print(model.predict([[3,0,22.0,1,0,7.2500]]))
print(model.predict([[3,1,22.0,1,0,7.2500]]))
You should get the following output:
[0]
[1]
The values in the output (0 and 1) are the predictions of the Survived output field. This means that the model predicts that our first test subject, a 22-year-old man in third class with 1 sibling and 0 parents on board who paid 7.25 for the ticket, did not survive, whereas a woman with the same characteristics is predicted to have survived.
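If you want to see how confident the model is, rather than just the hard 0/1 labels, logistic regression also exposes class probabilities; a short optional check:
# probability of [not survived, survived] for the same two test subjects
print(model.predict_proba([[3,0,22.0,1,0,7.2500]]))
print(model.predict_proba([[3,1,22.0,1,0,7.2500]]))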
We have now created a machine learning model that predicts whether a passenger on the Titanic survived the disaster of 1912. But, this model can only be executed by calling it within a Python script.
app = Flask('Titanic')

@app.route('/hi', methods=['GET'])
def bar():
    result = 'hello!'
    return result
After running this cell, test the app by entering and running the following line in a new cell:
app.run()
This will result in an ever-running cell (like an infinite loop), indicated by an asterisk (*), as shown in the following output:
The bottom of the cell already indicates what to do: let’s open a new browser window and go to http://127.0.0.1:5000/hi. Since the action that your browser takes is a GET request to the API, you’ll get the correct response back, as you can see in the following figure:
The API now only returns hello! to demonstrate that it’s working technically. We can now continue to add business logic that produces useful results.
@app.route('/survived', methods=['POST'])
def survived():
    payload = request.get_json()
    person = [payload['Pclass'],
              payload['Sex'], payload['Age'],
              payload['SibSb'], payload['Parch'],
              payload['Fare']]
    result = model.predict([person])
    print(f'{person} -> {str(result)}')
    return f'I predict that person {person} has {"_not_ " if result == [0] else ""}survived the Titanic'
These lines define an HTTP POST method under the URL ‘/survived’. When called, a person object is generated from the JSON payload. The person object, which is an array of input parameters for the model, is then passed to the model in the predict statement that we’ve seen before. The result is finally wrapped up in a string and returned to the caller.
Run the app again (stop it first, if needed, by clicking on the interrupt icon or by typing Ctrl + C). This time, we will test the API with a curl statement. cURL is a program that allows you to make an HTTP request across a network in a similar way to how a web browser makes requests, only the result will be plain text instead of a graphical page.
curl -X POST -H "Content-Type: application/json" -d '{"Pclass": 3, "Sex": 0, "Age": 72, "SibSb": 2, "Parch": 0, "Fare": 8.35}' http://127.0.0.1:5000/survived
Note
There is a Jupyter notebook called validation in GitHub that contains the same command in a cell. It can be found here:
The string after the -d parameter in the curl script contains a JSON object with passenger data. The fields for a person have to be named explicitly.
After running this, you’ll see the result of your API call that executed the model as follows in your Terminal or Anaconda prompt:
I predict that person [3, 0, 72, 2, 0, 8.35] has
not survived the Titanic
This output shows that our API works and that the model is evaluated! The API takes the input of one person (the JSON object in the curl statement) and provides it as input to the machine learning model that it has loaded from disk, by deserializing the pickle model.pkl file. The output is wrapped in a string and transferred back to the Terminal across the network.
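If you prefer to stay in Python rather than using curl, the same request can be made with the requests library (a small sketch; install requests with pip if it's not already available):
import requests

person = {"Pclass": 3, "Sex": 0, "Age": 72, "SibSb": 2, "Parch": 0, "Fare": 8.35}
response = requests.post('http://127.0.0.1:5000/survived', json=person)
print(response.text)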
Note
To access the source code for this specific section, please refer to https://packt.live/306HgE5.
By completing this exercise, you have practiced a lot of useful techniques. You have built and trained a machine learning model, stored your model as a pickle file, and used the file in a deserialized form in an API using the Flask framework. This has resulted in an application that can predict whether a passenger of the Titanic survived the disaster of 1912 or not. The API can be used from any other application that can access it across a network. In the following activity, you’ll perform the same steps to build an API that predicts in which class a passenger was sitting.
In this activity, you’ll use the same dataset as in the previous exercise. Rather than building a model that predicts whether a person survived the Titanic disaster, we’re going to try to predict which class (1, 2, or 3) a person was in based on their details.
Note
The code for this activity can be found here:
Perform the following steps to complete the activity:
I predict that person [1, 0, 72, 2, 0,28.35] was
in class [2] of the Titanic
You have now completed an activity to train a model and build an API for it. In the next part of this chapter, we are going to explore options to productionize these kinds of services in a live application.
Note
The solution to this activity can be found on page 654.
After creating an API that contains your machine learning model, it has to be hosted in a production environment. There are several ways to do this.
We’ll focus on the practice that is very common nowadays and still gaining popularity: containerizing the API and model.
AI applications usually work with large datasets. With "big data" comes the requirement for scalability. This means that models in production should scale in line with the data. One way to scale your software services is to distribute them in containers. A container is a small, isolated unit of computational power, similar to a lightweight virtual machine but sharing the host's operating system kernel. There are many other advantages to containerizing your software: deployment becomes easier and more predictable since the containers stay the same in every environment.
The best framework for containerizing applications is Docker. This open-source tool has become the de facto way to containerize applications. Docker works with the concepts of images and containers. An image is a template of an application that is deployable to an environment. A container is a concrete implementation of an image. What’s inside an image (and in its resulting containers) is very flexible; it can range from a simple “hello world” script to a full-blown enterprise application. It’s considered good practice to build Docker images for a single purpose and have them communicate with each other via standard protocols such as REST. In this way, it’s possible to create infrastructure and software by carefully selecting a set of Docker images and connecting them. For example, one Docker image might contain a database and the data of an application itself. Another Docker image can be created to hold the business logic and expose an API. A third Docker image could then contain a website that is published on the internet. In the next part, about Kubernetes, you will see an example diagram of such an application architecture. In our other exercise and activity in this chapter, we’ll use Docker to productionize a machine learning model.
Docker in itself is not so useful in an enterprise; an environment is needed where Docker images can be published and maintained. One of the most popular frameworks to do so is Kubernetes, which we’ll discuss next.
To create a cluster of many containers, and thus to generate scale in an application, a tool is needed that can manage those containers in one environment. Ideally, you would want to deploy many containers with the same software and treat them as if they were one unit. This is exactly the purpose of frameworks such as Kubernetes, Docker Swarm, and OpenShift. They allow software developers to distribute their applications across many "workers," which are, for example, cloud-based nodes. Kubernetes was created by Google, based on its experience running containers in its own data centers. It was made open source in 2014 and is now one of the most popular frameworks for scaling software applications. The large Kubernetes community is very active.
Kubernetes works with the concepts of Pods and Nodes. A Pod is the main abstraction for an application; you always deploy an application in one or more Pods. A Kubernetes cluster consists of Nodes, which are the worker servers that run applications. There is always one Master Node that is responsible for managing the cluster. The following figure gives an overview of this model:
In the previous figure, you see a Kubernetes cluster with four Nodes. One of the Nodes has the Master role and contains one Deployment; from the Deployment, an application can be deployed onto the other three Nodes, which act as workers. Once an application is deployed to Nodes in a Kubernetes cluster, it runs in a Pod. A Pod contains a group of resources, such as storage and networking, that are needed to run a container. In the figure, one of the worker Nodes contains a Pod with an application (the containerized app) in it. If we zoom in on a Node, we can see a structure such as in the following figure:
In the preceding figure, you can see one Node with four Pods in it. Each Pod has its function and contents; for example, the Pod with ID 10.10.10.4 contains three containerized apps and two storage volumes, which work together toward a business goal such as providing a website. You can see that each Node has two core processes running: kubelet for communication, and Docker for running the images. The Kubernetes command-line interface is called kubectl.
Note
If you want to read more about Kubernetes and Docker, please refer to the following link for the official documentation: https://kubernetes.io/docs/home/
In the next exercise, you will learn how to dockerize an API that contains a machine learning model, and how to deploy it to a Kubernetes cluster.
In this exercise, we’ll store a machine learning API as a Docker image, and deploy an instance (a container) to a Kubernetes cluster. This exercise is a follow-up of the previous exercise, so make sure you have completed that one.
In Exercise 12.01, Creating a Machine Learning Model API with pickle and Flask That Predicts Survivors of the Titanic, you created a machine learning model that predicts whether a person survived the Titanic disaster in 1912. You created an API that contains the model. We will be using that model in this exercise. The final goal of this exercise is to deploy the same model to a Kubernetes cluster.
We will not work with Jupyter in this exercise; we’ll do most of the work from Command Prompt and a text editor (or IDE).
Before you begin, follow the installation instructions in the Preface to install Docker and Kubernetes.
This exercise consists of two parts. In the first part, we’ll create a Docker image and publish it to a local registry. In the second part, we’ll run the image in a Kubernetes cluster.
Perform the following steps to complete the exercise:
from flask import Flask, request
import pickle

# load the model from pickle file
file = open('model.pkl', 'rb') # read bytes
model = pickle.load(file)
file.close()

# create an API with Flask
app = Flask('Titanic')

# call this: curl -X POST -H "Content-Type: application/json"
# -d '{"Pclass": 3, "Sex": 0, "Age": 72, "SibSb": 2, "Parch": 0,
# "Fare": 8.35}' http://127.0.0.1:5000/survived
@app.route('/survived', methods=['POST'])
def survived():
    payload = request.get_json()
    person = [payload['Pclass'], payload['Sex'],
              payload['Age'], payload['SibSb'],
              payload['Parch'], payload['Fare']]
    result = model.predict([person])
    return f'I predict that person {person} has {"_not_ " if result == [0] else ""}survived the Titanic'

app.run()
This code is the same as in the production notebook of Exercise 12.01, Creating a Machine Learning Model API with pickle and Flask That Predicts Survivors of the Titanic, but is now pulled together in one executable Python file, since that is the most convenient way to run it as a standalone program. It uses the machine learning model in the pickle model.pkl file to predict whether a person survived the Titanic disaster.
python api.py
If all is OK, you should get the message that your API is running on localhost, as seen in the following screenshot:
curl -X POST -H "Content-Type: application/json" -d '{"Pclass": 3, "Sex": 0, "Age": 72, "SibSb": 2, "Parch": 0, "Fare": 8.35}' http://127.0.0.1:5000/survived
You’ll get the following output in the same Terminal, indicating that the API is working and that we get good predictions from the model:
I predict that person [3, 0, 72, 2, 0, 8.35] has
not survived the Titanic
Flask
scikit-learn
pandas
These three lines are enough for our API. They indicate that we depend on the Flask, scikit-learn, and pandas libraries for everything to work. Since pickle is part of the Python standard library, we don't have to reference it explicitly. The requirements file is used when the Docker image is built for production.
Note
It’s possible to specify the exact version, for example, by entering Flask==1.1.1. To see which version you have in your development environment, enter pip freeze. To export the current list of dependencies and store them as requirements.txt, enter pip freeze > requirements.txt.
Now, let’s continue with containerizing the API to make it ready to deploy to a production environment. We need to create a Docker image, which is a template for creating the actual Docker containers that will be deployed.
sudo docker run hello-world
This command will get a basic Docker image called hello-world from the central repository (Docker Hub). Based on that image, a local container will be created on your local machine. The output will be as follows:
sudo docker image ls
This will give an output like the following:
At the moment, we only see the hello-world image that was created in the previous step.
FROM python:3.7
RUN mkdir /api
WORKDIR /api
ADD . /api/
RUN pip install -r ./requirements.txt
EXPOSE 5000
ENV PYTHONPATH="$PYTHONPATH:/api"
CMD ["python", "/api/api.py"]
Dockerfile is the most important artifact when creating your images. There are five main parts to our file.
First, a base image is acquired. In our case, this is a Python 3.7 image that can run Python applications.
Next, the contents of the current directory are added to the image; this ensures that the api.py and requirements.txt files are packaged within the container. We add all files to a directory called api.
Next, the required Python libraries that we marked as dependencies in the requirements.txt file are installed with the pip install command. In our case, these are the Flask, scikit-learn, and pandas libraries.
Next, we tell Docker to expose network port 5000 to the outside world. If this is omitted, the API cannot be reached from the network.
Finally, we start the API by setting the Python path to the api directory and executing the python /api/api.py command. This is the same command that we tested locally in Step 3 of this exercise.
sudo docker build -t titanic .
This command will pull the base image (with Python 3.7) from Docker Hub and then follow the other instructions in the default Dockerfile (called Dockerfile). The . at the end of the command points to the current directory. The -t parameter specifies the name of our image, titanic. The command might take some time to complete. Once the script completes, this is the expected output:
In the output, it becomes clear that all steps in our Dockerfile have been followed: the base image is pulled, the dependencies from requirements.txt are installed, and port 5000 is exposed. The image is now ready to run the API.
sudo docker image ls
Next to the hello-world image, you’ll see the base Python 3.7 image and our Titanic API:
It’s great that we have a Docker image now, but that image has to be published to a Docker registry when we want to deploy it. Docker Hub is the central repository, but we don’t want our Titanic API to end up there.
docker run -d -p 6000:5000 --restart=always --name registry registry:2
This will download the registry libraries and will run a local registry. The output is as follows:
docker tag titanic localhost:6000/titanic
docker push localhost:6000/titanic
This generates the following output:
curl -X GET http://localhost:6000/v2/titanic/tags/list
If all is well, you’ll see the image with the latest tag:
{"name":"titanic","tags":["latest"]}
As you can see in the output, the titanic image is now also available as localhost:6000/titanic. The registry:2 image is new as well; it is running in a container named registry. We have just successfully published our titanic image to that registry.
In the second part of this exercise, we’ll use the Docker image to host a container on a Kubernetes cluster.
minikube version
If all is OK, this will produce an output like minikube version: v1.8.1, along with a commit hash, as follows:
minikube version: v1.8.1
commit: cbda04cf6bbe65e987ae52bb393c10099ab62014
minikube start --insecure-registry="localhost:6000"
If you’re running within a virtual machine like VirtualBox, the command will be minikube start --driver=none. If all goes well, you’ll get the following output:
minikube status
You’ll get a status update like the following:
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
If you want to look even further into your Kubernetes cluster, start up a dashboard with the following command:
minikube dashboard
This will open a browser window with a lot of useful information and configuration options:
minikube kubectl
This will produce an output like the following:
You can read the kubectl controls the Kubernetes cluster manager line, which indicates that we can now use the kubectl tool to give commands to our cluster. You can get more information about the cluster with the kubectl version and kubectl get nodes commands:
eval $(minikube docker-env)
This command points our Terminal at the Docker daemon inside the Minikube environment, so that the images we build become available to the Kubernetes cluster.
docker build -t titanic .
You’ll get the same output as in part 1 of this exercise.
kubectl run titanic --image=titanic --image-pull-policy=Never
If this is successful, you’ll see the following output:
deployment.apps/titanic created
kubectl get deployments
This will produce a list of deployed containers:
You can also check the Kubernetes dashboard if you have started it, and check whether the deployment and Pod have been created:
kubectl port-forward titanic-6d8f58fc8b-znmx9 5000:5000
Note
In the preceding code, replace the name of the pod with your own; you can find it in the Kubernetes dashboard or by entering kubectl get pods
This will create a local task that forwards network traffic to the titanic Pod. You will have the following output:
Forwarding from 127.0.0.1:5000 -> 5000
Forwarding from [::1]:5000 -> 5000
curl -X POST -H "Content-Type: application/json" -d '{"Pclass": 2, "Sex": 1, "Age": 34, "SibSb": 1, "Parch": 1, "Fare": 5.99}' http://127.0.0.1:5000/survived
This sends a JSON string through the proxy to the running container in our Minikube Kubernetes cluster. If all goes well, you’ll get the output of a prediction in the familiar form, in the same Terminal where you executed the curl command:
I predict that person [2, 1, 34, 1, 1, 5.99] has
survived the Titanic
Note
To access the source code for this specific section, please refer to https://packt.live/32s2PC3.
By completing this exercise, you have containerized a machine learning API and have published it to a Docker registry. You have also gained experience with deploying Docker images to a Kubernetes cluster. Together, this gives you the skills to publish a machine learning model in production.
In the next activity, you’ll deploy a machine learning model to a Kubernetes cluster that predicts the class of a Titanic passenger.
In this activity, you will deploy the machine learning model that you created in Activity 12.01, Predicting the Class of a Passenger on the Titanic, to predict the passenger class of a person on board the Titanic to a Kubernetes cluster.
Note
The code for this activity can be found here:
Perform the following steps to complete the activity:
python api.py
If all is OK, you should get the message that your API is running on localhost.
curl -X POST -H "Content-Type: application/json" -d '{"Survived": 0, "Sex": 1, "Age": 52, "SibSb": 1, "Parch": 0, "Fare": 82.35}' http://127.0.0.1:5000/class
curl -X GET http://localhost:6000/v2/titanic/tags/list
curl -X POST -H "Content-Type: application/json" -d '{"Survived": 1, "Sex": 0, "Age": 72, "SibSb": 2, "Parch": 0, "Fare": 68.35}' http://127.0.0.1:5000/class
This will produce the following output:
I predict that person [1, 0, 72, 2, 0, 68.35] was
in passenger class [1] of the Titanic
You have now successfully dockerized an API with a machine learning model, and deployed the Docker image to a Kubernetes cluster. In the real world, this is a common approach to productionize software in a cloud environment.
Note
The solution to this activity can be found on page 661.
In the next section, we’ll explore how to deploy machine learning models to an ever-running streaming application.
In the first part of this chapter, you learned how to export models to the pickle format, to be used in an API. That is a good way to productionize models since the resulting microservices architecture is flexible and robust. However, calling an API across a network might not be the best-performing way to get a forecast. As we learned in Chapter 2, Artificial Intelligence Storage Requirements, latency is always an issue when working with high loads of event data. If you're processing thousands of events per second and have to execute a machine learning model for each event, a network call and a pickle file loaded from disk might not be able to handle the load. So, in a similar way to how we cache data, we should cache models in memory, as close to the data stream as possible. That way, we can reduce or even eliminate the network traffic and disk I/O. This technique is often used in high-velocity stream processing applications, for example, fraud detection in banks and real-time recommendation systems for websites.
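The general idea, independent of the framework, is to load the model once when the process starts and keep it in memory while events stream past. In Python terms, a minimal sketch (with a hypothetical event handler; this is not the streaming engine we use later) looks like this:
import pickle

# load the model once, at startup, and keep it in memory
with open('model.pkl', 'rb') as file:
    model = pickle.load(file)

def handle_event(event):
    # event is a list of feature values; no disk or network access per event
    return model.predict([event])[0]

# a streaming engine would call handle_event for every incoming record
print(handle_event([3, 0, 28, 2, 0, 9.45]))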
There are several methods of caching models in memory. Some platforms offer built-in capabilities. For example, the H2O.ai platform has the option to export models to a POJO/MOJO binary object, and Spark has its own machine learning library, Spark ML, which is quite extensive and easy to use. All these methods have one disadvantage: they lock you into the platform. It's not possible to distribute a model from H2O.ai to Databricks, or from Spark to Dataiku. To enable this kind of flexibility, an intermediate format has to be picked as the "glue" between data scientists and data engineers that gives both practitioners the freedom to choose the tools they want. PMML is such a format, and we'll discuss it in the next section.
As this book is focused on open source standards, we have picked a popular intermediate model format that we can load in memory – PMML, short for Predictive Model Markup Language. A PMML file is an XML-based file that contains the input parameters, calculations of the algorithms, and output field of a model. Exporting a model to PMML is essentially a way of serializing a model, similar to how exporting to pickle works. A PMML file can be read quite easily by humans, as can be seen in the following figure:
The PMML format is maintained by the Data Mining Group, which is an independent, non-profit consortium of organizations. There is also a JSON-based format for models called Portable Format for Analytics (PFA). Since that format is still emerging and thus less mature than PMML, we will not discuss it further in this book. The next paragraph contains a short introduction to a popular stream processing framework, Apache Flink.
Apache Flink is one of the most popular streaming engines, and for a good reason. It offers low latency and high throughput for streaming data processing and gives a lot of power to developers. Compared to Spark Structured Streaming, another popular stream processing framework, it offers more features and better performance. A Flink job is a Java application that can be executed on a local machine or within a cluster.
In the next exercise, you’ll practice with PMML and Flink by creating a real-time stream processing application that includes a machine learning model.
In this exercise, you’ll create a simple machine learning model, export it to PMML, and load it in memory to be used in a data stream.
Before you begin, follow the installation instructions in the Preface to install Java, Maven, Netcat, and a suitable IDE (IntelliJ IDEA or Eclipse).
Perform the following steps to complete the exercise:
!cp ../Exercise12.01/model.pkl .
!pip install sklearn2pmml
This will install the required library, which can export our models to the PMML format:
If sklearn2pmml is already installed, you will have the following output:
from sklearn2pmml import sklearn2pmml, make_pmml_pipeline
import pickle
file = open('model.pkl', 'rb') # read bytes
model = pickle.load(file)
file.close()
Now we can export the model to the PMML format.
pmml_pipeline = make_pmml_pipeline(model)
sklearn2pmml(pmml_pipeline, 'titanic.pmml')
First, we create a pipeline model, which is another representation of the model. Then, we call the sklearn2pmml method that performs the export and writes the resulting PMML file (titanic.pmml) to disk.
!cat titanic.pmml
This will produce the following output:
As you can see, the PMML file is an XML-structured file that is quite easy to read. The input and output fields are listed, and the type of model becomes clear from the <RegressionModel> tag.
Now let's create a streaming job. We'll use Apache Flink for this and write the code in IntelliJ IDEA, but you can also choose another IDE. It's also possible (though not recommended) to use a plain text editor and run the code from a Terminal or command line.
mvn archetype:generate \
  -DarchetypeGroupId=org.apache.flink \
  -DarchetypeArtifactId=flink-quickstart-java \
  -DarchetypeVersion=1.10.0
groupId: com
artifactId: titanic
version: 0.0.1
package: packt
This will generate a project from a template.
mvn clean package
This produces the following output:
In the output, you’ll see BUILD SUCCESS, which indicates that you now have a working Java program.
Let's start by testing the job – at this point, it contains only the Maven template-generated code, but it can already be built and deployed to a Flink cluster.
Save the file, then import the Maven changes.
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
These lines will add the necessary Flink libraries to our class file.
DataStream<String> dataStream = env.socketTextStream("localhost", 1234, " ");

StreamingFileSink<String> sink = StreamingFileSink
    .forRowFormat(new Path("out"), new SimpleStringEncoder<String>("UTF-8"))
    .build();

dataStream.addSink(sink);
This code sets up a data stream that listens to a local socket on port 1234. It takes the lines and writes (sinks) the lines to a file in the out directory.
nc -l -p 1234
You get a prompt to enter lines. Leave it open for now.
mvn clean package
mvn exec:java -Dexec.mainClass="packt.StreamingJob"
These lines are the Maven instructions to compile the code, package it into a JAR file, and run the file with the entry point in the packt.StreamingJob class that contains our main function.
hello!
this
is a test
to check
if Flink is writing
data to a file
You should get the following output:
<dependency>
<groupId>org.jpmml</groupId>
<artifactId>pmml-evaluator</artifactId>
<version>1.4.15</version>
</dependency>
<dependency>
<groupId>org.jpmml</groupId>
<artifactId>pmml-evaluator-extension</artifactId>
<version>1.4.15</version>
</dependency>
import org.dmg.pmml.FieldName;
import org.jpmml.evaluator.*;
import org.jpmml.evaluator.visitors.DefaultVisitorBattery;
import java.io.File;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
These lines import the required PMML and common Java libraries for working with PMML files.
// prepare PMML evaluation
ClassLoader classLoader = StreamingJob.class.getClassLoader();
Evaluator evaluator = new LoadingModelEvaluatorBuilder()
    .setLocatable(false)
    .setVisitors(new DefaultVisitorBattery())
    .load(new File(classLoader.getResource("titanic.pmml").getFile()))
    .build();
List<? extends InputField> inputFields = evaluator.getInputFields();
This code deserializes the model from PMML, loading the model in memory ready to be executed. The list of input fields will come in handy in the next step.
We are now ready to parse incoming messages as persons that can be evaluated according to whether they survived the Titanic disaster. The input values will be in a comma-separated string with the values "class, sex, age, number of siblings on board, number of parents on board, and fare paid". For example, "3,0,28,2,0,9.45" indicates a 28-year-old man in third class who boarded with two siblings and paid a fare of 9.45.
SingleOutputStreamOperator<String> mapped =
    dataStream.map(new MapFunction<String, String>() {
        @Override
        public String map(String s) throws Exception {
            System.out.println("EVENT: " + s);
            Map<FieldName, FieldValue> arguments = new LinkedHashMap<>();
            String[] values = s.split(",");
            // prepare model evaluation
            for (int i = 0; i < values.length; i++) {
                FieldName inputName = inputFields.get(i).getName();
                FieldValue inputValue = inputFields.get(i).prepare(values[i]);
                arguments.put(inputName, inputValue);
            }
            // execute the model
            Map<FieldName, ?> results = evaluator.evaluate(arguments);
            // decouple the results from the JPMML-Evaluator runtime environment
            Map<String, ?> resultRecord = EvaluatorUtil.decodeAll(results);
            System.out.println(resultRecord);
            return s;
        }
    });
This code adds a map method to the stream processing job that splits the input string and uses the resulting string values to build a set of arguments for the machine learning model.
3,0,28,2,0,9.45
1,1,12,0,2,3.25
2,1,72,1,1,4.62
You should get the following output:
The Flink job that you have created and that now runs on your local machine is an example of a stream processing application with a machine learning model inside. The machine learning model is imported from a static PMML file and resides in the memory of the streaming job. This is an efficient way to work with models in stream processing.
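As a side note, if netcat is not available on your system, a few lines of Python can play the same role of a socket server that feeds test lines to the job (a hypothetical helper, assuming the Flink job connects to port 1234 as configured earlier):
import socket
import time

# listen on port 1234 and push a few test passengers to the connected Flink job
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('localhost', 1234))
server.listen(1)
connection, _ = server.accept()  # blocks until the Flink job connects
for line in ['3,0,28,2,0,9.45', '1,1,12,0,2,3.25', '2,1,72,1,1,4.62']:
    connection.sendall((line + '\n').encode('utf-8'))
    time.sleep(1)  # give the job time to process each event
connection.close()
server.close()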
Note
To access the source code for this specific section, please refer to https://packt.live/38SssNr.
By completing this exercise, you have created a stream processing job that evaluates a machine learning model in real time. The model is deserialized from the PMML format, and the job sinks its output to a local filesystem. The resulting application can predict in real time whether passengers of the Titanic would have survived the disaster of 1912. Although this particular real-time use case might not be so useful, you have now gained substantial practice with setting up stream processing software that can execute machine learning models.
In the next activity, you’ll build a similar stream processing application that predicts the class of Titanic passengers.
In this activity, we’ll create a streaming job that processes events from a local socket and produces a prediction from a machine learning model.
Note
The code for this activity can be found here:
Perform the following steps:
!cat titanic_class.pmml
This will produce the following output:
groupId: com
artifactId: titanic_class
version: 0.0.1
package: packt
This will generate a project from a template.
nc -l -p 1234
1,1,13,1,56.91
0,0,81,0,0,120.96
You should get the following output:
By completing this activity, you have built a streaming job with Flink that can generate predictions in real time.
Note
The solution to this activity can be found on page 670.
With this chapter, you have completed the entire book. That means you now have a thorough understanding of the infrastructure of AI systems. We have covered a tremendous number of topics in the book and supplied a great number of exercises and activities for you to follow. Let’s have a short recap of all the topics in the book.
In Chapter 1, Data Storage Fundamentals, we started with the basics – the chapter covered data storage fundamentals. You learned about AI and machine learning in general, and we used text classification as an example of a machine learning model. Chapter 2, Artificial Intelligence Storage Requirements, was about requirements and covered a great number of concepts in depth. For every data storage layer in a data lake, you learned about the specific requirements and methods to store and retrieve data at scale. We addressed security, scalability, and various other aspects of building great data-driven systems. In Chapter 3, Data Preparation, we evaluated the data preparation and processing techniques that are needed to transform data. You learned about ETL and ELT, data cleaning, filtering, aggregating, and feature engineering. You also practiced streaming event data processing using Apache Spark.
Chapter 4, Ethics of AI Data Storage, was a less technical chapter, but perhaps even more important than the purely technical ones. The main topic was the ethics of AI storage. We explored a few famous case studies where the ethics of AI were under discussion. You learned about bias and other forms of prejudice in data and models, which gave you a good basis to start any conversation on these topics.
In Chapter 5, Data Stores: SQL and NoSQL Databases, we did a deep dive into databases. You learned about SQL and NoSQL databases: their differences, use cases, best practices, and query languages. By doing some hands-on exercises with technologies such as MySQL, Cassandra, and MongoDB, you learned how to store data in any type of database in the historical and analytics data layers of your data lake. Chapter 6, Big Data File Formats, followed up on this by exploring file formats for big data. You practiced with CSV, JSON, Parquet, and Avro to get a broad perspective on data formats.
In Chapter 7, Introduction to Analytics Engine (Spark) for Big Data, we moved from storing data to the analysis of data. You learned a lot about Apache Spark, one of the most popular data processing engines available. This knowledge comes in handy when discussing the design of data systems, as we did in Chapter 8, Data System Design Examples. Starting from a historical perspective and following up on the requirements of Chapter 2, Artificial Intelligence Storage Requirements, this chapter explored the components of system design and addressed hardware, architecture, data pipelines, security, scaling, and much more. By completing this chapter, you gained more experience in the architecture and design of AI systems.
Chapter 9, Workflow Management for AI, contained an in-depth overview of workflow management. You practiced with several techniques, from simple Python and Bash scripts to Apache Airflow for sophisticated workflow management systems.
In Chapter 10, Introduction to Data Storage on Cloud Services (AWS), we moved on to cloud-based storage for AI systems. We used AWS to explain the concepts and technology that come with storing data in the cloud. Chapter 11, Building an Artificial Intelligence Algorithm, was a real hands-on chapter where you could practice model building and training, and finally, Chapter 12, Productionizing Your AI Applications, explored some techniques to put a machine learning model into production: building an API, running a Docker image in Kubernetes, and serializing the model to PMML to use in a data stream.
We hope that you learned a lot by reading this book. We aimed to provide a good mixture of reading content, hands-on exercises, and fun. We believe this book provides a solid basis for anyone who wants to build AI systems. We have drawn from our own experience, the open-source community, countless colleagues, and other people who have shared their knowledge. We hope you have enjoyed it. Please let us know about your experience with this book; we're eager to hear from you and will be happy to receive any feedback. Finally, we would like to thank you – thanks for staying with us for all these chapters, thanks for your attention, and thanks for sharing your feedback. We wish you all the best in your careers and good luck with building awesome AI systems.