Throughout this book, we have seen that TensorFlow is capable of implementing many models, but there is more that TensorFlow can do. This chapter will show you a few of those things.
We'll start by showing how to use the various aspects of TensorBoard, a capability that comes with TensorFlow. This tool allows us to visualize summary metrics, graphs, and images even while our model is training. Next, we will show you how to write code that is ready for production use with a focus on unit tests, training distribution across multiple processing units, and efficient model saving and loading. Finally, we will address a machine learning serving solution by hosting a model as REST endpoints.
Monitoring and troubleshooting machine learning algorithms can be a daunting task, especially if you have to wait a long time for the training to complete before you know the results. To work around this, TensorFlow includes a computational graph visualization tool called TensorBoard. With TensorBoard, we can visualize graphs and important values (loss, accuracy, batch training time, and so on) even during training.
To illustrate the various ways we can use TensorBoard, we will reimplement the MNIST model from The Introductory CNN Model recipe in Chapter 8, Convolutional Neural Networks. Then, we'll add the TensorBoard callback and fit the model. We will show how to monitor numerical values, histograms of sets of values, how to create an image in TensorBoard, and how to visualize TensorFlow models.
import tensorflow as tf
import numpy as np
import datetime
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)
# Padding the images by 2 pixels since in the paper input images were 32x32
x_train = np.pad(x_train, ((0,0),(2,2),(2,2),(0,0)), 'constant')
x_test = np.pad(x_test, ((0,0),(2,2),(2,2),(0,0)), 'constant')
# Normalize
x_train = x_train / 255
x_test = x_test/ 255
# Set model parameters
image_width = x_train[0].shape[0]
image_height = x_train[0].shape[1]
num_channels = 1 # grayscale = 1 channel
# Training and Test data variables
batch_size = 100
evaluation_size = 500
generations = 300
eval_every = 5
# Set for reproducible results
seed = 98
np.random.seed(seed)
tf.random.set_seed(seed)
# Declare the model
input_data = tf.keras.Input(dtype=tf.float32, shape=(image_width,image_height, num_channels), name="INPUT")
# First Conv-ReLU-MaxPool Layer
conv1 = tf.keras.layers.Conv2D(filters=6,
kernel_size=5,
padding='VALID',
activation="relu",
name="C1")(input_data)
max_pool1 = tf.keras.layers.MaxPool2D(pool_size=2,
strides=2,
padding='SAME',
name="S1")(conv1)
# Second Conv-ReLU-MaxPool Layer
conv2 = tf.keras.layers.Conv2D(filters=16,
kernel_size=5,
padding='VALID',
strides=1,
activation="relu",
name="C3")(max_pool1)
max_pool2 = tf.keras.layers.MaxPool2D(pool_size=2,
strides=2,
padding='SAME',
name="S4")(conv2)
# Flatten Layer
flatten = tf.keras.layers.Flatten(name="FLATTEN")(max_pool2)
# First Fully Connected Layer
fully_connected1 = tf.keras.layers.Dense(units=120,
activation="relu",
name="F5")(flatten)
# Second Fully Connected Layer
fully_connected2 = tf.keras.layers.Dense(units=84,
activation="relu",
name="F6")(fully_connected1)
# Final Fully Connected Layer
final_model_output = tf.keras.layers.Dense(units=10,
activation="softmax",
name="OUTPUT"
)(fully_connected2)
model = tf.keras.Model(inputs= input_data, outputs=final_model_output)
model.compile(
optimizer="adam",
loss="sparse_categorical_crossentropy",
metrics=["accuracy"]
)
model.summary()
First, we create a timestamped log directory for this run; TensorBoard logs to this folder:
log_dir="logs/experiment-" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
Next, we instantiate a TensorBoard callback and pass it to the fit method. All logs during the training phase will be stored in this directory and can be viewed instantly in TensorBoard:
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir,
write_images=True,
histogram_freq=1 )
model.fit(x=x_train,
y=y_train,
epochs=5,
validation_data=(x_test, y_test),
callbacks=[tensorboard_callback])
We then start the TensorBoard application by running the following command:
$ tensorboard --logdir="logs"
Then, we navigate our browser to http://127.0.0.1:6006. We can specify a different port if needed by passing, for example, a --port 6007 argument (for running on port 6007). We can also start TensorBoard within the notebook through the %tensorboard --logdir="logs" magic command. Remember that TensorBoard is viewable while your program is running. By default, the callback logs metrics once per epoch; we can log after every batch by setting update_freq='batch'. We can also visualize model weights as images with the argument write_images=True, or display weights and biases as histograms (computed every epoch) using histogram_freq=1.
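Putting these options together, a callback configured to also log at every batch might look like this (a sketch; the argument values are examples, not requirements):
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=log_dir,
    update_freq='batch',   # also log metrics after every batch
    write_images=True,     # visualize model weights as images
    histogram_freq=1       # compute weight/bias histograms every epoch
)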
Figure 12.1: Training and test loss decrease over time while the training and test accuracy increase
Figure 12.2: The Histograms view to visualize weights and bias in TensorBoard
Figure 12.3: The op-level graph in TensorBoard
TensorFlow also provides the tf.summary module for writing summary data that can be visualized in TensorBoard. First, we have to create a file writer, and then we can write histogram, scalar, text, audio, or image summaries. Here, we'll write images using the Image Summary API and visualize them in TensorBoard:
# Create a FileWriter for the timestamped log directory.
file_writer = tf.summary.create_file_writer(log_dir)
with file_writer.as_default():
    # Reshape the images and write image summaries.
    images = np.reshape(x_train[0:10], (-1, 32, 32, 1))
    tf.summary.image("10 training data examples", images, max_outputs=10, step=0)
Figure 12.4: Visualize images in TensorBoard
Be careful of writing image summaries too often to TensorBoard. For example, if we were to write an image summary every generation for 10,000 generations, that would generate 10,000 images worth of summary data. This tends to eat up disk space very quickly.
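One way to keep event files small is to throttle how often images are written. A minimal sketch, reusing file_writer and images from the previous step and assuming an arbitrary interval of 100 generations:
log_interval = 100
for generation in range(10000):
    # ... run one training step here ...
    if generation % log_interval == 0:
        with file_writer.as_default():
            tf.summary.image("sample digits", images, max_outputs=3, step=generation)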
In this section, we implemented a CNN model on the MNIST dataset. We added a TensorBoard callback and fitted the model. Then, we used TensorFlow's visualization tool, which enables you to monitor numerical values and histograms of sets of values, to visualize the model graph, and so on.
Remember that we can launch TensorBoard through a command line as in the recipe but we can also launch it within a notebook by using the %tensorboard
magic line.
For some references on the TensorBoard API, visit the following websites:
TensorBoard.dev is a free managed service provided by Google. Its aim is to let you easily host, track, and share machine learning experiments with anyone. After we launch our experiments, we just have to upload our TensorBoard logs to the TensorBoard.dev server. Then, we share the link, and anyone who has the link can view our experiments. Be careful not to upload sensitive data, because uploaded TensorBoard datasets are public and visible to everyone.
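As a sketch of the upload workflow (the flags below reflect the tensorboard CLI as documented; the name and description values are illustrative):
$ tensorboard dev upload --logdir logs \
    --name "MNIST CNN experiment" \
    --description "Training runs from this recipe"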
Tuning hyperparameters in a machine learning project can be a real pain. The process is iterative and can take a long time to test all the hyperparameter combinations. Fortunately, HParams, a TensorBoard plugin, comes to the rescue: it makes it easy to test many combinations and identify the best hyperparameters.
To illustrate how the HParams plugin works, we will use a sequential model implementation on the MNIST dataset. We'll configure HParams and compare several hyperparameter combinations in order to find the best hyperparameter optimization.
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp
import numpy as np
import datetime
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Normalize
x_train = x_train / 255
x_test = x_test/ 255
## Set model parameters
image_width = x_train[0].shape[0]
image_height = x_train[0].shape[1]
num_channels = 1 # grayscale = 1 channel
HP_ARCHITECTURE_NN = hp.HParam('archi_nn',
hp.Discrete(['128,64','256,128']))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.0, 0.1))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd']))
def train_model(hparams, experiment_run_log_dir):
    nb_units = list(map(int, hparams[HP_ARCHITECTURE_NN].split(",")))
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Flatten(name="FLATTEN"))
    model.add(tf.keras.layers.Dense(units=nb_units[0], activation="relu", name="D1"))
    model.add(tf.keras.layers.Dropout(hparams[HP_DROPOUT], name="DROP_OUT"))
    model.add(tf.keras.layers.Dense(units=nb_units[1], activation="relu", name="D2"))
    model.add(tf.keras.layers.Dense(units=10, activation="softmax", name="OUTPUT"))
    model.compile(
        optimizer=hparams[HP_OPTIMIZER],
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"]
    )
    tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=experiment_run_log_dir)
    hparams_callback = hp.KerasCallback(experiment_run_log_dir, hparams)
    model.fit(x=x_train,
              y=y_train,
              epochs=5,
              validation_data=(x_test, y_test),
              callbacks=[tensorboard_callback, hparams_callback]
    )
for archi_nn in HP_ARCHITECTURE_NN.domain.values:
    for optimizer in HP_OPTIMIZER.domain.values:
        for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
            hparams = {
                HP_ARCHITECTURE_NN: archi_nn,
                HP_OPTIMIZER: optimizer,
                HP_DROPOUT: dropout_rate
            }
            experiment_run_log_dir = "logs/experiment-" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
            train_model(hparams, experiment_run_log_dir)
$ tensorboard --logdir="logs"
Figure 12.5: The HParams table view visualized in TensorBoard
Figure 12.6: The HParams parallel coordinates view visualized in TensorBoard
Using TensorBoard HParams is a simple and insightful way to identify the best hyperparameters and also to manage your experiments with TensorFlow.
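Optionally, we can register the experiment configuration up front so that the HParams dashboard knows which hyperparameters and metrics to display. A minimal sketch, assuming the 'accuracy' tag matches the metric name logged by Keras:
with tf.summary.create_file_writer("logs/hparam_tuning").as_default():
    hp.hparams_config(
        hparams=[HP_ARCHITECTURE_NN, HP_DROPOUT, HP_OPTIMIZER],
        metrics=[hp.Metric('accuracy', display_name='Accuracy')]
    )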
For a reference on the HParams TensorBoard plugin, visit the following website:
Testing code leads to faster prototyping, more efficient debugging, quicker changes, and makes code easier to share. TensorFlow 2.0 provides the tf.test module, and we will cover it in this recipe.
When programming a TensorFlow model, it helps to have unit tests to check the functionality of the program. This helps us because when we want to make changes to a program unit, tests will make sure those changes do not break the model in unexpected ways. In Python, the main test framework is unittest, but TensorFlow provides its own test framework. In this recipe, we will create a custom layer class and implement a unit test to illustrate how to write one in TensorFlow.
import tensorflow as tf
import numpy as np
Next, we declare a custom gate layer that applies the function f(x) = a1 * x + b1:
class MyCustomGate(tf.keras.layers.Layer):
    def __init__(self, units, a1, b1):
        super(MyCustomGate, self).__init__()
        self.units = units
        self.a1 = a1
        self.b1 = b1

    # Compute f(x) = a1 * x + b1
    def call(self, inputs):
        return inputs * self.a1 + self.b1
Next, we create a unit test class that inherits from the tf.test.TestCase class. The setUp method is a hook method that is called before every test method. The assertAllEqual method checks that the expected and the computed outputs have the same values:
class MyCustomGateTest(tf.test.TestCase):
    def setUp(self):
        super(MyCustomGateTest, self).setUp()
        # Configure the layer with 1 unit, a1 = 2 and b1 = 1
        self.my_custom_gate = MyCustomGate(1, 2, 1)

    def testMyCustomGateOutput(self):
        input_x = np.array([[1, 0, 0, 1],
                            [1, 0, 0, 1]])
        output = self.my_custom_gate(input_x)
        expected_output = np.array([[3, 1, 1, 3], [3, 1, 1, 3]])
        self.assertAllEqual(output, expected_output)
Then, we call the tf.test.main() function in our script to run all unit tests:
tf.test.main()
$ python3 01_implementing_unit_tests.py
...
[ OK ] MyCustomGateTest.testMyCustomGateOutput
[ RUN ] MyCustomGateTest.test_session
[ SKIPPED ] MyCustomGateTest.test_session
----------------------------------------------------------------------
Ran 2 tests in 0.016s
OK (skipped=1)
We implemented one test and it passed. Don't worry about the test_session test – it's a phantom test that is skipped automatically.
Note that many assertions tailored to TensorFlow are available in the tf.test
API.
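For instance, here is a short sketch of two of them (the tensors and tolerance are illustrative):
class AssertionExamplesTest(tf.test.TestCase):
    def testFloatAssertions(self):
        a = tf.constant([1.0, 2.0])
        # Element-wise equality within a tolerance (useful for float outputs)
        self.assertAllClose(a * 3.0, [3.0, 6.0], atol=1e-6)
        # Every element must fall within the given range
        self.assertAllInRange(a, 0.0, 3.0)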
In this section, we implemented a TensorFlow unit test using the tf.test
API that is very similar to the Python unit test. Remember that unit testing helps assure us that code will function as expected, provides confidence in sharing code, and makes reproducibility more accessible.
For a reference on the tf.test
module, visit the following website:
As you will be aware, many features of TensorFlow, including computational graphs, lend themselves naturally to parallel computation. Computational graphs can be split over different processors, and different batches can be processed on different processors. We will address how to access different processors on the same machine in this recipe.
In this recipe, we will show you how to access multiple devices on the same system and train on them. A device is a CPU or an accelerator unit (GPUs, TPUs) where TensorFlow can run operations. This is a very common occurrence: along with a CPU, a machine may have one or more GPUs that can share the computational load. If TensorFlow can access these devices, it will automatically distribute the computations to multiple devices via a greedy process. However, TensorFlow also allows the program to specify which operations will be on which device via a name scope placement.
In this recipe, we will show you different commands that will allow you to access various devices on your system; we'll also demonstrate how to find out which devices TensorFlow is using. Remember that some functions are still experimental and are subject to change.
To find out which device each operation runs on, we set tf.debugging.set_log_device_placement to True. If a TensorFlow operation is implemented for both CPU and GPU devices, it will be executed on a GPU device by default if a GPU is available:
tf.debugging.set_log_device_placement(True)
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
We can also check a tensor's placement through its device attribute:
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
print(a.device)
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
print(b.device)
/job:localhost/replica:0/task:0/device:GPU:0
/job:localhost/replica:0/task:0/device:GPU:0
We can also force a device placement by using the tf.device context manager. Each operation executed in this context will use the selected device:
tf.debugging.set_log_device_placement(True)
with tf.device('/device:CPU:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:CPU:0
If we move the matmul operation out of the context, it will be executed on a GPU device if one is available:
tf.debugging.set_log_device_placement(True)
with tf.device('/device:CPU:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
By default, TensorFlow grabs almost all of the GPU memory at startup. We can instead ask it to allocate memory only as needed:
gpu_devices = tf.config.list_physical_devices('GPU')
if gpu_devices:
    try:
        tf.config.experimental.set_memory_growth(gpu_devices[0], True)
    except RuntimeError as e:
        # Memory growth cannot be modified after the GPU has been initialized
        print(e)
We can also put a hard limit on the amount of GPU memory TensorFlow may use by creating a virtual GPU device with a fixed memory limit (here, 1,024 MB):
gpu_devices = tf.config.list_physical_devices('GPU')
if gpu_devices:
    try:
        tf.config.experimental.set_virtual_device_configuration(
            gpu_devices[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    except RuntimeError as e:
        # Virtual devices cannot be modified after the GPU has been initialized
        print(e)
We can also simulate two virtual GPUs on a single physical GPU, which is handy for testing multi-device code without extra hardware:
gpu_devices = tf.config.list_physical_devices('GPU')
if gpu_devices:
    try:
        tf.config.experimental.set_virtual_device_configuration(
            gpu_devices[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024),
             tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    except RuntimeError as e:
        # Virtual devices cannot be modified after the GPU has been initialized
        print(e)
If we want to run code only when the installed TensorFlow build has GPU (CUDA) support, we can guard it as follows:
if tf.test.is_built_with_cuda():
    <Run GPU specific code here>
For example, we can assign specific operations to specific devices as follows:
if tf.test.is_built_with_cuda():
    with tf.device('/cpu:0'):
        a = tf.constant([1.0, 3.0, 5.0], shape=[1, 3])
        b = tf.constant([2.0, 4.0, 6.0], shape=[3, 1])
        with tf.device('/gpu:0'):
            c = tf.matmul(a, b)
            c = tf.reshape(c, [-1])
        with tf.device('/gpu:1'):
            d = tf.matmul(b, a)
            flat_d = tf.reshape(d, [-1])
        combined = tf.multiply(c, flat_d)
    print(combined)
Num GPUs Available: 2
Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op Mul in device /job:localhost/replica:0/task:0/device:CPU:0
tf.Tensor([ 88. 264. 440. 176. 528. 880. 264. 792. 1320.], shape=(9,), dtype=float32)
We can see that the first two operations were performed on the main CPU, the next two on the first GPU, the two after that on the second GPU, and the final multiplication back on the main CPU.
When we want to assign TensorFlow operations to specific devices on our machine, we need to know how TensorFlow refers to those devices. Device names in TensorFlow follow these conventions:

Device | Device name
Main CPU | /device:CPU:0
Main GPU | /device:GPU:0
Second GPU | /device:GPU:1
Third GPU | /device:GPU:2
Remember that TensorFlow considers a CPU as a unique processor even if the processor is a multi-core processor. All cores are wrapped in /device:CPU:0
, that is to say, TensorFlow does indeed use multiple CPU cores by default.
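To see exactly which physical devices TensorFlow has detected on your machine, a quick sketch:
cpus = tf.config.list_physical_devices('CPU')
gpus = tf.config.list_physical_devices('GPU')
print("CPUs:", [d.name for d in cpus])  # e.g. ['/physical_device:CPU:0']
print("GPUs:", [d.name for d in gpus])  # e.g. ['/physical_device:GPU:0']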
Fortunately, running TensorFlow in the cloud is now easier than ever. Many cloud computation service providers offer GPU instances that have a main CPU and a powerful GPU alongside it. Note that an easy way to have a GPU is to run the code in Google Colab and set the GPU as the hardware accelerator in the notebook settings.
Training a model can be very time-consuming. Fortunately, TensorFlow offers several distributed strategies to speed up the training, whether for a very large model or a very large dataset. This recipe will show us how to use the TensorFlow distributed API.
The TensorFlow distributed API allows us to distribute training by replicating the model onto different nodes and training on different subsets of data. Each strategy supports a hardware platform (multiple GPUs, multiple machines, or TPUs) and uses either a synchronous or an asynchronous training strategy. In synchronous training, the workers train over different batches of data and aggregate their gradients at each step, while in asynchronous mode, each worker trains independently over the data and the variables are updated asynchronously. Note that, for the moment, TensorFlow supports only the data parallelism described above; according to the roadmap, it will soon support model parallelism, a paradigm used when the model is too large to fit on a single device and needs to be distributed over many devices. In this recipe, we will go over the mirrored strategy provided by this API.
import tensorflow as tf
import tensorflow_datasets as tfds
# Create two virtual GPUs
gpu_devices = tf.config.list_physical_devices('GPU')
if gpu_devices:
    try:
        tf.config.experimental.set_virtual_device_configuration(
            gpu_devices[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024),
             tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    except RuntimeError as e:
        # Virtual devices cannot be modified after the GPU has been initialized
        print(e)
We load the MNIST dataset via the tensorflow_datasets API as follows:
datasets, info = tfds.load('mnist', with_info=True, as_supervised=True)
mnist_train, mnist_test = datasets['train'], datasets['test']
def normalize_img(image, label):
    """Normalizes images: `uint8` -> `float32`."""
    return tf.cast(image, tf.float32) / 255., label
mnist_train = mnist_train.map(
normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
mnist_train = mnist_train.cache()
mnist_train = mnist_train.shuffle(info.splits['train'].num_examples)
mnist_train = mnist_train.prefetch(tf.data.experimental.AUTOTUNE)
mnist_test = mnist_test.map(
normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
mnist_test = mnist_test.cache()
mnist_test = mnist_test.prefetch(tf.data.experimental.AUTOTUNE)
mirrored_strategy = tf.distribute.MirroredStrategy()
print('Number of devices: {}'.format(mirrored_strategy.num_replicas_in_sync))
BATCH_SIZE_PER_REPLICA = 128
BATCH_SIZE = BATCH_SIZE_PER_REPLICA * mirrored_strategy.num_replicas_in_sync
mnist_train = mnist_train.batch(BATCH_SIZE)
mnist_test = mnist_test.batch(BATCH_SIZE)
We then declare and compile our model inside the mirrored strategy scope:
with mirrored_strategy.scope():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(name="FLATTEN"))
    model.add(tf.keras.layers.Dense(units=128, activation="relu", name="D1"))
    model.add(tf.keras.layers.Dense(units=64, activation="relu", name="D2"))
    model.add(tf.keras.layers.Dense(units=10, activation="softmax", name="OUTPUT"))
    model.compile(
        optimizer="sgd",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"]
    )
model.fit(mnist_train,
epochs=10,
validation_data= mnist_test
)
Using a strategy scope is the only thing you have to do to distribute your training.
Using the distributed TensorFlow API is quite easy: all you have to do is declare your model under the strategy scope. Operations are then assigned to workers automatically (or manually, if desired). Note that we can easily switch between strategies.
Here's a brief overview of some of the distributed strategies; a sketch of switching between them follows this list:
- MirroredStrategy: synchronous training across multiple GPUs on one machine, with variables mirrored on each device.
- MultiWorkerMirroredStrategy: synchronous training across multiple machines, each with one or more GPUs.
- TPUStrategy: synchronous training on TPUs.
- ParameterServerStrategy: asynchronous training where variables live on parameter servers and workers compute gradients independently.
- CentralStorageStrategy: synchronous training on one machine with variables stored on the CPU.
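Since switching mostly means changing one line, here is a sketch of swapping the mirrored strategy for its multi-worker variant (assuming the cluster is configured through the TF_CONFIG environment variable):
strategy = tf.distribute.MirroredStrategy()
# strategy = tf.distribute.MultiWorkerMirroredStrategy()  # multi-machine variant
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(units=10, activation="softmax")
    ])
    model.compile(optimizer="sgd",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])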
For some references on the tf.distribute.Strategy
module, visit the following websites:
The tf.distribute API: https://www.tensorflow.org/api_docs/python/tf/distribute

In this recipe, we've only covered the mirrored strategy, and we've executed our program eagerly with the Keras API. Note that the TensorFlow distributed API works better in graph mode than in eager mode.
This API moves quickly so feel free to consult the official documentation to know which distributed strategies are supported in which scenarios (the Keras API, a custom training loop, or the Estimator API).
If we want to use our machine learning model in production or reuse our trained model for a transfer learning task, we have to store our model. In this section, we will outline some methods for storing and restoring the weights or the whole model.
In this recipe, we want to summarize various ways to store a TensorFlow model. We will cover the best way to save and restore an entire model, only the weights, and model checkpoints.
import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Normalize
x_train = x_train / 255
x_test = x_test/ 255
model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(name="FLATTEN"))
model.add(tf.keras.layers.Dense(units=128 , activation="relu", name="D1"))
model.add(tf.keras.layers.Dense(units=64 , activation="relu", name="D2"))
model.add(tf.keras.layers.Dense(units=10, activation="softmax", name="OUTPUT"))
model.compile(optimizer="sgd",
loss="sparse_categorical_crossentropy",
metrics=["accuracy"]
)
model.fit(x=x_train,
y=y_train,
epochs=5,
validation_data=(x_test, y_test)
)
model.save("SavedModel")
A directory named SavedModel is created on disk. It contains a TensorFlow program, the saved_model.pb file; the variables directory, which contains the exact values of all parameters; and the assets directory, which contains files used by the TensorFlow graph:
SavedModel
|_ assets
|_ variables
|_ saved_model.pb
Note that the save()
operation also takes other parameters. Extra directories can be created based on the model complexity and the signatures and options passed to the save
method.
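To illustrate the signatures parameter, here is a sketch that saves this recipe's MNIST model with a custom serving signature (the serve_fn name and the input shape are assumptions based on the model above):
@tf.function(input_signature=[tf.TensorSpec(shape=[None, 28, 28], dtype=tf.float32)])
def serve_fn(images):
    # Return a named output instead of a bare tensor
    return {"probabilities": model(images)}

model.save("SavedModel", signatures={"serving_default": serve_fn})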
The saved model can then be reloaded as follows:
model2 = tf.keras.models.load_model("SavedModel")
If we prefer to save the model in the Keras H5 format, we can either pass a filename that ends in .h5 or add the save_format="h5" argument:
model.save("SavedModel.h5")
model.save("model_save", save_format="h5")
TensorFlow also offers the ModelCheckpoint callback, which saves an entire model or just the weights into a checkpoint structure at some interval. This callback is added to the callbacks argument of the fit method. In the configuration below, the model weights will be stored at every epoch:
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath="./checkpoint",
    save_weights_only=True,
    save_freq='epoch'
)
model.fit(x=x_train,
y=y_train,
epochs=5,
validation_data=(x_test, y_test),
callbacks=[checkpoint_callback]
)
Later, we can restore the saved weights from the checkpoint:
model.load_weights("./checkpoint")
Now, you're ready to save and restore an entire model, only the weights, or model checkpoints.
In this section, we covered several ways to store and restore an entire model or only its weights. This lets you put a model into production or avoid retraining a full model from scratch. We have also seen how to store a model both during and after the training process.
For some references on this topic, visit the following websites:
The tf.saved_model API: https://www.tensorflow.org/api_docs/python/tf/saved_model/save

In this section, we will show you how to serve machine learning models in production. We will use the TensorFlow Serving component of the TensorFlow Extended (TFX) platform. TFX is an MLOps tool that builds complete, end-to-end machine learning pipelines for scalable and high-performance model tasks. A TFX pipeline is composed of a sequence of components for data validation, data transformation, model analysis, and model serving. In this recipe, we will focus on the last component, which supports model versioning, multiple models, and so on.
We'll start this section by encouraging you to read through the official documentation and the short tutorials on the TFX site, available at https://www.tensorflow.org/tfx.
For this example, we will build an MNIST model, save it, download the TensorFlow Serving Docker image, run it, and send POST requests to the REST server in order to get some image predictions.
import tensorflow as tf
import numpy as np
import requests
import matplotlib.pyplot as plt
import json
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Normalize
x_train = x_train / 255
x_test = x_test/ 255
model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(name="FLATTEN"))
model.add(tf.keras.layers.Dense(units=128 , activation="relu", name="D1"))
model.add(tf.keras.layers.Dense(units=64 , activation="relu", name="D2"))
model.add(tf.keras.layers.Dense(units=10, activation="softmax", name="OUTPUT"))
model.compile(optimizer="sgd",
loss="sparse_categorical_crossentropy",
metrics=["accuracy"]
)
model.fit(x=x_train,
y=y_train,
epochs=5,
validation_data=(x_test, y_test)
)
Figure 12.7: A screenshot of the directory structure that TensorFlow Serving expects
The preceding screenshot shows the desired directory structure. In it, we have our defined data directory, my_mnist_model
, followed by our model-version number, 1
. In the version number directory, we save our protobuf model and a variables
folder that contains the desired variables to save.
We should be aware that inside our data directory, TensorFlow Serving will look for integer folders. TensorFlow Serving will automatically boot up and grab the model under the largest integer number. This means that to deploy a new model, we need to label it version 2 and stick it under a new folder that is also labeled 2. TensorFlow Serving will then automatically pick up the model.
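As a minimal sketch of producing this layout, after training we save the model under a versioned path so that TensorFlow Serving can discover it (the directory name follows the structure described above):
model.save("my_mnist_model/1")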
The first step is to pull the latest TensorFlow Serving Docker image:
$ docker pull tensorflow/serving
Next, we run a TensorFlow Serving container: we publish the REST API port 8501, mount our model directory my_mnist_model, bind it to the model base path /models/my_mnist_model, and fill in the environment variable MODEL_NAME with my_mnist_model:
$ docker run -p 8501:8501 \
  --mount type=bind,source="$(pwd)/my_mnist_model/",target=/models/my_mnist_model \
  -e MODEL_NAME=my_mnist_model -t tensorflow/serving
We can plot some test images with their true labels as follows:
num_rows = 4
num_cols = 3
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for row in range(num_rows):
    for col in range(num_cols):
        index = num_cols * row + col
        image = x_test[index]
        true_label = y_test[index]
        plt.subplot(num_rows, 2*num_cols, 2*index+1)
        plt.imshow(image.reshape(28,28), cmap="binary")
        plt.axis('off')
        plt.title('It is a {}'.format(true_label), fontdict={'size': 16})
plt.tight_layout()
plt.show()
We can now submit HTTP requests to our server at <host>:8501 and get back a JSON response showing the results. We can do this from any machine and with any programming language, which is very useful since the client doesn't need a local copy of TensorFlow. Here, we will send POST predict requests to our server and pass in the images. The server will return 10 probabilities for each image, corresponding to the probability of each digit between 0 and 9:
json_request = '{{ "instances" : {} }}'.format(x_test[0:12].tolist())
resp = requests.post('http://localhost:8501/v1/models/my_mnist_model:predict', data=json_request, headers = {"content-type": "application/json"})
print('response.status_code: {}'.format(resp.status_code))
print('response.content: {}'.format(resp.content))
predictions = json.loads(resp.text)['predictions']
num_rows = 4
num_cols = 3
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for row in range(num_rows):
    for col in range(num_cols):
        index = num_cols * row + col
        image = x_test[index]
        predicted_label = np.argmax(predictions[index])
        true_label = y_test[index]
        plt.subplot(num_rows, 2*num_cols, 2*index+1)
        plt.imshow(image.reshape(28,28), cmap="binary")
        plt.axis('off')
        if predicted_label == true_label:
            color = 'blue'
        else:
            color = 'red'
        plt.title('The model predicts a {} and it is a {}'.format(predicted_label, true_label),
                  fontdict={'size': 16}, color=color)
plt.tight_layout()
plt.show()
Now, let's look at a visual representation of the 12 predictions:
Machine learning teams focus on creating machine learning models and operations teams focus on deploying models. MLOps applies DevOps principles to machine learning. It brings the best practices of software development (commenting, documentation, versioning, testing, and so on) to data science. MLOps is about removing the barriers between the machine learning teams that produce models and the operations teams that deploy models.
In this recipe, we only focus on serving models using the TFX Serving component but TFX is an MLOps tool that builds complete, end-to-end machine learning pipelines. We can only encourage the reader to explore this platform.
There are also many other solutions available that may be used to serve a model, such as Kubeflow, Django/Flask, or managed cloud services such as AWS SageMaker, GCP AI Platform, or Azure ML.
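To give a flavor of one such alternative, here is a minimal Flask sketch (not part of this recipe; the route and port are arbitrary) that serves the same saved model over REST:
from flask import Flask, request, jsonify
import numpy as np
import tensorflow as tf

app = Flask(__name__)
model = tf.keras.models.load_model("my_mnist_model/1")

@app.route("/predict", methods=["POST"])
def predict():
    instances = np.array(request.get_json()["instances"], dtype=np.float32)
    probabilities = model.predict(instances)
    return jsonify({"predictions": probabilities.tolist()})

if __name__ == "__main__":
    app.run(port=5000)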
Links to tools and resources for architectures not covered in this chapter are as follows: