12

Taking TensorFlow to Production

Throughout this book, we have seen that TensorFlow is capable of implementing many models, but TensorFlow can do more than that. This chapter shows you a few of those capabilities. In this chapter, we will cover the following topics:

  • Visualizing graphs in TensorBoard
  • Managing hyperparameter tuning with TensorBoard's HParams
  • Implementing unit tests using tf.test
  • Using multiple executors
  • Parallelizing TensorFlow using tf.distribute.Strategy
  • Saving and restoring a TensorFlow model
  • Using TensorFlow Serving

We'll start by showing how to use the various aspects of TensorBoard, a capability that comes with TensorFlow. This tool allows us to visualize summary metrics, graphs, and images even while our model is training. Next, we will show you how to write code that is ready for production use, with a focus on unit tests, training distribution across multiple processing units, and efficient model saving and loading. Finally, we will address a machine learning serving solution by hosting a model behind a REST endpoint.

Visualizing Graphs in TensorBoard

Monitoring and troubleshooting machine learning algorithms can be a daunting task, especially if you have to wait a long time for the training to complete before you know the results. To work around this, TensorFlow includes a computational graph visualization tool called TensorBoard. With TensorBoard, we can visualize graphs and important values (loss, accuracy, batch training time, and so on) even during training.

Getting ready

To illustrate the various ways we can use TensorBoard, we will reimplement the MNIST model from The Introductory CNN Model recipe in Chapter 8, Convolutional Neural Networks. Then, we'll add the TensorBoard callback and fit the model. We will show how to monitor numerical values and histograms of sets of values, how to create an image summary in TensorBoard, and how to visualize a TensorFlow model.

How to do it...

  1. First, we'll load the libraries necessary for the script:
    import tensorflow as tf
    import numpy as np
    import datetime
    
  2. We'll now reimplement the MNIST model:
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 28, 28, 1)
    x_test = x_test.reshape(-1, 28, 28, 1)
    # Padding the images by 2 pixels since in the paper input images were 32x32
    x_train = np.pad(x_train, ((0,0),(2,2),(2,2),(0,0)), 'constant')
    x_test = np.pad(x_test, ((0,0),(2,2),(2,2),(0,0)), 'constant')
    # Normalize
    x_train = x_train / 255
    x_test = x_test/ 255
    # Set model parameters
    image_width = x_train[0].shape[0]
    image_height = x_train[0].shape[1]
    num_channels = 1 # grayscale = 1 channel
    # Training and Test data variables
    batch_size = 100
    evaluation_size = 500
    generations = 300
    eval_every = 5
    # Set for reproducible results
    seed = 98
    np.random.seed(seed)
    tf.random.set_seed(seed)
    # Declare the model
    input_data = tf.keras.Input(dtype=tf.float32, shape=(image_width,image_height, num_channels), name="INPUT")
    # First Conv-ReLU-MaxPool Layer
    conv1 = tf.keras.layers.Conv2D(filters=6,
                                   kernel_size=5,
                                   padding='VALID',
                                   activation="relu",
                                   name="C1")(input_data)
    max_pool1 = tf.keras.layers.MaxPool2D(pool_size=2,
                                          strides=2, 
                                          padding='SAME',
                                          name="S1")(conv1)
    # Second Conv-ReLU-MaxPool Layer
    conv2 = tf.keras.layers.Conv2D(filters=16,
                                   kernel_size=5,
                                   padding='VALID',
                                   strides=1,
                                   activation="relu",
                                   name="C3")(max_pool1)
    max_pool2 = tf.keras.layers.MaxPool2D(pool_size=2,
                                          strides=2, 
                                          padding='SAME',
                                          name="S4")(conv2)
    # Flatten Layer
    flatten = tf.keras.layers.Flatten(name="FLATTEN")(max_pool2)
    # First Fully Connected Layer
    fully_connected1 = tf.keras.layers.Dense(units=120,
                                             activation="relu",
                                             name="F5")(flatten)
    # Second Fully Connected Layer
    fully_connected2 = tf.keras.layers.Dense(units=84,
                                             activation="relu",
                                             name="F6")(fully_connected1)
    # Final Fully Connected Layer
    final_model_output = tf.keras.layers.Dense(units=10,
                                               activation="softmax",
                                               name="OUTPUT"
                                               )(fully_connected2)
        
    model = tf.keras.Model(inputs= input_data, outputs=final_model_output)
    
  3. Next, we will compile the model with the sparse categorical cross-entropy loss and the Adam optimizer. Then, we'll display the summary:
    model.compile(
        optimizer="adam", 
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"]
    )
    model.summary()
    
  4. We will create a timestamped subdirectory for each run. The summary writer will write the TensorBoard logs to this folder:
    log_dir="logs/experiment-" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") 
    
  5. Next, we will instantiate a TensorBoard callback and pass it to the fit method. All logs during the training phase will be stored in this directory and can be viewed instantly in TensorBoard:
    tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, 
                                                          write_images=True,
                                                          histogram_freq=1 )
    model.fit(x=x_train, 
              y=y_train, 
              epochs=5,
              validation_data=(x_test, y_test), 
              callbacks=[tensorboard_callback])
    
  6. We then start the TensorBoard application by running the following command:
    $ tensorboard --logdir="logs"
    
  7. Then, we navigate in our browser to the following link: http://127.0.0.1:6006. We can specify a different port if needed by passing, for example, --port 6007 to run on port 6007. We can also start TensorBoard within a notebook through the %tensorboard --logdir="logs" magic command. Remember that TensorBoard is viewable while your program is running.
  8. We can quickly and easily visualize and compare the metrics of several experiments during model training through TensorBoard's scalars view. By default, TensorBoard writes the metrics and losses every epoch. We can increase this frequency to every batch with the argument update_freq='batch'. We can also visualize model weights as images with the argument write_images=True, or display weights and biases as histograms (computed every epoch) using histogram_freq=1; a callback combining these options is sketched below.
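    This is only a sketch, not part of the recipe, and the log directory name is an assumption:
    tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir="logs/experiment-batch",  # hypothetical log directory
                                                          update_freq='batch',   # write metrics and losses after every batch
                                                          write_images=True,     # log model weights as images
                                                          histogram_freq=1)      # weight and bias histograms every epoch
    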
  9. Here is a screenshot of the scalars view:

    Figure 12.1: Training and test loss decrease over time while the training and test accuracy increase

  10. Here, we show how to visualize weights and bias with a histogram summary. With this dashboard, we can plot many histogram visualizations of all the values of a non-scalar tensor (such as weights and bias) at different points in time. So, we can see how the values have changed over time:

    Figure 12.2: The Histograms view to visualize weights and bias in TensorBoard

  11. Now, we will visualize the TensorFlow model through TensorBoard's Graphs dashboard, which shows the model using different views. This dashboard lets us visualize not only the op-level graph but also the conceptual graph. The op-level graph displays the Keras model with extra edges to other computation nodes, whereas the conceptual graph displays only the Keras model. These views allow us to quickly examine our intended design and understand the structure of the TensorFlow model.
  12. Here, we show how to visualize the op-level graph:

    Figure 12.3: The op-level graph in TensorBoard

  13. By adding the TensorBoard callback, we can visualize the loss, the metrics, model weights as images, and so on. But we can also use the tf.summary module to write summary data that can be visualized in TensorBoard. First, we have to create a file writer; then, we can write histogram, scalar, text, audio, or image summaries. Here, we'll write images using the Image Summary API and visualize them in TensorBoard:
    # Create a FileWriter for the timestamped log directory.
    file_writer = tf.summary.create_file_writer(log_dir)
    with file_writer.as_default():
        # Reshape the images and write image summary.
        images = np.reshape(x_train[0:10], (-1, 32, 32, 1))
        tf.summary.image("10 training data examples", images, max_outputs=10, step=0)
    

Figure 12.4: Visualize images in TensorBoard

Be careful of writing image summaries too often to TensorBoard. For example, if we were to write an image summary every generation for 10,000 generations, that would generate 10,000 images worth of summary data. This tends to eat up disk space very quickly.

How it works...

In this section, we implemented a CNN model on the MNIST dataset. We added a TensorBoard callback and fitted the model. Then, we used TensorFlow's visualization tool, which enables you to monitor numerical values and histograms of sets of values, to visualize the model graph, and so on.

Remember that we can launch TensorBoard from the command line, as in this recipe, but we can also launch it within a notebook by using the %tensorboard line magic.
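For reference, and assuming the logs are written under the logs directory as in this recipe, the notebook version looks like this:

    %load_ext tensorboard
    %tensorboard --logdir logs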

See also

For some references on the TensorBoard API, visit the following websites:

There's more...

TensorBoard.dev is a free managed service provided by Google. The aim is to easily host, track, and share machine learning experiments with anyone. After we launch our experiments, we just have to upload our TensorBoard logs to the TensorBoard server. Then, we share the link, and anyone who has the link can view our experiments. Note that you should not upload sensitive data, because uploaded TensorBoard datasets are public and visible to everyone.
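At the time of writing, the upload is done with the tensorboard dev subcommand; this is only a sketch, so check the TensorBoard documentation for the current flags:

    $ tensorboard dev upload --logdir logs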

Managing hyperparameter tuning with TensorBoard's HParams

Tuning hyperparameters in a machine learning project can be a real pain. The process is iterative and can take a long time to test all the hyperparameter combinations. Fortunately, HParams, a TensorBoard plugin, comes to the rescue: it helps us test many combinations and identify the best set of hyperparameters.

Getting ready

To illustrate how the HParams plugin works, we will use a sequential model implementation on the MNIST dataset. We'll configure HParams and compare several hyperparameter combinations in order to find the best one.

How to do it...

  1. First, we'll load the libraries necessary for the script:
    import tensorflow as tf
    from tensorboard.plugins.hparams import api as hp
    import numpy as np
    import datetime
    
  2. Next, we'll load and prepare the MNIST dataset:
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    # Normalize
    x_train = x_train / 255
    x_test = x_test/ 255
    ## Set model parameters
    image_width = x_train[0].shape[0]
    image_height = x_train[0].shape[1]
    num_channels = 1 # grayscale = 1 channel
    
  3. Then, for each hyperparameter, we'll define the list or the interval of values to test. In this section, we'll go over three hyperparameters: the number of units per layer, the dropout rate, and the optimizer:
    HP_ARCHITECTURE_NN = hp.HParam('archi_nn', hp.Discrete(['128,64', '256,128']))
    HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.0, 0.1))
    HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd'])) 
    
  4. The model will be a sequential model with five layers: a flatten layer, followed by a dense layer, a dropout layer, another dense layer, and an output layer with 10 units. The train function takes as an argument the hparams dictionary, which contains a combination of hyperparameters. As we use a Keras model, we add the HParams Keras callback to the fit method to monitor each experiment. For each experiment, the plugin logs the hyperparameter combination, losses, and metrics. We can also add a summary file writer if we want to log other information:
    def train_model(hparams, experiment_run_log_dir):
        
        nb_units = list(map(int, hparams[HP_ARCHITECTURE_NN].split(",")))
        
        model = tf.keras.models.Sequential()
        model.add(tf.keras.layers.Flatten(name="FLATTEN"))
        model.add(tf.keras.layers.Dense(units=nb_units[0], activation="relu", name="D1"))
        model.add(tf.keras.layers.Dropout(hparams[HP_DROPOUT], name="DROP_OUT"))
        model.add(tf.keras.layers.Dense(units=nb_units[1], activation="relu", name="D2"))
        model.add(tf.keras.layers.Dense(units=10, activation="softmax", name="OUTPUT"))
        
        model.compile(
            optimizer=hparams[HP_OPTIMIZER], 
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"]
        )
        
        tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=experiment_run_log_dir)
        hparams_callback = hp.KerasCallback(experiment_run_log_dir, hparams)
        
        model.fit(x=x_train, 
                  y=y_train, 
                  epochs=5,
                  validation_data=(x_test, y_test),
                  callbacks=[tensorboard_callback, hparams_callback]
                 )
    
  5. Next, we'll iterate on all the hyperparameters:
    for archi_nn in HP_ARCHITECTURE_NN.domain.values:
        for optimizer in HP_OPTIMIZER.domain.values:
            for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
                hparams = {
                    HP_ARCHITECTURE_NN : archi_nn, 
                    HP_OPTIMIZER: optimizer,
                    HP_DROPOUT : dropout_rate
                }
                
                experiment_run_log_dir="logs/experiment-" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
                
                train_model(hparams, experiment_run_log_dir)
    
  6. We then start the TensorBoard application by running this command:
    $ tensorboard --logdir="logs"
    
  7. Then, we can quickly and easily visualize the results (hyperparameters and metrics) in the HParams table view. Filters and sorting can be applied on the left pane if needed:

    Figure 12.5: The HParams table view visualized in TensorBoard

  8. On the parallel coordinates view, each axis represents a hyperparameter or a metric and each run is represented by a line. This visualization allows the quick identification of the best hyperparameter combination:

Figure 12.6: The HParams parallel coordinates view visualized in TensorBoard

Using TensorBoard HParams is a simple and insightful way to identify the best hyperparameters and also to manage your experiments with TensorFlow.
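As an optional refinement that this recipe does not use, the plugin also lets us register the hyperparameters and metrics up front so the dashboard labels them nicely. This is a minimal sketch, assuming a hypothetical logs/hparam_tuning directory and that the metric tag matches what the training actually writes:

    with tf.summary.create_file_writer("logs/hparam_tuning").as_default():
        hp.hparams_config(
            hparams=[HP_ARCHITECTURE_NN, HP_DROPOUT, HP_OPTIMIZER],
            # The metric tag is an assumption; it must match a tag written during training
            metrics=[hp.Metric('accuracy', display_name='Accuracy')]
        )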

See also

For a reference on the HParams TensorBoard plugin, visit the following website:

Implementing unit tests

Testing code results in faster prototyping, more efficient debugging, and faster change cycles, and it makes code easier to share. TensorFlow 2.0 provides the tf.test module, and we will cover it in this recipe.

Getting ready

When programming a TensorFlow model, it helps to have unit tests to check the functionality of the program. This helps us because when we want to make changes to a program unit, tests will make sure those changes do not break the model in unknown ways. In Python, the main test framework is unittest but TensorFlow provides its own test framework. In this recipe, we will create a custom layer class. We will implement a unit test to illustrate how to write it in TensorFlow.

How to do it...

  1. First, we need to load the necessary libraries as follows:
    import tensorflow as tf
    import numpy as np
    
  2. Then, we need to declare our custom gate that applies the function f(x) = a1 * x + b1:
    class MyCustomGate(tf.keras.layers.Layer):
     
        def __init__(self, units, a1, b1):
            super(MyCustomGate, self).__init__()
            self.units = units
            self.a1 = a1
            self.b1 = b1
        # Compute f(x) = a1 * x + b1
        def call(self, inputs):
            return inputs * self.a1 + self.b1 
    
  3. Next, we create our unit test class that inherits from the tf.test.TestCase class. The setup method is a hook method that is called before every test method. The assertAllEqual method checks that the expected and the computed outputs have the same values:
    class MyCustomGateTest(tf.test.TestCase):
        def setUp(self):
            super(MyCustomGateTest, self).setUp()
            # Configure the layer with 1 unit, a1 = 2 and b1 = 1
            self.my_custom_gate = MyCustomGate(1,2,1)
        def testMyCustomGateOutput(self):
            input_x = np.array([[1,0,0,1],
                               [1,0,0,1]])
            output = self.my_custom_gate(input_x)
            expected_output = np.array([[3,1,1,3], [3,1,1,3]])
            self.assertAllEqual(output, expected_output) 
    
  4. Now, we need to call tf.test.main() in our script to run all the unit tests:
    tf.test.main()
    
  5. From the terminal, run the following command. We should get the following output:
    $ python3 01_implementing_unit_tests.py
    ...
    [       OK ] MyCustomGateTest.testMyCustomGateOutput
    [ RUN      ] MyCustomGateTest.test_session
    [  SKIPPED ] MyCustomGateTest.test_session
    ----------------------------------------------------------------------
    Ran 2 tests in 0.016s
    OK (skipped=1)
    

We implemented one test and it passed. Don't worry about the skipped test_session tests – they are phantom tests.

Note that many assertions tailored to TensorFlow are available in the tf.test API.
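To illustrate, a test for floating-point outputs would typically use assertAllClose with a tolerance instead of exact equality. This is a sketch, not part of the recipe, and the gate parameters are assumptions:

    class MyCustomGateFloatTest(tf.test.TestCase):
        def testFloatOutput(self):
            # Hypothetical gate with a1 = 0.5 and b1 = 0.1
            gate = MyCustomGate(1, 0.5, 0.1)
            output = gate(np.array([[1.0, 2.0]]))
            # assertAllClose checks element-wise equality within a tolerance
            self.assertAllClose(output, np.array([[0.6, 1.1]]), atol=1e-6)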

How it works...

In this section, we implemented a TensorFlow unit test using the tf.test API, which is very similar to Python's unittest framework. Remember that unit testing helps assure us that code will function as expected, provides confidence when sharing code, and makes reproducibility easier to achieve.

See also

For a reference on the tf.test module, visit the following website:

Using multiple executors

As you will be aware, many features of TensorFlow, including computational graphs, lend themselves naturally to parallel computation. Computational graphs can be split over different processors, and different batches can be processed in parallel as well. We will address how to access different processors on the same machine in this recipe.

Getting ready

In this recipe, we will show you how to access multiple devices on the same system and train on them. A device is a CPU or an accelerator unit (GPUs, TPUs) where TensorFlow can run operations. This is a very common occurrence: along with a CPU, a machine may have one or more GPUs that can share the computational load. If TensorFlow can access these devices, it will automatically distribute the computations to multiple devices via a greedy process. However, TensorFlow also allows the program to specify which operations will be on which device via a name scope placement.

In this recipe, we will show you different commands that will allow you to access various devices on your system; we'll also demonstrate how to find out which devices TensorFlow is using. Remember that some functions are still experimental and are subject to change.
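Before running the recipe, it can be useful to check which devices TensorFlow actually sees on your machine. Here is a small sketch using tf.config (the output naturally depends on your hardware):

    import tensorflow as tf
    
    # List every physical device TensorFlow can access (CPUs, GPUs, and so on)
    print(tf.config.list_physical_devices())
    print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))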

How to do it...

  1. In order to find out which devices TensorFlow is using for which operations, we will activate the logging of device placement by calling tf.debugging.set_log_device_placement(True). If a TensorFlow operation is implemented for both CPU and GPU devices, it will be executed by default on a GPU device if a GPU is available:
    tf.debugging.set_log_device_placement(True)
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
    Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
    Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
    Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
    
  2. We can also use a tensor's device attribute, which returns the name of the device on which the tensor is placed:
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    print(a.device)
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    print(b.device)
    /job:localhost/replica:0/task:0/device:GPU:0
    /job:localhost/replica:0/task:0/device:GPU:0
    
  3. By default, TensorFlow automatically decides how to distribute computations across computing devices (CPUs and GPUs) and sometimes we need to select the device to use by creating a device context with the tf.device function. Each operation executed in this context will use the selected device:
    tf.debugging.set_log_device_placement(True)
    with tf.device('/device:CPU:0'):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
        c = tf.matmul(a, b)
    Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
    Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
    Executing op MatMul in device /job:localhost/replica:0/task:0/device:CPU:0
    
  4. If we move the matmul operation out of the context, this operation will be executed on a GPU device if it's available:
    tf.debugging.set_log_device_placement(True)
    with tf.device('/device:CPU:0'):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
    Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
    Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
    Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
    
  5. When using GPUs, TensorFlow automatically takes up a large portion of the GPU memory. While this is usually desired, we can take steps to be more careful with GPU memory allocation. While TensorFlow never releases GPU memory, we can slowly grow its allocation to the maximum limit (only when needed) by setting a GPU memory growth option. Note that physical devices cannot be modified after being initialized:
    gpu_devices = tf.config.list_physical_devices('GPU')
    if gpu_devices:
        try:
            tf.config.experimental.set_memory_growth(gpu_devices[0], True)
        except RuntimeError as e:
            # Memory growth cannot be modified after GPU has been initialized
            print(e)
    
  6. If we want to put a hard limit on the GPU memory used by TensorFlow, we can also create a virtual GPU device and set the maximum memory limit (in MB) to allocate on this virtual GPU. Note that virtual devices cannot be modified after being initialized:
    gpu_devices = tf.config.list_physical_devices('GPU')
    if gpu_devices:
        try:
            tf.config.experimental.set_virtual_device_configuration(gpu_devices[0],
                [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
        except RuntimeError as e:
            # Memory growth cannot be modified after GPU has been initialized
            print(e)
    
  7. We can also simulate virtual GPU devices with a single physical GPU. This is done with the following code:
    gpu_devices = tf.config.list_physical_devices('GPU')
    if gpu_devices:
        try:
            tf.config.experimental.set_virtual_device_configuration(gpu_devices[0],
                                                       [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024),
                                                        tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024) ])
        except RuntimeError as e:
            # Memory growth cannot be modified after GPU has been initialized
            print(e)
    
  8. Sometimes we may need to write robust code that can determine whether it is running with a GPU available or not. TensorFlow has built-in functions for this; the one below checks whether TensorFlow was built with CUDA (GPU) support. This is helpful when we want to write code that takes advantage of a GPU when one is available and assigns specific operations to it. This is done with the following code:
    if tf.test.is_built_with_cuda(): 
        <Run GPU specific code here>
    
  9. If we need to assign specific operations, say, to the GPU, we input the following code. This will perform simple calculations and assign operations to the main CPU and the two auxiliary GPUs:
    if tf.test.is_built_with_cuda():
        with tf.device('/cpu:0'):
            a = tf.constant([1.0, 3.0, 5.0], shape=[1, 3])
            b = tf.constant([2.0, 4.0, 6.0], shape=[3, 1])
            
            with tf.device('/gpu:0'):
                c = tf.matmul(a,b)
                c = tf.reshape(c, [-1])
            
            with tf.device('/gpu:1'):
                d = tf.matmul(b,a)
                flat_d = tf.reshape(d, [-1])
            
            combined = tf.multiply(c, flat_d)
        print(combined)
    Num GPUs Available:  2
    Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
    Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
    Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
    Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
    Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:1
    Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:1
    Executing op Mul in device /job:localhost/replica:0/task:0/device:CPU:0
    tf.Tensor([  88.  264.  440.  176.  528.  880.  264.  792. 1320.], shape=(9,), dtype=float32)
    

We can see that the first two operations were performed on the main CPU, the next two on the first auxiliary GPU, the two after that on the second auxiliary GPU, and the final multiplication back on the main CPU.

How it works...

When we want to set specific devices on our machine for TensorFlow operations, we need to know how TensorFlow refers to such devices. Device names in TensorFlow follow the following conventions:

  • Main CPU: /device:CPU:0
  • Main GPU: /GPU:0
  • Second GPU: /job:localhost/replica:0/task:0/device:GPU:1
  • Third GPU: /job:localhost/replica:0/task:0/device:GPU:2

Remember that TensorFlow treats the CPU as a single device even if it is a multi-core processor: all cores are wrapped in /device:CPU:0, which is to say that TensorFlow does indeed use multiple CPU cores by default.
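If we want to restrict which physical devices TensorFlow can see at all (for example, to keep a GPU free for another process), we can use tf.config.set_visible_devices. This is a sketch of the idea; like the memory options above, it must run before the devices are initialized:

    gpu_devices = tf.config.list_physical_devices('GPU')
    if gpu_devices:
        try:
            # Make only the first GPU visible to this process
            tf.config.set_visible_devices(gpu_devices[0], 'GPU')
        except RuntimeError as e:
            # Visible devices cannot be modified after they have been initialized
            print(e)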

There's more...

Fortunately, running TensorFlow in the cloud is now easier than ever. Many cloud computation service providers offer GPU instances that have a main CPU and a powerful GPU alongside it. Note that an easy way to have a GPU is to run the code in Google Colab and set the GPU as the hardware accelerator in the notebook settings.

Parallelizing TensorFlow

Training a model can be very time-consuming. Fortunately, TensorFlow offers several distributed strategies to speed up the training, whether for a very large model or a very large dataset. This recipe will show us how to use the TensorFlow distributed API.

Getting ready

The TensorFlow distributed API allows us to distribute training by replicating the model onto different nodes and training on different subsets of data. Each strategy supports a hardware platform (multiple GPUs, multiple machines, or TPUs) and uses either a synchronous or an asynchronous training strategy. In synchronous training, each worker trains over different batches of data and the gradients are aggregated at each step, while in asynchronous mode, each worker trains independently over the data and the variables are updated asynchronously. Note that, for the moment, TensorFlow only supports the data parallelism described above; according to the roadmap, it will soon support model parallelism, a paradigm used when the model is too large to fit on a single device and needs to be distributed over many devices. In this recipe, we will go over the mirrored strategy provided by this API.

How to do it...

  1. First, we'll load the libraries necessary for this recipe as follows:
    import tensorflow as tf
    import tensorflow_datasets as tfds
    
  2. We will create two virtual GPUs:
    # Create two virtual GPUs
    gpu_devices = tf.config.list_physical_devices('GPU')
    if gpu_devices:
        try:
            tf.config.experimental.set_virtual_device_configuration(gpu_devices[0],
                                                       [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024),
                                                        tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024) ])
        except RuntimeError as e:
            # Memory growth cannot be modified after GPU has been initialized
            print(e)
    
  3. Next, we will load the MNIST dataset via the tensorflow_datasets API as follows:
    datasets, info = tfds.load('mnist', with_info=True, as_supervised=True)
    mnist_train, mnist_test = datasets['train'], datasets['test']
    
  4. Then, we will prepare the data:
    def normalize_img(image, label):
      """Normalizes images: `uint8` -> `float32`."""
      return tf.cast(image, tf.float32) / 255., label
    mnist_train = mnist_train.map(
        normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    mnist_train = mnist_train.cache()
    mnist_train = mnist_train.shuffle(info.splits['train'].num_examples)
    mnist_train = mnist_train.prefetch(tf.data.experimental.AUTOTUNE)
    mnist_test = mnist_test.map(
        normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    mnist_test = mnist_test.cache()
    mnist_test = mnist_test.prefetch(tf.data.experimental.AUTOTUNE)
    
  5. We are now ready to apply a mirrored strategy. The goal of this strategy is to replicate the model across all GPUs on the same machine. Each model is trained on different batches of data and a synchronous training strategy is applied:
    mirrored_strategy = tf.distribute.MirroredStrategy()
    
  6. Next, we check that we have two devices corresponding to the two virtual GPUs created at the beginning of this recipe as follows:
    print('Number of devices: {}'.format(mirrored_strategy.num_replicas_in_sync))
    
  7. Then, we'll define the batch size. The batch size given to the dataset is the global batch size, which is the sum of the per-replica batch sizes across all replicas. So, we have to compute the global batch size using the number of replicas:
    BATCH_SIZE_PER_REPLICA = 128
    BATCH_SIZE = BATCH_SIZE_PER_REPLICA * mirrored_strategy.num_replicas_in_sync
    mnist_train = mnist_train.batch(BATCH_SIZE)
    mnist_test = mnist_test.batch(BATCH_SIZE)
    
  8. Next, we'll define and compile our model using the mirrored strategy scope. Note that all variables created inside the scope are mirrored across all replicas:
    with mirrored_strategy.scope():
        model = tf.keras.Sequential()
        model.add(tf.keras.layers.Flatten(name="FLATTEN"))
        model.add(tf.keras.layers.Dense(units=128 , activation="relu", name="D1"))
        model.add(tf.keras.layers.Dense(units=64 , activation="relu", name="D2"))
        model.add(tf.keras.layers.Dense(units=10, activation="softmax", name="OUTPUT"))
        
        model.compile(
            optimizer="sgd", 
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"]
        )
    
  9. Once the compilation is over, we can fit the previous model as we would normally:
    model.fit(mnist_train, 
              epochs=10,
              validation_data= mnist_test
              )
    

Using a strategy scope is the only thing you have to do to distribute your training.

How it works...

Using the distributed TensorFlow API is quite easy: all you have to do is build and compile your model inside the strategy's scope. Operations are then assigned to the workers automatically (or manually, if needed). Note that we can easily switch between strategies.

Here's a brief overview of some distributed strategies (a minimal instantiation sketch follows the list):

  • The TPU strategy is like the mirrored strategy but it runs on TPUs.
  • The Multiworker Mirrored strategy is very similar to the mirrored strategy but the model is trained across several machines, potentially with multiple GPUs. We have to specify the cross-device communication.
  • The Central Storage strategy uses a synchronous mode on one machine with multiple GPUs. Variables aren't mirrored but placed on the CPU and operations are replicated into all local GPUs.
  • The Parameter Server strategy is implemented on a cluster of machines. Some machines take a worker role and others a parameter server role. The workers compute while the parameter servers store the variables of the model.
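As a rough sketch of how another strategy is swapped in (not covered by this recipe), the multi-worker mirrored strategy is created much like the mirrored one. This assumes the TF_CONFIG environment variable describes your cluster and that your TensorFlow version exposes the class directly under tf.distribute (older releases place it under tf.distribute.experimental):

    multiworker_strategy = tf.distribute.MultiWorkerMirroredStrategy()
    print('Number of replicas: {}'.format(multiworker_strategy.num_replicas_in_sync))
    
    with multiworker_strategy.scope():
        # Build and compile the model inside the scope, exactly as in the recipe
        model = tf.keras.Sequential([
            tf.keras.layers.Flatten(name="FLATTEN"),
            tf.keras.layers.Dense(units=128, activation="relu", name="D1"),
            tf.keras.layers.Dense(units=10, activation="softmax", name="OUTPUT")
        ])
        model.compile(optimizer="sgd",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])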

See also

For some references on the tf.distribute.Strategy module, visit the following websites:

There's more...

In this recipe, we've only covered the mirrored strategy, and we've executed our program eagerly with the Keras API. Note that the TensorFlow distributed API works better when used in graph mode than in eager mode.

This API moves quickly so feel free to consult the official documentation to know which distributed strategies are supported in which scenarios (the Keras API, a custom training loop, or the Estimator API).

Saving and restoring a TensorFlow model

If we want to use our machine learning model in production or reuse our trained model for a transfer learning task, we have to store our model. In this section, we will outline some methods for storing and restoring the weights or the whole model.

Getting ready

In this recipe, we want to summarize various ways to store a TensorFlow model. We will cover the best way to save and restore an entire model, only the weights, and model checkpoints.

How to do it...

  1. We start by loading the necessary libraries:
    import tensorflow as tf
    
  2. Next, we'll build an MNIST model using the Keras Sequential API:
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    # Normalize
    x_train = x_train / 255
    x_test = x_test/ 255
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(name="FLATTEN"))
    model.add(tf.keras.layers.Dense(units=128 , activation="relu", name="D1"))
    model.add(tf.keras.layers.Dense(units=64 , activation="relu", name="D2"))
    model.add(tf.keras.layers.Dense(units=10, activation="softmax", name="OUTPUT"))
        
    model.compile(optimizer="sgd", 
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"]
                 )
    model.fit(x=x_train, 
              y=y_train, 
              epochs=5,
              validation_data=(x_test, y_test)
             )
    
  3. Then, we will save the entire model to disk using the recommended format, the SavedModel format. This format saves the model graph and the variables:
    model.save("SavedModel")
    
  4. A directory named SavedModel is created on disk. It contains a TensorFlow program, the saved_model.pb file; the variables directory, which contains the exact values of all parameters; and the assets directory, which contains files used by the TensorFlow graph:
    SavedModel
    |_ assets
    |_ variables
    |_ saved_model.pb 
    

    Note that the save() operation also takes other parameters. Extra directories can be created based on the model complexity and the signatures and options passed to the save method.

  5. Next, we'll restore our saved model:
    model2 = tf.keras.models.load_model("SavedModel") 
    
  6. If we prefer to save the model in the H5 format, we can either pass a filename that ends in .h5 or add the save_format="h5" argument:
    model.save("SavedModel.h5")
    model.save("model_save", save_format="h5")
    
  7. We can also use a ModelCheckpoint callback in order to save an entire model or just the weights into a checkpoint structure at some intervals. This callback is added to the callback argument in the fit method. In the configuration below, the model weights will be stored at each epoch:
    checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath="./checkpoint",
                                                              save_weights_only=True,
                                                              save_freq='epoch')
    model.fit(x=x_train, 
              y=y_train, 
              epochs=5,
              validation_data=(x_test, y_test),
              callbacks=[checkpoint_callback]
             )
    
  8. We can load the entire model or only the weights later in order to continue the training. Here, we will reload the weights:
    model.load_weights("./checkpoint")
    

Now, you're ready to save and restore an entire model, only the weights, or model checkpoints.
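One common variation, sketched here rather than taken from the recipe (the filepath and monitored metric are assumptions), is to keep only the best weights seen so far instead of saving at every epoch:

    best_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath="./best_checkpoint",  # hypothetical path
                                                                  save_weights_only=True,
                                                                  save_best_only=True,
                                                                  monitor="val_accuracy",  # assumed metric to track
                                                                  mode="max")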

How it works...

In this section, we provided several ways to store and restore either an entire model or only its weights. This allows you to put a model into production or to avoid retraining a full model from scratch. We also saw how to store a model during the training process as well as after it.

See also

For some references on this topic, visit the following websites:

Using TensorFlow Serving

In this section, we will show you how to serve machine learning models in production. We will use the TensorFlow Serving components of the TensorFlow Extended (TFX) platform. TFX is an MLOps tool that builds complete, end-to-end machine learning pipelines for scalable and high-performance model tasks. A TFX pipeline is composed of a sequence of components for data validation, data transformation, model analysis, and model serving. In this recipe, we will focus on the last component, which can support model versioning, multiple models, and so on.

Getting ready

We'll start this section by encouraging you to read through the official documentation and the short tutorials on the TFX site, available at https://www.tensorflow.org/tfx.

For this example, we will build an MNIST model, save it, download the TensorFlow Serving Docker image, run it, and send POST requests to the REST server in order to get some image predictions.

How to do it...

  1. Here, we will start in the same way as before, by loading the necessary libraries:
    import tensorflow as tf
    import numpy as np
    import requests
    import matplotlib.pyplot as plt
    import json
    
  2. We'll build an MNIST model using the Keras Sequential API:
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    # Normalize
    x_train = x_train / 255
    x_test = x_test/ 255
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(name="FLATTEN"))
    model.add(tf.keras.layers.Dense(units=128 , activation="relu", name="D1"))
    model.add(tf.keras.layers.Dense(units=64 , activation="relu", name="D2"))
    model.add(tf.keras.layers.Dense(units=10, activation="softmax", name="OUTPUT"))
        
    model.compile(optimizer="sgd", 
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"]
                 )
    model.fit(x=x_train, 
              y=y_train, 
              epochs=5,
              validation_data=(x_test, y_test)
             )
    
  3. Then, we will save our model in the SavedModel format and create a directory for each version of our model. TensorFlow Serving expects a specific directory structure, with models saved in the SavedModel format. Each model version should be exported to a different subdirectory under a given path. That way, we can easily specify the version of the model we want to use when we call the server to make predictions:

    Figure 12.7: A screenshot of the directory structure that TensorFlow Serving expects

    The preceding screenshot shows the desired directory structure. In it, we have our defined data directory, my_mnist_model, followed by our model version number, 1. In the version number directory, we save our protobuf model and a variables folder that contains the saved variables.

    We should be aware that inside our data directory, TensorFlow Serving will look for integer folders. TensorFlow Serving will automatically boot up and grab the model under the largest integer number. This means that to deploy a new model, we need to label it version 2 and stick it under a new folder that is also labeled 2. TensorFlow Serving will then automatically pick up the model.
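    For completeness, producing this structure from the model above can be as simple as the following sketch (the my_mnist_model directory matches the name used below; the version number is an assumption):
    import os
    
    version = 1  # bump to 2, 3, ... when deploying a new model version
    export_path = os.path.join("my_mnist_model", str(version))
    model.save(export_path)  # writes a SavedModel under my_mnist_model/1
    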

  4. Then, we'll install TensorFlow Serving by using Docker. We encourage readers to visit the official Docker documentation to get Docker installation instructions if needed.

    The first step is to pull the latest TensorFlow Serving Docker image:

    $ docker pull tensorflow/serving
    
  5. Now, we'll start a Docker container: we publish the REST API port 8501 to our host's port 8501, take the previously created model, my_mnist_model, bind it to the model base path, /models/my_mnist_model, and fill in the environment variable MODEL_NAME with my_mnist_model:
    $ docker run -p 8501:8501 \
      --mount type=bind,source="$(pwd)/my_mnist_model/",target=/models/my_mnist_model \
      -e MODEL_NAME=my_mnist_model -t tensorflow/serving
    
  6. Then, we will display the images to predict:
    num_rows = 4
    num_cols = 3
    plt.figure(figsize=(2*2*num_cols, 2*num_rows))
    for row in range(num_rows):
        for col in range(num_cols):
            index = num_cols * row + col
            image = x_test[index]
            true_label = y_test[index]
            plt.subplot(num_rows, 2*num_cols, 2*index+1)
            plt.imshow(image.reshape(28,28), cmap="binary")
            plt.axis('off')
            plt.title('\n\n It is a {}'.format(y_test[index]), fontdict={'size': 16})
    plt.tight_layout()
    plt.show()
    
  7. We can now submit requests to <host>:8501 and get back a JSON response with the results. We can do this from any machine and with any programming language, which is very useful because the client does not need a local copy of TensorFlow.

    Here, we will send POST predict requests to our server and pass the images. The server will return 10 probabilities for each image corresponding to the probability for each digit between 0 and 9:

    json_request = '{{ "instances" : {} }}'.format(x_test[0:12].tolist())
    resp = requests.post('http://localhost:8501/v1/models/my_mnist_model:predict', data=json_request, headers = {"content-type": "application/json"})
    print('response.status_code: {}'.format(resp.status_code))     
    print('response.content: {}'.format(resp.content))
    predictions = json.loads(resp.text)['predictions']
    
  8. Then, we will display the prediction results for our images:
    num_rows = 4
    num_cols = 3
    plt.figure(figsize=(2*2*num_cols, 2*num_rows))
    for row in range(num_rows):
        for col in range(num_cols):
            index = num_cols * row + col
            image = x_test[index]
            predicted_label = np.argmax(predictions[index])
            true_label = y_test[index]
            plt.subplot(num_rows, 2*num_cols, 2*index+1)
            plt.imshow(image.reshape(28,28), cmap="binary")
            plt.axis('off')
            if predicted_label == true_label:
                color = 'blue'
            else:
                color = 'red'
            plt.title('\n\n The model predicts a {} \n and it is a {}'.format(predicted_label, true_label), fontdict={'size': 16}, color=color)
    plt.tight_layout()
    plt.show()
    

    Now, let's look at a visual representation of the 12 predictions:

How it works...

Machine learning teams focus on creating machine learning models and operations teams focus on deploying models. MLOps applies DevOps principles to machine learning. It brings the best practices of software development (commenting, documentation, versioning, testing, and so on) to data science. MLOps is about removing the barriers between the machine learning teams that produce models and the operations teams that deploy models.

In this recipe, we only focused on serving models using the TFX Serving component, but TFX is an MLOps tool that builds complete, end-to-end machine learning pipelines. We encourage the reader to explore this platform.

There are also many other solutions available that may be used to serve a model, such as Kubeflow, Django/Flask, or managed cloud services such as AWS SageMaker, GCP AI Platform, or Azure ML.

There's more...

Links to tools and resources for architectures not covered in this chapter are as follows:

Share your experience

Thank you for taking the time to read this book. If you enjoyed this book, help others to find it. Leave a review at https://www.amazon.com/dp/1800208863
