Chapter 9. AI/ML

Machine learning and AI hold an increasingly important place in enterprise applications. Google Cloud offers a number of AI and ML services, from pre-trained APIs that can be added to existing applications with a few lines of code, to the full-featured Cloud AI Platform, which can be used to train and operationalize ML models in many frameworks.

With model training and tuning becoming more automated, in particular with tools like AutoML, organizations are focusing more on advanced concepts, including continuous retraining and deployment with MLOps, as well as deploying explainable AI in the enterprise. In this chapter, we will present a number of recipes using these ML tools, from setting up your customized environment, to training and deploying your first model, to more specific techniques aimed at integrating other services.

All code samples for this chapter are located in the book's code repository. You can follow along and copy the code for each individual recipe by going to the folder with that recipe’s number.

9.1 Creating an AI Platform Notebook


You need a hosted Jupyter Notebook environment running in Google Cloud that is authenticated to Google services to perform data and ML tasks.


You can create, customize, and connect to a Cloud AI Platform notebook.

  1. From the Google Menu Bar, select AI Platform -> Notebooks.

  2. Choose New Instance and you’ll see a list of available instances, as shown in Figure 9-1.

    Figure 9-1. New Notebook Instance Dialog
  3. You’ll now see instance options, as shown in Figure 9-2. Choose your instance type and a GPU to attach if needed.

  4. Change the name and customize the specs of your machine to meet your requirements.

  5. Check Install NVIDIA GPU driver automatically if you are using a GPU and prefer not to install drivers from scratch.

    Figure 9-2. Customize new notebook instance dialog
  6. Click ‘Create’.

  7. You will now see your instance listed. When the status indicators stop spinning and OPEN JUPYTERLAB appears, click it to open your notebook environment.

    Figure 9-3. Initialized notebook in the UI
  8. Alternatively, you can create a notebook instance via the CLI with the following:

    export INSTANCE_NAME="example-instance"
    export VM_IMAGE_PROJECT="deeplearning-platform-release"
    export VM_IMAGE_FAMILY="tf2-2-3-cpu"
    export MACHINE_TYPE="n1-standard-4"
    export LOCATION="us-central1-b"
    gcloud beta notebooks instances create $INSTANCE_NAME \
      --vm-image-project=$VM_IMAGE_PROJECT \
      --vm-image-family=$VM_IMAGE_FAMILY \
      --machine-type=$MACHINE_TYPE --location=$LOCATION


Cloud AI Platform notebooks are where you will perform much of your data and ML work. They provide a hosted, customizable environment that takes care of installing data science dependencies, installing and configuring NVIDIA drivers (a big win for anyone who has done this more than a couple of times!), authenticating to Google Cloud APIs as either a service account or a user account, and creating a reverse proxy so you can securely connect to the notebook and cloud environment from a browser. They can be further configured or locked down for more secure environments; for example, they can be protected by VPC-SC (VPC Service Controls).

9.2 Training a Python Model Serverlessly


You have a Python model authored and want to train it leveraging serverless compute in the cloud.


Prepare your Python model for submission to the Cloud AI Platform training service. In this case, we will prepare a TensorFlow model.

  1. Create a model, and if in a Jupyter Notebook, export it to a Python file. It looks something like this:

    import tensorflow as tf
    from tensorflow import keras
    import pandas as pd
    # Data loading and feature engineering steps omitted
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense, Dropout
    model = Sequential([
        keras.layers.Dense(15, activation="relu", input_shape=(train_features.shape[-1],)),
        keras.layers.Dense(10, activation="relu"),
        keras.layers.Dense(1, activation=None)
    ])
    model.compile(loss='mae')
    model.fit(train_features, train_labels, epochs=500, validation_data=(test_features, test_labels))
    # Save the trained model in the SavedModel format to GCS
    model.save('gs://dhodun1/temp/model1/')
  2. Package your code as a Python module. Create a trainer folder, move your model file into it, rename it to task.py (to match the --module-name used in the next step), and create an empty __init__.py file to make the folder a package.

  3. Execute the following code through the command line to submit your model for training:

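    # Timestamp that can be used to build a unique job name
    now=$(date +"%Y%m%d_%H%M%S")
    # JOB_NAME, TF_VERSION, and JOB_DIR are assumed to be set in your shell
    gcloud ai-platform jobs submit training $JOB_NAME \
      --package-path trainer/ \
      --module-name trainer.task \
      --region us-central1 \
      --python-version 3.7 \
      --runtime-version $TF_VERSION \
      --job-dir $JOB_DIR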
  4. You can then watch the logs stream from the cloud console
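
The package layout from step 2 can be sketched in Python as follows. This is a minimal illustration only: the source file name model.py and the helper make_trainer_package are hypothetical, and the module is named task simply to match the --module-name trainer.task flag in step 3.

```python
import tempfile
from pathlib import Path


def make_trainer_package(model_file: Path, root: Path) -> Path:
    """Copy model_file into root/trainer/task.py and add an empty __init__.py."""
    package_dir = root / "trainer"
    package_dir.mkdir(parents=True, exist_ok=True)
    # An empty __init__.py marks the folder as an importable Python package.
    (package_dir / "__init__.py").touch()
    # The training service runs the module named by --module-name (trainer.task).
    (package_dir / "task.py").write_text(model_file.read_text())
    return package_dir


# Demonstrate in a temporary directory with a stand-in training script.
workdir = Path(tempfile.mkdtemp())
script = workdir / "model.py"  # hypothetical name for the exported notebook code
script.write_text("print('training...')\n")
package = make_trainer_package(script, workdir)
created = sorted(p.name for p in package.iterdir())
print(created)
```

Running gcloud from the folder that contains trainer/ lets --package-path trainer/ resolve as written in step 3.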


The AI Platform training service can run any generic Python batch job, but is designed for short- or long-lived ML training jobs. It also supports passing in custom command line parameters, installing your own Python dependencies, and configuring machine types to support a wide variety of ML training requirements.
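
To make the custom command line parameters concrete, here is a minimal sketch of how a trainer/task.py module might read them with argparse. The --epochs and --learning-rate flags, their defaults, and the bucket path are hypothetical; --job-dir is included because the service forwards it to the module when it is set on the job.

```python
import argparse


def parse_args(argv=None):
    """Parse command line flags passed to the training module."""
    parser = argparse.ArgumentParser()
    # Forwarded by the training service when --job-dir is set on the job.
    parser.add_argument("--job-dir", default="./output")
    # Hypothetical hyperparameters, for illustration only.
    parser.add_argument("--epochs", type=int, default=500)
    parser.add_argument("--learning-rate", type=float, default=0.001)
    # parse_known_args ignores any extra flags instead of failing on them.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Hypothetical bucket path, used only to exercise the parser.
args = parse_args(["--job-dir", "gs://my-bucket/job1", "--epochs", "10"])
print(args.job_dir, args.epochs, args.learning_rate)
```

With gcloud, user-defined flags like these are typically placed after a bare `--` at the end of the submit command, so they are passed through to the module rather than interpreted by gcloud.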

9.3 Serving a Python Model with Serverless


You have a Python model authored and want to make predictions in a serverless manner in the cloud.


In this recipe, you will learn how to use Cloud AI Platform to upload and serve a model, including auto-scaling and batch predictions.

  1. This recipe assumes you already have a trained model exported in the SavedModel format to GCS.

  2. First, create a “Model” resource which is assigned to a regional endpoint and will house various versions of the model:

    gcloud ai-platform models create \
        $MODEL_NAME --region=$REGION
  3. Create a model version, called “v1”, which will require knowing the SavedModel output location on GCS:

    gcloud ai-platform versions create $VERSION_NAME 
  4. Validate the model is serving:

    %%writefile records.json
    {"dense_input": [0.00541, 18.0, 2.31, 0.538, 6.575, 65.2, 5.0900, 296.0, 15.3]}
    {"dense_input": [0.00332, 0.0, 2.31, 0.437, 7.7, 40.0, 4.0900, 250.0, 17.3]}

    # Run in a separate notebook cell, since %%writefile writes the whole cell to the file
    !gcloud ai-platform predict --model=boston_housing --version=$VERSION_NAME --json-instances records.json


The AI Platform model serving service allows you to easily host your models without having to worry about managing, maintaining, or scaling the underlying infrastructure. You can still configure the machine type, add GPUs, and so on. Choosing a regional endpoint rather than the global endpoint lets you specify exactly where your model runs.
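
The request file used in this recipe is newline-delimited JSON, with one instance per line keyed by the model's input name. As a sketch of how such a file could be generated from Python rather than typed by hand, the hypothetical helper to_json_instances below builds that format; the input name dense_input is taken from the recipe and will differ for other models.

```python
import json


def to_json_instances(rows, input_name="dense_input"):
    """Return newline-delimited JSON, one instance per line."""
    return "\n".join(json.dumps({input_name: list(row)}) for row in rows) + "\n"


rows = [
    [0.00541, 18.0, 2.31, 0.538, 6.575, 65.2, 5.09, 296.0, 15.3],
    [0.00332, 0.0, 2.31, 0.437, 7.7, 40.0, 4.09, 250.0, 17.3],
]
payload = to_json_instances(rows)

# Each line parses on its own, which is the shape --json-instances expects.
parsed = [json.loads(line) for line in payload.splitlines()]
print(len(parsed), parsed[0]["dense_input"][:2])
```

Writing payload to records.json gives a file equivalent to the one created with %%writefile in step 4.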

9.4 Get Explanations with your ML Predictions


You need to get attributions for each feature in your model at prediction time for use in a business application.


In this recipe, you will learn how to prepare your model for deployment on Cloud AI Platform to support explanations at prediction time, including adding baseline values for each of your features, deploying the model, then receiving explanations.

  1. You already have a trained model and have exported it in the SavedModel format to GCS. The model is loaded in a Jupyter Notebook:

    !pip install explainable-ai-sdk
    import explainable_ai_sdk
    # Print the names of your tensors
    print('Model input tensor: ', model.input.name)
    print('Model output tensor: ', model.output.name)
    # Create and save explainability metadata to SavedModel location
    from explainable_ai_sdk.metadata.tf.v2 import SavedModelMetadataBuilder
    builder = SavedModelMetadataBuilder(MODEL_DIR)
    builder.set_numeric_metadata(
        # Name of input layer in keras model, 'dense_input' in this case
        model.input.name.split(':')[0],
        # Baseline values per feature; using the median is an assumption here
        input_baselines=[train_features.median().values.tolist()],
        index_feature_mapping=train_features.columns.tolist())
    builder.save_metadata(MODEL_DIR)
    # Create a top-level model resource to then house versions of the model
    !gcloud ai-platform models create $MODEL_NAME --region=$REGION
    !gcloud beta ai-platform versions create $VERSION_NAME --model=$MODEL_NAME \
      --explanation-method $EXPLAIN_METHOD \
      --num-integral-steps 25 \
      --region $REGION
    # Create a json request object
    prediction_json = {model.input.name.split(':')[0]: test_features.iloc[0].values.tolist()}
    # Load the deployed model and request an explanation for one instance
    remote_ig_model = explainable_ai_sdk.load_model_from_ai_platform(PROJECT_ID, MODEL_NAME, VERSION_NAME)
    ig_response = remote_ig_model.explain([prediction_json])
    attr = ig_response[0].get_attribution()
    predicted = round(attr.example_score, 2)
    print('Predicted price: $' + str(predicted))
    print('Actual price: $' + str(test_labels.iloc[0]))
    Predicted price: $1943.19
    Actual price: $8500
    Figure 9-4. Attribution output


Explainability in machine learning is a deep and complex topic, and is an important part of the responsible and widespread use of ML. Cloud AI Platform provides an easy-to-use service that not only serves predictions for your models, but also calculates attribution values for each of the features. Integrated Gradients is used for fully differentiable models such as neural networks, and Sampled Shapley is used for non-differentiable models such as gradient boosted trees. For this example, the most important decision is the choice of ‘baseline’ values for the explanations. These are the values the attributions are calculated against; often the median or mode of each feature is chosen.
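
To give a feel for what the baseline does, the following is a toy sketch of Integrated Gradients in plain Python. It is not how Cloud AI Platform implements the method: the model f, its weights, and the training rows are all hypothetical, and the gradient is written analytically rather than taken from a neural network. The step count of 25 mirrors the --num-integral-steps value used in the recipe.

```python
from statistics import median


def f(x, w):
    """Toy differentiable model: weighted sum of squares."""
    return sum(wi * xi * xi for wi, xi in zip(w, x))


def grad_f(x, w):
    """Analytic gradient of f with respect to each input feature."""
    return [2.0 * wi * xi for wi, xi in zip(w, x)]


def integrated_gradients(x, baseline, w, steps=25):
    """Approximate the path integral of the gradient with a midpoint Riemann sum."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_f(point, w)):
            totals[i] += g / steps
    return [(xi - b) * t for xi, b, t in zip(x, baseline, totals)]


# Hypothetical training data: three features, baseline = per-feature median.
train = [[1.0, 10.0, 0.5], [2.0, 20.0, 0.7], [3.0, 60.0, 0.9]]
baseline = [median(col) for col in zip(*train)]
weights = [0.5, 0.1, 2.0]
example = [3.0, 60.0, 0.9]

attributions = integrated_gradients(example, baseline, weights)
gap = f(example, weights) - f(baseline, weights)
print(attributions, gap)
```

The attributions add up to the difference between the model's output for the example and its output for the baseline, which is why the choice of baseline shapes every attribution value.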

Citation: Google AI Whitepaper

9.5 Create a Custom Notebook Environment


You need specific libraries pre-installed in your AI Platform Notebook environment and need consistency between your development environment and production pipeline environment.


In this recipe, you will learn how to build a custom data science container, including with GPU support, build it leveraging Cloud Build, push it to a Container Registry, and use it to create a custom AI Platform Notebook.

All code samples are located at

  1. On your local workstation go to the cloned repository, then go to the chapter-8/8-1-custom_notebook folder.

  2. Examine the contents of the Dockerfile, requirements.txt and

  3. Run the following commands to start execution of the build

    PROJECT_ID=$(gcloud config get-value project)
    # Create and upload tarball of current directory '.' to Cloud Build service to build the image
    gcloud builds submit --timeout 10m --tag ${IMAGE_URI} .
  4. This will create a tarball of the local directory, upload it to the Cloud Build service, build the container image, and push it to Google Container Registry. You can examine the logs that immediately start streaming from the command line or under the Cloud Build section of the Google Cloud console.

Understanding the code

requirements.txt contains PyPI Python dependencies and their versions to be installed with pip. The Dockerfile pulls from a base AI Platform Notebook container image and installs the Python dependencies listed in requirements.txt. The build commands then use Cloud Build to build and push the container image.

# requirements.txt


# Dockerfile

COPY requirements.txt .
RUN python3 -m pip install -U -r requirements.txt


PROJECT_ID=$(gcloud config get-value project)
# Create and upload tarball of current directory '.' to Cloud Build service to build the image
gcloud builds submit --timeout 10m --tag ${IMAGE_URI} .
  1. Deploy a custom AI Platform Notebook

  2. In the Google Cloud console, navigate to AI Platform > Notebooks.

  3. Click + New Instance > Customize Instance

  4. Provide an Instance name

  5. Under Environment, choose ‘Custom Container’

  6. Provide the location of your container, for example: ‘’

  7. Click Create

    Figure 9-5. Create Custom Notebook screenshot
  8. Once the notebook is created, the OPEN JUPYTERLAB link should be enabled. Click this.

  9. Open a Terminal in your notebook and run the following to see your Python dependencies installed:

    pip freeze | grep kfp
  10. Alternatively, you can create the notebook using the gcloud command line tool:

     gcloud beta notebooks instances create my-notebook-2 --container-repository --machine-type n1-standard-4 --location us-west1-b


This section shows how to bake in custom dependencies to an AI Platform Notebook image programmatically. This is often helpful when deploying a standard data science image across a team or teams, as well as aligning dependencies in the development phase with a production environment. The base AI Platform containers already have a large variety of data science and ML tools, hence the large container size (>1GB), but often it will be necessary to augment or change package versions. By using the AI Platform container as the parent image, you are able to retain most of the benefits of the out-of-the-box AI Platform notebooks, such as authenticated reverse proxy, service or user account alignment, and a unified API for all notebooks, while customizing and potentially further securing the environment.
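
One way to confirm that development and production environments stay aligned is to compare the pins in requirements.txt against what is actually installed. The sketch below is an illustration under assumptions: check_requirements is a hypothetical helper, it only understands simple name==version lines, and the package versions shown are made up so the example runs anywhere.

```python
from importlib import metadata


def check_requirements(lines, get_version=metadata.version):
    """Compare 'name==version' pins against what is installed."""
    report = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip comments, blanks, and unpinned entries
        name, wanted = line.split("==", 1)
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed, wanted == installed)
    return report


# Stand-in lookup with hypothetical versions; inside the notebook, omit
# get_version to query the real environment.
fake_installed = {"kfp": "1.0.0"}


def fake_lookup(name):
    if name not in fake_installed:
        raise metadata.PackageNotFoundError(name)
    return fake_installed[name]


report = check_requirements(["# pins", "kfp==1.0.0", "fire==0.3.1"], fake_lookup)
print(report)
```

This plays the same role as the pip freeze check in step 9, but covers every pinned package at once.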

9.6 Tensorflow Batch Predictions on Cloud AI Platform


You need to perform batch Tensorflow model predictions with a distributed service like Cloud AI Platform or Dataflow and need to modify your model to forward a unique instance key for each prediction input.


In this recipe, you will learn how to modify an existing TensorFlow model's serving signature to forward a unique instance key for batch predictions, then upload the model for serving on Cloud AI Platform to verify and perform these predictions.

A sample notebook is located at

  1. In a Cloud AI Platform notebook, go to the cloned repository, then go to the chapter-8/8-5-tf_batch_predictions folder and open the sample notebook.

  2. Complete the cells through the “SavedModel and serving signature” section. This builds a simple TensorFlow model with the Keras API to classify hand-written images and saves it to the ‘./model/’ directory in the SavedModel format.

  3. The following bash command will show the serving signature and the inputs/outputs of the saved model:

    !saved_model_cli show --tag_set serve --signature_def serving_default --dir {MODEL_EXPORT_PATH}
    The given SavedModel SignatureDef contains the following input(s):
    inputs['image'] tensor_info:
          dtype: DT_FLOAT
          shape: (-1, 28, 28)
          name: serving_default_image:0
    The given SavedModel SignatureDef contains the following output(s):
      outputs['preds'] tensor_info:
          dtype: DT_FLOAT
          shape: (-1, 10)
          name: StatefulPartitionedCall:0
    Method name is: tensorflow/serving/predict
    # Load the SavedModel
    MODEL_EXPORT_PATH = './model/'
    loaded_model = tf.keras.models.load_model(MODEL_EXPORT_PATH)
    # Test the existing inference function on the loaded model that
    # corresponds to the 'serving_default' signature
    inference_function = loaded_model.signatures['serving_default']
    result = inference_function(tf.convert_to_tensor(test_image))
    print(result)
    {'preds': <tf.Tensor: shape=(1, 10), dtype=float32, numpy=
    array([[1.9574834e-05, 1.7343391e-06, 1.5372832e-05, 6.3454769e-05,
            4.5845241e-05, 4.0783577e-02, 1.1227881e-04, 4.5515549e-01,
            1.0713221e-02, 4.9308944e-01]], dtype=float32)>}
  4. Now you’ll create a new serving function that accepts and outputs a unique instance key. You’ll use the fact that a Keras Model(x) call actually runs a prediction.

    @tf.function(input_signature=[tf.TensorSpec([None], dtype=tf.string),
                                  tf.TensorSpec([None, 28, 28], dtype=tf.float32)])
    def keyed_prediction(key, image):
        # Run the loaded Keras model and pass the key through unchanged
        pred = loaded_model(image, training=False)
        return {
            'preds': pred,
            'key': key
        }
  5. Next, save the new model and inspect the new signature:

    KEYED_EXPORT_PATH = './keyed_model/'
    loaded_model.save(KEYED_EXPORT_PATH, signatures={'serving_default': keyed_prediction})
    !saved_model_cli show --tag_set serve --signature_def serving_default --dir {KEYED_EXPORT_PATH}
    The given SavedModel SignatureDef contains the following input(s):
      inputs['image'] tensor_info:
          dtype: DT_FLOAT
          shape: (-1, 28, 28)
          name: serving_default_image:0
      inputs['key'] tensor_info:
          dtype: DT_STRING
          shape: (-1)
          name: serving_default_key:0
    The given SavedModel SignatureDef contains the following output(s):
      outputs['key'] tensor_info:
          dtype: DT_STRING
          shape: (-1)
          name: StatefulPartitionedCall:0
      outputs['preds'] tensor_info:
          dtype: DT_FLOAT
          shape: (-1, 10)
          name: StatefulPartitionedCall:1
    Method name is: tensorflow/serving/predict
  6. You can create an AI Platform model as follows:

    # REGION and BUCKET and MODEL_LOCATION set earlier
    gcloud ai-platform models create ${MODEL_NAME} --regions $REGION
    gcloud ai-platform versions create ${MODEL_VERSION} \
           --model ${MODEL_NAME} --origin ${MODEL_LOCATION} --staging-bucket gs://${BUCKET} \
           --runtime-version $TFVERSION
  7. Lastly, examine and upload a file for batch predictions:

    !cat keyed_input.json
    !gcloud ai-platform predict --model fashion_mnist --json-instances keyed_input.json --version v1 --signature-name serving_default
    KEY            PREDS
    image_id_1234  [1.9574799807742238e-05, 1.7343394347335561e-06, 1.537282150820829e-05, 6.345478323055431e-05, 4.584524504025467e-05, 0.0407835878431797, 0.00011227882350794971, 0.4551553726196289, 0.010713225230574608, 0.493089497089386]


Forwarding instance keys or even individual features comes in handy in any instance where a given prediction needs to be tracked, for example when performing predictions in a batch setting or when recording predictions in a truth database for continuous evaluation of the model.
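
The bookkeeping around keyed predictions can be sketched as follows: one helper builds a newline-delimited request file in which each instance carries its own key, and another joins returned predictions back to those keys. Both helpers are hypothetical, the 2x2 image stands in for the 28x28 inputs used in the recipe, and the output line is made-up data shaped like the keyed signature's outputs.

```python
import json


def build_keyed_instances(images_by_key):
    """One JSON object per line, each carrying its own key and image."""
    return "\n".join(
        json.dumps({"key": key, "image": image})
        for key, image in images_by_key.items()
    )


def join_predictions(prediction_lines):
    """Map each returned key to the index of its highest-scoring class."""
    results = {}
    for line in prediction_lines:
        record = json.loads(line)
        preds = record["preds"]
        results[record["key"]] = max(range(len(preds)), key=preds.__getitem__)
    return results


# A tiny 2x2 stand-in for the 28x28 images used in the recipe.
instances = build_keyed_instances({"image_id_1234": [[0.0, 0.1], [0.2, 0.3]]})

# Hypothetical service output, shaped like the keyed signature's outputs.
outputs = ['{"key": "image_id_1234", "preds": [0.04, 0.45, 0.01, 0.49]}']
joined = join_predictions(outputs)
print(joined)
```

Because the key travels with each prediction, the join does not depend on the order in which a distributed service returns results.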

See more: How to extend a Canned Tensorflow Estimator

How to extend a Keras Model

9.7 BQ Data in Tensorflow or Pytorch Model


You need to build a more advanced, hand-tuned ML model in Python using data stored in BigQuery. Additionally, you plan to use an architecture not supported by BQML and AutoML is out of scope.


In this recipe, you will learn how to extract data from BigQuery into an in-memory Pandas DataFrame, using either the built-in BigQuery magic or the BigQuery client, and feed it to a TensorFlow model.

  1. In a Cloud AI Platform notebook, go to the cloned repository, then go to the chapter-8/8-8-bq_data_tf_model folder and open the sample notebook.

  2. The first option for extracting data from BigQuery into memory is the built-in BigQuery magic. Pass it the name of the DataFrame that should receive the query results, in this case df_from_magic:

    %%bigquery df_from_magic --use_bqstorage_api
    WHERE EXTRACT(YEAR from start_date) = 2017
    AND EXTRACT(MONTH from start_date) = 1
    Figure 9-6. BigQuery magic DataFrame output
  3. Alternatively you can load the BigQuery client directly and query that way:

  4. Run the following code in your notebook:

    from google.cloud import bigquery
    client = bigquery.Client()
    query_string = """
    SELECT duration, start_station_id,
      EXTRACT(DAYOFWEEK from start_date) as day_of_week,
      EXTRACT(HOUR from start_date) as hour
    WHERE EXTRACT(YEAR from start_date) = 2017
    AND EXTRACT(MONTH from start_date) = 1
    """
    df = client.query(query_string).to_dataframe()
    Figure 9-7. BigQuery client DataFrame output
  5. Next, build a tf.data.Dataset object from the DataFrame to then feed into your model:

  6. Run the following code:

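    import tensorflow as tf
    target = df.pop('duration')
    dataset = tf.data.Dataset.from_tensor_slices((df.values, target.values))
    for feat, targ in dataset.take(5):
      print ('Features: {}, Target: {}'.format(feat, targ))
    # prefetch(1) keeps one batch ready while the current one is consumed;
    # pass tf.data.experimental.AUTOTUNE to let TensorFlow tune the buffer size
    train_dataset = dataset.shuffle(len(df)).batch(64).prefetch(1)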
  7. Build and train your model with the tf.data.Dataset:

    from tensorflow import keras
    # Simple model shown for brevity, using the Keras Sequential API
    model = keras.Sequential([
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dense(1)  # single output for the regression target (assumed)
    ])
    model.compile(optimizer='adam', loss='mean_absolute_error')
    model.fit(train_dataset, epochs=2)


If you are building a Python-based model outside of BigQuery or AutoML, and the dataset can fit in memory (you can easily add large quantities of memory to temporary GCE VMs), extracting a BQ training set to a Pandas DataFrame is an excellent way to feed your model. In the case of TensorFlow, tf.data.Dataset.from_tensor_slices accepts the DataFrame's values. There are other methods for other frameworks. If your dataset is larger than what can be stored in memory, it is recommended to export the data to multiple files on GCS, in either CSV format or the highly optimized TFRecord format.
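
For the larger-than-memory case, the idea of reading sharded exports lazily can be sketched with the standard library alone. This is an illustration rather than a recommended pipeline: iter_batches is a hypothetical helper, the shard file names are made up, and the shards are written to a local temporary directory instead of GCS. In TensorFlow you would more likely read the shards with tf.data.

```python
import csv
import glob
import os
import tempfile


def iter_batches(pattern, batch_size):
    """Yield lists of rows from every CSV shard matching pattern, lazily."""
    batch = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                batch.append(row)
                if len(batch) == batch_size:
                    yield batch
                    batch = []
    if batch:
        yield batch  # final partial batch


# Write two small stand-in shards locally; real exports would live on GCS.
shard_dir = tempfile.mkdtemp()
for shard in range(2):
    shard_path = os.path.join(shard_dir, f"train-{shard:03d}.csv")
    with open(shard_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["duration", "start_station_id"])
        for i in range(3):
            writer.writerow([60 * (i + 1), 100 * shard + i])

sizes = [len(b) for b in iter_batches(os.path.join(shard_dir, "train-*.csv"), 4)]
print(sizes)
```

Only one batch is held in memory at a time, so the total size of the shards can exceed available RAM.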
