Machine learning and AI hold an increasingly important place in enterprise applications. Google Cloud has a number of AI and ML services available, from pre-trained APIs that can be added to existing applications with a few lines of code to the full-featured Cloud AI Platform, which can be used to train and operationalize ML models in many frameworks.
With model training and tuning becoming more automated, in particular with tools like AutoML, organizations are focusing more on advanced concepts, including continuous retraining and deployment with MLOps, as well as deploying explainable AI in the enterprise. In this chapter, we will present a number of recipes using these ML tools, from setting up your customized environment, to training and deploying your first model, to more specific techniques aimed at integrating other services.
All code samples for this chapter are located at https://github.com/ruiscosta/google-cloud-cookbook/chapter-8. You can follow along and copy the code for each individual recipe by going to the folder with that recipe's number.
You need a hosted Jupyter Notebook environment running in Google Cloud that is authenticated to Google services to perform data and ML tasks.
You can create, customize, and connect to a Cloud AI Platform notebook.
From the Google Cloud console navigation menu, select AI Platform -> Notebooks.
Choose New Instance, and you'll see a list of available instance types, as shown in Figure 9-1.
You'll now see the instance options, as shown in Figure 9-2. Choose your instance type and, if needed, a GPU to attach.
Change the name and customize the specs of your machine to meet your requirements.
Check Install NVIDIA GPU driver automatically if you are using a GPU and prefer not to install drivers from scratch.
Click ‘Create’
You will now see your instance listed. When the status indicator stops spinning and OPEN JUPYTERLAB appears, click it to open your notebook environment.
Alternatively, you can create a notebook instance from the command line with the following:
export INSTANCE_NAME="example-instance"
export VM_IMAGE_PROJECT="deeplearning-platform-release"
export VM_IMAGE_FAMILY="tf2-2-3-cpu"
export MACHINE_TYPE="n1-standard-4"
export LOCATION="us-central1-b"

gcloud beta notebooks instances create $INSTANCE_NAME \
    --vm-image-project=$VM_IMAGE_PROJECT \
    --vm-image-family=$VM_IMAGE_FAMILY \
    --machine-type=$MACHINE_TYPE \
    --location=$LOCATION
Cloud AI Platform notebooks are where you will perform much of your data and ML work. The notebooks are a hosted, customizable environment that handles things like installing data science dependencies, installing and configuring NVIDIA drivers (a big win for anyone who has done this more than a couple of times!), handling authentication to Google Cloud APIs (either as a service account or a user account), and creating a reverse proxy so you can securely connect from a browser into the notebook and cloud environment. They can be further configured or locked down for more secure environments; for example, they can be protected by VPC Service Controls (VPC-SC).
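Because the instance handles authentication for you, code running in the notebook picks up credentials automatically through Application Default Credentials. As a quick check, the following minimal sketch (using the standard google-auth library) prints the project the notebook is authenticated against:

import google.auth

# The attached service account (or your user account) is picked up
# automatically as Application Default Credentials on the instance.
credentials, project_id = google.auth.default()
print('Authenticated to project:', project_id)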
You have a Python model authored and want to train it leveraging serverless compute in the cloud.
Prepare your Python model for submission to the Cloud AI Platform training service. In this case, we will prepare a TensorFlow model.
Create a model and, if you are working in a Jupyter notebook, export it to a Python file. It will look something like this:
import tensorflow as tf
from tensorflow import keras
import pandas as pd

# Data loading and feature engineering steps omitted

model = keras.Sequential([
    keras.layers.Dense(15, activation="relu",
                       input_shape=(train_features.shape[-1],)),
    keras.layers.Dense(10, activation="relu"),
    keras.layers.Dense(1, activation=None)
])

model.compile(loss='mae')

model.fit(train_features, train_labels, epochs=500,
          validation_data=(test_features, test_labels))

model.save('gs://dhodun1/temp/model1/')
Prepare your code as a Python package. Create a trainer folder, move your model file into it, rename the file task.py, and create an empty __init__.py file alongside it so the folder is treated as a Python package.
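As a minimal sketch of that restructuring, assuming your exported file is named model.py (the source file name here is illustrative):

from pathlib import Path

# Create the trainer package that the AI Platform training service expects
Path('trainer').mkdir(exist_ok=True)
Path('model.py').rename('trainer/task.py')  # 'model.py' is a hypothetical name
Path('trainer/__init__.py').touch()         # marks the folder as a Python package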
Run the following commands from the command line to submit your training job:
BUCKET_NAME='dhodun1'
JOB_DIR="gs://$BUCKET_NAME/keras-job-dir"
now=$(date +"%Y%m%d_%H%M%S")
JOB_NAME="my_training_job_$now"
TF_VERSION=2.1

gcloud ai-platform jobs submit training $JOB_NAME \
    --package-path trainer/ \
    --module-name trainer.task \
    --region us-central1 \
    --python-version 3.7 \
    --runtime-version $TF_VERSION \
    --job-dir $JOB_DIR
You can then watch the logs stream in the Cloud Console, or from the command line with gcloud ai-platform jobs stream-logs $JOB_NAME.
The AI Platform training service can run any generic Python batch job but is designed for short- or long-lived ML training jobs. It also supports passing in custom command-line parameters, installing your own Python dependencies, and configuring machine types to support a wide variety of ML training requirements.
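For example, any flags you pass after a -- separator on the gcloud command line are forwarded to your trainer module, where you can read them with argparse. A minimal sketch for task.py (the --epochs and --batch-size flags are illustrative choices, not required by the service):

import argparse

parser = argparse.ArgumentParser()
# --job-dir is forwarded to your module when supplied to gcloud
parser.add_argument('--job-dir', type=str, default=None)
# Illustrative hyperparameter flags of your own choosing
parser.add_argument('--epochs', type=int, default=500)
parser.add_argument('--batch-size', type=int, default=64)
args = parser.parse_args()

print('Training for {} epochs with batch size {}'.format(args.epochs, args.batch_size))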
You have a Python model authored and want to make predictions in a serverless manner in the cloud.
In this recipe, you will learn how to use Cloud AI Platform to upload and serve a model, including auto-scaling and batch predictions.
You already have a model and have exported it in the TensorFlow SavedModel format to GCS.
First, create a “Model” resource which is assigned to a regional endpoint and will house various versions of the model:
BUCKET='dhodun1'
REGION='us-central1'
MODEL_NAME='boston_housing'

gcloud ai-platform models create $MODEL_NAME --region=$REGION
Create a model version, called “v1”, which will require knowing the SavedModel output location on GCS:
MODEL_DIR="gs://dhodun1/temp/model1/"
VERSION_NAME="v1"
FRAMEWORK="TENSORFLOW"
MACHINE_TYPE="n1-standard-4"

gcloud ai-platform versions create $VERSION_NAME \
    --model=$MODEL_NAME \
    --origin=$MODEL_DIR \
    --runtime-version=2.1 \
    --framework=$FRAMEWORK \
    --python-version=3.7 \
    --region=$REGION \
    --machine-type=$MACHINE_TYPE
Validate that the model is serving:
%%writefile records.json
{"dense_input": [0.00541, 18.0, 2.31, 0.538, 6.575, 65.2, 5.0900, 296.0, 15.3]}
{"dense_input": [0.00332, 0.0, 2.31, 0.437, 7.7, 40.0, 4.0900, 250.0, 17.3]}

gcloud ai-platform predict \
    --model=boston_housing \
    --version=$VERSION_NAME \
    --json-instances records.json
The AI Platform model serving service lets you host your models without having to worry about managing, maintaining, or scaling the underlying infrastructure, while still letting you configure the machine type, add GPUs, and so on. Choosing a regional endpoint rather than the global endpoint lets you specify exactly where your model runs.
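Beyond gcloud, you can call the deployed version programmatically. Here is a sketch using the Google API Python client against the 'ml' v1 API; the project ID is a placeholder, and this form assumes the global endpoint (for a version deployed to a regional endpoint, as above, point the client at that region's endpoint instead):

from googleapiclient import discovery

# Build a client for the AI Platform prediction API
service = discovery.build('ml', 'v1')

# 'my-project' is a placeholder project ID
name = 'projects/{}/models/{}/versions/{}'.format(
    'my-project', 'boston_housing', 'v1')

# Instances mirror the JSON records used with `gcloud ai-platform predict`
body = {'instances': [
    {'dense_input': [0.00541, 18.0, 2.31, 0.538, 6.575,
                     65.2, 5.09, 296.0, 15.3]}
]}

response = service.projects().predict(name=name, body=body).execute()
print(response['predictions'])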
You need to get attributions for each feature in your model at prediction time for use in a business application.
In this recipe, you will learn how to prepare your model for deployment on Cloud AI Platform so that it supports explanations at prediction time, including adding baseline values for each of your features, deploying the model, and then receiving explanations.
You already have a model trained with model.fit() and have exported it in the SavedModel format to GCS. The model is loaded in a Jupyter notebook:
!pip install explainable-ai-sdk

import explainable_ai_sdk

# Print the names of your tensors
print('Model input tensor: ', model.input.name)
print('Model output tensor: ', model.output.name)

# Create and save explainability metadata to the SavedModel location
from explainable_ai_sdk.metadata.tf.v2 import SavedModelMetadataBuilder

builder = SavedModelMetadataBuilder(MODEL_DIR)
builder.set_numeric_metadata(
    # Name of the input layer in the Keras model, 'dense_input' in this case
    model.input.name.split(':')[0],
    input_baselines=train_features.median().values.tolist(),
    index_feature_mapping=train_features.columns.tolist()
)
builder.save_metadata(MODEL_DIR)

REGION='us-central1'
MODEL_NAME='boston_housing'

# Create a top-level model resource to then house versions of the model
!gcloud ai-platform models create $MODEL_NAME --region=$REGION

VERSION_NAME="v1explanation"
FRAMEWORK="TENSORFLOW"
MACHINE_TYPE="n1-standard-4"
EXPLAIN_METHOD="integrated-gradients"

!gcloud beta ai-platform versions create $VERSION_NAME \
    --model=$MODEL_NAME \
    --origin=$MODEL_DIR \
    --runtime-version=2.1 \
    --framework=$FRAMEWORK \
    --python-version=3.7 \
    --machine-type=$MACHINE_TYPE \
    --explanation-method $EXPLAIN_METHOD \
    --num-integral-steps 25 \
    --region $REGION

# Create a JSON request object
prediction_json = {model.input.name.split(':')[0]: test_features.iloc[0].values.tolist()}
print(prediction_json)

PROJECT_ID='dhodun1'
remote_ig_model = explainable_ai_sdk.load_model_from_ai_platform(
    PROJECT_ID, MODEL_NAME, VERSION_NAME)
ig_response = remote_ig_model.explain([prediction_json])

attr = ig_response[0].get_attribution()
predicted = round(attr.example_score, 2)
print('Predicted price: $' + str(predicted))
print('Actual price: $' + str(test_labels.iloc[0]))

Predicted price: $1943.19
Actual price: $8500

ig_response[0].visualize_attributions()
Explainability in machine learning is a deep and complex topic and is an important part of responsible and widespread use of ML. Cloud AI Platform provides an easy-to-use service that not only serves predictions for your models but also calculates attribution values for each of the features. Integrated gradients is used for fully differentiable models (i.e., neural networks), and TreeSHAP is used for other types of models, such as gradient-boosted trees. For this example, the most important decision is the choice of 'baseline' values for the explanations. This is what the attribution values are calculated against; often the median or mode for a particular feature is chosen.
For more detail, see Google's AI Whitepaper.
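To build intuition for what the service is computing, here is a minimal, illustrative implementation of integrated gradients for a differentiable Keras model. This is a sketch of the technique, not the service's actual code; model, baseline, and x stand in for a trained model, a baseline feature vector, and an input feature vector:

import tensorflow as tf

def integrated_gradients(model, baseline, x, steps=25):
    # Interpolate along the straight line from the baseline to the input
    alphas = tf.reshape(tf.linspace(0.0, 1.0, steps + 1), [-1, 1])
    interpolated = baseline + alphas * (x - baseline)
    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        preds = model(interpolated)
    grads = tape.gradient(preds, interpolated)
    # Average the gradients (trapezoidal rule) and scale by the input delta
    avg_grads = tf.reduce_mean((grads[:-1] + grads[1:]) / 2.0, axis=0)
    return (x - baseline) * avg_grads  # one attribution per feature

The attributions approximately sum to the difference between the model's prediction for the input and its prediction for the baseline, which is why the choice of baseline matters so much.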
You need specific libraries pre-installed in your AI Platform Notebook environment and need consistency between your development environment and production pipeline environment.
In this recipe, you will learn how to define a custom data science container image (including GPU support), build it with Cloud Build, push it to Container Registry, and use it to create a custom AI Platform notebook.
All code samples are located at https://github.com/ruiscosta/google-cloud-cookbook
On your local workstation, go to the cloned repository, then to the chapter-8/8-1-custom_notebook folder.
Examine the contents of the Dockerfile, requirements.txt, and build_container.sh.
Run the following commands to start the build:
IMAGE_NAME=custom_notebook
TAG=latest
PROJECT_ID=$(gcloud config get-value project)
IMAGE_URI="gcr.io/${PROJECT_ID}/${IMAGE_NAME}:${TAG}"

# Create and upload a tarball of the current directory '.' to the Cloud Build service to build the image
gcloud builds submit --timeout 10m --tag ${IMAGE_URI} .
This will create a tarball of the local directory, upload it to the Cloud Build service, build the container image, and push it to Google Container Registry. You can watch the logs, which immediately start streaming, from the command line or under the Cloud Build section of the Google Cloud console.
requirements.txt contains the PyPI Python dependencies, and their versions, to be installed with pip. The Dockerfile pulls from a base AI Platform Notebook container image and installs the Python dependencies listed in requirements.txt, and build_container.sh uses Cloud Build to build and push the container image.
# requirements.txt
kfp==0.2.5

# Dockerfile
FROM gcr.io/deeplearning-platform-release/base-cpu
COPY requirements.txt .
RUN python3 -m pip install -U -r requirements.txt

# build_container.sh
#!/bin/bash
IMAGE_NAME=custom_notebook
TAG=latest
PROJECT_ID=$(gcloud config get-value project)
IMAGE_URI="gcr.io/${PROJECT_ID}/${IMAGE_NAME}:${TAG}"

# Create and upload a tarball of the current directory '.' to the Cloud Build service to build the image
gcloud builds submit --timeout 10m --tag ${IMAGE_URI} .
Now deploy a custom AI Platform Notebook:
In the Google Cloud console, navigate to AI Platform > Notebooks.
Click + New Instance > Customize Instance
Provide an Instance name
Under Environment, choose ‘Custom Container’
Provide the location of your container, for example: ‘gcr.io/MY_PROJECT/custom_notebook:latest’
Click Create
Once the notebook is created, the OPEN JUPYTERLAB link should be enabled. Click it.
Open a terminal in your notebook and run the following to verify that your Python dependencies are installed:
pip freeze | grep kfp
Alternatively, you can create the notebook using the gcloud command line tool:
gcloud beta notebooks instances create my-notebook-2 \
    --container-repository gcr.io/dhodun1/custom_notebook \
    --machine-type n1-standard-4 \
    --location us-west1-b
This section shows how to programmatically bake custom dependencies into an AI Platform Notebook image. This is often helpful when deploying a standard data science image across one or more teams, as well as for aligning dependencies in the development phase with a production environment. The base AI Platform containers already include a large variety of data science and ML tools, hence the large container size (>1 GB), but it will often be necessary to augment packages or change package versions. By using the AI Platform container as the parent image, you retain most of the benefits of the out-of-the-box AI Platform notebooks, such as the authenticated reverse proxy, service or user account alignment, and a unified API for all notebooks, while customizing and potentially further securing the environment.
You need to perform batch TensorFlow model predictions with a distributed service like Cloud AI Platform or Dataflow and need to modify your model to forward a unique instance key for each prediction input.
In this recipe, you will learn how to modify an existing TensorFlow model's serving signature to forward a unique instance key for batch predictions, and then upload the model for serving on Cloud AI Platform to verify the new signature and perform keyed predictions.
A sample notebook is located at https://github.com/ruiscosta/google-cloud-cookbook
In a Cloud AI Platform notebook, go to the cloned repository, then go to the chapter-8/8-5-tf_batch_predictions folder and open the sample notebook.
Complete the cells through the "SavedModel and serving signature" section. This builds a simple TensorFlow model with the Keras API to classify Fashion MNIST images and saves it to the './model/' directory in the SavedModel format.
The following bash command will show the serving signature and the inputs/outputs of the saved model:
!saved_model_cli show --tag_set serve --signature_def serving_default --dir {MODEL_EXPORT_PATH}

The given SavedModel SignatureDef contains the following input(s):
  inputs['image'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 28, 28)
      name: serving_default_image:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['preds'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 10)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict

Load the SavedModel:

MODEL_EXPORT_PATH = './model/'
loaded_model = tf.keras.models.load_model(MODEL_EXPORT_PATH)

You can test the existing inference function of the loaded model that corresponds to the 'serving_default' signature:

inference_function = loaded_model.signatures['serving_default']
result = inference_function(tf.convert_to_tensor(test_image))

{'preds': <tf.Tensor: shape=(1, 10), dtype=float32, numpy=
array([[1.9574834e-05, 1.7343391e-06, 1.5372832e-05, 6.3454769e-05,
        4.5845241e-05, 4.0783577e-02, 1.1227881e-04, 4.5515549e-01,
        1.0713221e-02, 4.9308944e-01]], dtype=float32)>}
Now you'll create a new serving function that accepts and outputs a unique instance key, using the fact that calling a Keras model directly, model(x), runs a prediction.
@tf.function(input_signature=[
    tf.TensorSpec([None], dtype=tf.string),
    tf.TensorSpec([None, 28, 28], dtype=tf.float32)
])
def keyed_prediction(key, image):
    pred = loaded_model(image, training=False)
    return {
        'preds': pred,
        'key': key
    }
Next, save the new model and inspect the new signature:
KEYED_EXPORT_PATH = './keyed_model/'

loaded_model.save(KEYED_EXPORT_PATH, signatures={'serving_default': keyed_prediction})

!saved_model_cli show --tag_set serve --signature_def serving_default --dir {KEYED_EXPORT_PATH}
The given SavedModel SignatureDef contains the following input(s):
  inputs['image'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 28, 28)
      name: serving_default_image:0
  inputs['key'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: serving_default_key:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['key'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: StatefulPartitionedCall:0
  outputs['preds'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 10)
      name: StatefulPartitionedCall:1
Method name is: tensorflow/serving/predict
You can create an AI Platform model as follows:
%%bash
MODEL_NAME=fashion_mnist
MODEL_VERSION=v1
TFVERSION=2.1
# REGION, BUCKET, and MODEL_LOCATION were set earlier

gcloud ai-platform models create ${MODEL_NAME} --regions $REGION

gcloud ai-platform versions create ${MODEL_VERSION} \
    --model ${MODEL_NAME} \
    --origin ${MODEL_LOCATION} \
    --staging-bucket gs://${BUCKET} \
    --runtime-version $TFVERSION
Lastly, examine the keyed input file and send it for prediction:
!cat keyed_input.json

!gcloud ai-platform predict \
    --model fashion_mnist \
    --json-instances keyed_input.json \
    --version v1 \
    --signature-name serving_default

KEY            PREDS
image_id_1234  [1.9574799807742238e-05, 1.7343394347335561e-06, 1.537282150820829e-05,
                6.345478323055431e-05, 4.584524504025467e-05, 0.0407835878431797,
                0.00011227882350794971, 0.4551553726196289, 0.010713225230574608,
                0.493089497089386]
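Each line of keyed_input.json pairs a unique key with one 28x28 image. A minimal sketch of generating such a file (the key value and the all-zeros placeholder image are illustrative):

import json
import numpy as np

test_image = np.zeros((28, 28), dtype=np.float32)  # placeholder image

# One JSON object per line: a unique key plus the image pixels
instance = {'key': 'image_id_1234', 'image': test_image.tolist()}

with open('keyed_input.json', 'w') as f:
    f.write(json.dumps(instance) + '\n')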
Forwarding instance keys, or even individual features, comes in handy whenever a given prediction needs to be tracked, for example when performing predictions in a batch setting or when recording predictions in a truth database for continuous evaluation of the model.
You need to build a more advanced, hand-tuned ML model in Python using data stored in BigQuery. Additionally, you plan to use an architecture not supported by BigQuery ML (BQML), and AutoML is out of scope.
In this recipe, you will learn how to extract data from BigQuery into an in-memory Pandas DataFrame, using either the built-in BigQuery cell magic or the BigQuery client, and then feed the data to a model with tf.data.
In a Cloud AI Platform notebook, go to the cloned repository, then go to the chapter-8/8-8-bq_data_tf_model folder and open the sample notebook.
The first option for extracting BigQuery data into memory is the built-in BigQuery cell magic; pass it the name of the DataFrame that should receive the query results, in this case df_from_magic:
%%bigquery df_from_magic --use_bqstorage_api
SELECT *
FROM `bigquery-public-data.london_bicycles.cycle_hire`
WHERE EXTRACT(YEAR from start_date) = 2017
AND EXTRACT(MONTH from start_date) = 1
Alternatively, you can load the BigQuery client directly and query that way.
Run the following code in your notebook:
from google.cloud import bigquery

client = bigquery.Client()

query_string = """
SELECT
    duration,
    start_station_id,
    EXTRACT(DAYOFWEEK from start_date) as day_of_week,
    EXTRACT(HOUR from start_date) as hour
FROM `bigquery-public-data.london_bicycles.cycle_hire`
WHERE EXTRACT(YEAR from start_date) = 2017
AND EXTRACT(MONTH from start_date) = 1
"""

df = client.query(query_string).to_dataframe()
Next, build a tf.data.Dataset from the DataFrame to feed into your model.
Run the following code:
import tensorflow as tf

target = df.pop('duration')

dataset = tf.data.Dataset.from_tensor_slices((df.values, target.values))

for feat, targ in dataset.take(5):
    print('Features: {}, Target: {}'.format(feat, targ))

# Shuffle, batch, and prefetch one batch ahead of training
train_dataset = dataset.shuffle(len(df)).batch(64).prefetch(1)
Build and train your model with the tf.data dataset:
from tensorflow import keras

# A simple architecture, for clarity; the focus here is the tf.data API
model = keras.Sequential([
    keras.layers.Dense(10, activation='relu'),
    keras.layers.Dense(10, activation='relu'),
    keras.layers.Dense(1)
])

model.compile(optimizer='adam', loss='mean_absolute_error')
model.fit(train_dataset, epochs=2)
If you are building a Python-based model outside of BigQuery ML or AutoML, and the dataset can fit in memory (you can easily add large quantities of memory to temporary GCE VMs), extracting a BigQuery training set into a Pandas DataFrame is an excellent way to feed your model. In the case of TensorFlow, tf.data.Dataset.from_tensor_slices() accepts DataFrames, and there are analogous methods for other frameworks. If your dataset is larger than what can be stored in memory, it is recommended to export the data to multiple files on GCS, in either CSV format or the highly optimized TFRecord format.
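As a minimal sketch of that export path, the following serializes each DataFrame row as a tf.train.Example and writes a single TFRecord file; the GCS path is a placeholder, and a real pipeline would shard across many files:

import tensorflow as tf

def row_to_example(row):
    # Encode each numeric column as a single-value float feature
    feature = {
        name: tf.train.Feature(float_list=tf.train.FloatList(value=[float(value)]))
        for name, value in row.items()
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# df is the DataFrame extracted from BigQuery above
with tf.io.TFRecordWriter('gs://my-bucket/data/train-00000.tfrecord') as writer:
    for _, row in df.iterrows():
        writer.write(row_to_example(row).SerializeToString())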