In the project repository, we have already provided a package named production. We need to copy the labels.txt file from our dataset into that package, create a new Python file, client.py, and add the following code:
import tensorflow as tf
import numpy as np
from tensorflow_serving.apis import prediction_service_pb2, predict_pb2
from grpc.beta import implementations
from scipy.misc import imread
from datetime import datetime


class Output:
    def __init__(self, score, label):
        self.score = score
        self.label = label

    def __repr__(self):
        return "Label: %s Score: %.2f" % (self.label, self.score)


def softmax(x):
    return np.exp(x) / np.sum(np.exp(x), axis=0)


def process_image(path, label_data, top_k=3):
    start_time = datetime.now()
    img = imread(path)

    host, port = "0.0.0.0:9000".split(":")
    channel = implementations.insecure_channel(host, int(port))
    stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = "pet-model"
    request.model_spec.signature_name = "predict_images"

    request.inputs["images"].CopyFrom(
        tf.contrib.util.make_tensor_proto(
            img.astype(dtype=float),
            shape=img.shape,
            dtype=tf.float32
        )
    )

    result = stub.Predict(request, 20.)
    scores = tf.contrib.util.make_ndarray(result.outputs["scores"])[0]
    probs = softmax(scores)
    index = sorted(range(len(probs)), key=lambda x: probs[x], reverse=True)

    outputs = []
    for i in range(top_k):
        outputs.append(Output(score=float(probs[index[i]]),
                              label=label_data[index[i]]))
    print(outputs)
    print("total time", (datetime.now() - start_time).total_seconds())
    return outputs


if __name__ == "__main__":
    label_data = [line.strip() for line in open("production/labels.txt", 'r')]

    process_image("samples_data/dog.jpg", label_data)
    process_image("samples_data/cat.jpg", label_data)
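The softmax helper in the preceding listing turns the raw scores returned by the server into a probability distribution, and the sorted index ranks the classes from most to least likely. We can check that ranking logic in isolation with some made-up scores (the values below are hypothetical, not real model outputs):

```python
import numpy as np


def softmax(x):
    # exponentiate and normalize so that the outputs sum to 1
    return np.exp(x) / np.sum(np.exp(x), axis=0)


# hypothetical raw scores for three classes, standing in for the
# "scores" tensor that the model server returns
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)

# rank class indices by descending probability, exactly as client.py does
index = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

print(index)  # [0, 1, 2]
```

Because softmax is monotonic, the ranking is the same as sorting the raw scores; applying it only matters for reporting a meaningful confidence value per label.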
In this code, we create a process_image method that reads the image from an image path, uses some TensorFlow utilities to build a tensor, and sends it to the model server over gRPC. We also create an Output class so that we can easily return the results to the caller. At the end of the method, we print the outputs and the total processing time so that we can debug more easily. We can run this Python file to check that process_image works:
python production/client.py
The output should look like this:
[Label: saint_bernard Score: 0.78, Label: american_bulldog Score: 0.21, Label: staffordshire_bull_terrier Score: 0.00]
('total time', 14.943942)
[Label: Maine_Coon Score: 1.00, Label: Ragdoll Score: 0.00, Label: Bengal Score: 0.00]
('total time', 14.918235)
We get the correct results. However, processing takes almost 15 seconds per image. The reason is that we are running TensorFlow Serving in CPU mode. As mentioned earlier, you can build TensorFlow Serving with GPU support by following Appendix A, Advanced Installation. If you follow that tutorial, you will get the following result:
[Label: saint_bernard Score: 0.78, Label: american_bulldog Score: 0.21, Label: staffordshire_bull_terrier Score: 0.00]
('total time', 0.493618)
[Label: Maine_Coon Score: 1.00, Label: Ragdoll Score: 0.00, Label: Bengal Score: 0.00]
('total time', 0.023753)
The first call takes about 493 ms, but subsequent calls take only about 23 ms, which is much faster than the CPU version. The first call is slower because the server performs one-time initialization work before serving its first request; later calls reflect the steady-state latency.
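When benchmarking a serving setup like this, it helps to time the first (warm-up) call separately from the average of the following calls. A minimal sketch of that pattern, using a stand-in function instead of the real gRPC request (fake_predict is hypothetical):

```python
from datetime import datetime


def timed_call(fn, *args):
    # run fn and return its result together with the elapsed seconds,
    # mirroring the start_time/total_seconds pattern in client.py
    start = datetime.now()
    result = fn(*args)
    return result, (datetime.now() - start).total_seconds()


# stand-in for process_image; a real client would issue a gRPC Predict here
def fake_predict(x):
    return x * 2

_, first_call = timed_call(fake_predict, 21)
later_calls = [timed_call(fake_predict, 21)[1] for _ in range(5)]

print("warm-up:", first_call)
print("steady-state avg:", sum(later_calls) / len(later_calls))
```

Reporting both numbers avoids letting a single warm-up call distort the measured per-image latency.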