How to do it...

Here is how we proceed with the recipe:

  1. Create a new Google Cloud project from the web console at https://pantheon.google.com/cloud-resource-manager

The following screen is displayed when you click on create project:
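If you prefer working from the command line, the project can also be created with the Cloud SDK. This is a minimal sketch; the project ID below is a placeholder and not used elsewhere in this recipe:

```shell
# Hypothetical CLI equivalent of the console flow above.
# PROJECT_ID is a placeholder; project IDs must be globally unique.
PROJECT_ID="my-mnist-project"
gcloud projects create "$PROJECT_ID" \
  || echo "gcloud unavailable; create $PROJECT_ID from the console instead"
```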

  2. Enable billing for this project by selecting the related entry in the left-hand menu of the console. Then enable the Compute Engine and Cloud Machine Learning APIs for the project:
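The same APIs can also be enabled from the Cloud Shell instead of clicking through the console. The service IDs below are assumptions; confirm them under APIs & Services for your project:

```shell
# Hypothetical CLI alternative to enabling the APIs from the console.
# Service IDs are assumptions; verify them in your project's API library.
SERVICES="compute.googleapis.com ml.googleapis.com"
for svc in $SERVICES; do
  gcloud services enable "$svc" \
    || echo "could not enable $svc; enable it from the console"
done
```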
  3. Log in to the web Cloud Shell at https://pantheon.google.com/cloudshell/editor?
  4. From the console, run the following commands to configure the zone where the computation will run, download the sample code, and create the VMs used for running it. Finally, connect to the machine:
gcloud config set compute/zone us-east1-c
gcloud config set project [YOUR_PROJECT_ID]
git clone https://github.com/GoogleCloudPlatform/cloudml-dist-mnist-example
cd cloudml-dist-mnist-example
gcloud compute instances create template-instance \
--image-project ubuntu-os-cloud \
--image-family ubuntu-1604-lts \
--boot-disk-size 10GB \
--machine-type n1-standard-1
gcloud compute ssh template-instance
  5. Now, after logging into the machine, we need to set up the environment by installing pip and TensorFlow with these commands:
sudo apt-get update
sudo apt-get -y upgrade \
  && sudo apt-get install -y python-pip python-dev
sudo pip install tensorflow
sudo pip install --upgrade tensorflow
  6. We will have multiple workers operating on the MNIST data, so the best approach is to create a storage bucket shared among all the workers and copy the MNIST data into this bucket:
BUCKET="mnist-$RANDOM-$RANDOM"
gsutil mb -c regional -l us-east1 gs://${BUCKET}
sudo ./scripts/create_records.py
gsutil cp /tmp/data/train.tfrecords gs://${BUCKET}/data/
gsutil cp /tmp/data/test.tfrecords gs://${BUCKET}/data/
  7. We are now going to create multiple workers (worker-0, worker-1) that are clones of the initial template-instance machine. We don't want the machines to delete their disks when they are turned off, which is why we need the first command:
gcloud compute instances set-disk-auto-delete template-instance \
--disk template-instance --no-auto-delete
gcloud compute instances delete template-instance
gcloud compute images create template-image \
--source-disk template-instance
gcloud compute instances create \
master-0 worker-0 worker-1 ps-0 \
--image template-image \
--machine-type n1-standard-4 \
--scopes=default,storage-rw
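Before moving on, it is worth confirming that all four VMs came up. A minimal sketch, assuming the instance names used above:

```shell
# Check that each node created above reports a status (e.g. RUNNING).
NODES="master-0 worker-0 worker-1 ps-0"
for node in $NODES; do
  gcloud compute instances describe "$node" --format="value(status)" \
    || echo "$node not found; re-run the instances create command"
done
```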
  8. The final step is to run the computation for distributed training:
./scripts/start-training.sh gs://${BUCKET}
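When the script finishes, the checkpoints and exported model are written under the shared bucket, and listing it is a simple way to verify the run. The exact subdirectory layout depends on the repository's version, so treat it as an assumption:

```shell
# List everything the training run wrote to the shared bucket.
# The BUCKET fallback value is a placeholder for illustration only.
BUCKET="${BUCKET:-mnist-00000-00000}"
gsutil ls -r gs://${BUCKET}/ \
  || echo "listing failed; verify gsutil credentials and the bucket name"
```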