How to do it...

We proceed with the recipe as follows:

  1. Install the Azure CLI. Installation instructions for the different OS platforms are available at https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest.
  2. Before creating a cluster, log in to Azure with the az login command. It generates a code and gives you a website address where you are asked to verify your credentials. Once you have completed all the steps on the website, you are asked to close the page, and your az credentials are verified.
  3. Create a resource group, then set it and the location as the defaults:
az group create --name myResourceGroup --location eastus
az configure --defaults group=myResourceGroup
az configure --defaults location=eastus
  4. Next, create a storage account with the az storage account create command and set the environment variables for your OS. The environment variables and their values are described at https://docs.microsoft.com/en-us/azure/batch-ai/quickstart-cli
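Before moving on, it can help to confirm that the storage variables are visible to the environment that will run the az storage commands. The following Python sketch assumes only the two variables that the Azure CLI storage commands read, AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_KEY. The linked quickstart may require others, so treat the list as an assumption and extend it as needed. The helper name missing_vars is hypothetical.

```python
import os

# Assumed variable names: the Azure CLI storage commands read these two.
# The quickstart linked in this step may list additional ones.
REQUIRED_VARS = ("AZURE_STORAGE_ACCOUNT", "AZURE_STORAGE_KEY")


def missing_vars(env, names=REQUIRED_VARS):
    """Return the names that are unset or empty in the mapping `env`."""
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Set these variables before continuing:", ", ".join(missing))
    else:
        print("Storage environment variables are set.")
```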
  5. Download and extract the preprocessed MNIST database:
wget "https://batchaisamples.blob.core.windows.net/samples/mnist_dataset_original.zip?st=2017-09-29T18%3A29%3A00Z&se=2099-12-31T08%3A00%3A00Z&sp=rl&sv=2016-05-31&sr=b&sig=Qc1RA3zsXIP4oeioXutkL1PXIrHJO0pHJlppS2rID3I%3D" -O mnist_dataset_original.zip
unzip mnist_dataset_original.zip
  6. Download the mnist_replica.py script:
wget "https://raw.githubusercontent.com/Azure/BatchAI/master/recipes/TensorFlow/TensorFlow-GPU-Distributed/mnist_replica.py?token=AcZzrcpJGDHCUzsCyjlWiKVNfBuDdkqwks5Z4dPrwA%3D%3D" -O mnist_replica.py
  7. Next, create an Azure file share and upload the downloaded MNIST dataset files and mnist_replica.py to it:
az storage share create --name batchaisample
az storage directory create --share-name batchaisample --name mnist_dataset
az storage file upload --share-name batchaisample --source t10k-images-idx3-ubyte.gz --path mnist_dataset
az storage file upload --share-name batchaisample --source t10k-labels-idx1-ubyte.gz --path mnist_dataset
az storage file upload --share-name batchaisample --source train-images-idx3-ubyte.gz --path mnist_dataset
az storage file upload --share-name batchaisample --source train-labels-idx1-ubyte.gz --path mnist_dataset
az storage directory create --share-name batchaisample --name tensorflow_samples
az storage file upload --share-name batchaisample --source mnist_replica.py --path tensorflow_samples
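The upload commands in this step differ only in the file name and the target directory, so they can be generated in a loop. The following Python sketch rebuilds the same eight commands as argument lists and, by default, only prints them. The function names build_commands and run are hypothetical, and passing dry_run=False assumes that az is on the PATH and that you are already logged in.

```python
import shlex
import subprocess

SHARE = "batchaisample"
DATASET_FILES = [
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
]


def build_commands(share=SHARE):
    """Return the az commands from this step as argument lists."""
    commands = [["az", "storage", "share", "create", "--name", share]]
    uploads = {
        "mnist_dataset": DATASET_FILES,
        "tensorflow_samples": ["mnist_replica.py"],
    }
    for directory, files in uploads.items():
        # Create the directory on the share before uploading into it.
        commands.append(["az", "storage", "directory", "create",
                         "--share-name", share, "--name", directory])
        for name in files:
            commands.append(["az", "storage", "file", "upload",
                             "--share-name", share,
                             "--source", name, "--path", directory])
    return commands


def run(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for command in commands:
        if dry_run:
            print(shlex.join(command))
        else:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run(build_commands())
```

Running the script as is prints the commands so that they can be compared with the ones listed in this step before anything is uploaded.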
  8. Now create a cluster. For this recipe, the cluster consists of two GPU nodes of size Standard_NC6, running standard Ubuntu LTS or Ubuntu DSVM. The cluster can be created with the following Azure CLI command:

For Linux:

az batchai cluster create -n nc6 -i UbuntuDSVM -s Standard_NC6 --min 2 --max 2 --afs-name batchaisample --afs-mount-path external -u $USER -k ~/.ssh/id_rsa.pub

For Windows:

az batchai cluster create -n nc6 -i UbuntuDSVM -s Standard_NC6 --min 2 --max 2 --afs-name batchaisample --afs-mount-path external -u <user_name> -p <password>
  9. Next, specify the job creation parameters in a job.json file:
{
  "properties": {
    "nodeCount": 2,
    "tensorFlowSettings": {
      "parameterServerCount": 1,
      "workerCount": 2,
      "pythonScriptFilePath": "$AZ_BATCHAI_INPUT_SCRIPT/mnist_replica.py",
      "masterCommandLineArgs": "--job_name=worker --num_gpus=1 --ps_hosts=$AZ_BATCHAI_PS_HOSTS --worker_hosts=$AZ_BATCHAI_WORKER_HOSTS --task_index=$AZ_BATCHAI_TASK_INDEX --data_dir=$AZ_BATCHAI_INPUT_DATASET --output_dir=$AZ_BATCHAI_OUTPUT_MODEL",
      "workerCommandLineArgs": "--job_name=worker --num_gpus=1 --ps_hosts=$AZ_BATCHAI_PS_HOSTS --worker_hosts=$AZ_BATCHAI_WORKER_HOSTS --task_index=$AZ_BATCHAI_TASK_INDEX --data_dir=$AZ_BATCHAI_INPUT_DATASET --output_dir=$AZ_BATCHAI_OUTPUT_MODEL",
      "parameterServerCommandLineArgs": "--job_name=ps --num_gpus=0 --ps_hosts=$AZ_BATCHAI_PS_HOSTS --worker_hosts=$AZ_BATCHAI_WORKER_HOSTS --task_index=$AZ_BATCHAI_TASK_INDEX --data_dir=$AZ_BATCHAI_INPUT_DATASET --output_dir=$AZ_BATCHAI_OUTPUT_MODEL"
    },
    "stdOutErrPathPrefix": "$AZ_BATCHAI_MOUNT_ROOT/external",
    "inputDirectories": [{
      "id": "DATASET",
      "path": "$AZ_BATCHAI_MOUNT_ROOT/external/mnist_dataset"
    }, {
      "id": "SCRIPT",
      "path": "$AZ_BATCHAI_MOUNT_ROOT/external/tensorflow_samples"
    }],
    "outputDirectories": [{
      "id": "MODEL",
      "pathPrefix": "$AZ_BATCHAI_MOUNT_ROOT/external",
      "pathSuffix": "Models"
    }],
    "containerSettings": {
      "imageSourceRegistry": {
        "image": "tensorflow/tensorflow:1.1.0-gpu"
      }
    }
  }
}
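In job.json, the directory IDs (DATASET, SCRIPT, MODEL) reappear in the settings as $AZ_BATCHAI_INPUT_DATASET, $AZ_BATCHAI_INPUT_SCRIPT, and $AZ_BATCHAI_OUTPUT_MODEL. Assuming that this naming pattern is how the IDs map to variables, a mistyped ID would leave a variable undefined when the job runs. The following Python sketch, with the hypothetical helpers check_job_spec and check_job_file, flags such mismatches before the job is submitted. It checks only this one relationship and is not a full validation of the file.

```python
import json
import re


def check_job_spec(spec):
    """Return a sorted list of problems found in a job spec dictionary."""
    props = spec["properties"]
    settings = props["tensorFlowSettings"]

    # IDs declared under inputDirectories and outputDirectories.
    declared = {
        "INPUT": {d["id"] for d in props.get("inputDirectories", [])},
        "OUTPUT": {d["id"] for d in props.get("outputDirectories", [])},
    }

    # Scan every string setting for $AZ_BATCHAI_INPUT_<ID> / _OUTPUT_<ID>.
    text = " ".join(v for v in settings.values() if isinstance(v, str))
    problems = set()
    for kind, ident in re.findall(r"\$AZ_BATCHAI_(INPUT|OUTPUT)_(\w+)", text):
        if ident not in declared[kind]:
            problems.add(
                "$AZ_BATCHAI_%s_%s has no matching directory id" % (kind, ident)
            )
    return sorted(problems)


def check_job_file(path):
    """Load a job file such as job.json and check it."""
    with open(path) as handle:
        return check_job_spec(json.load(handle))
```

Calling check_job_file("job.json") on the file from this step should return an empty list; any returned message names the variable whose ID is not declared.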
  10. Finally, create the Batch AI job with the following command:
az batchai job create -n distibuted_tensorflow --cluster-name nc6 -c job.json
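Every task in the job receives the same --ps_hosts and --worker_hosts values from job.json and differs only in --job_name and --task_index. mnist_replica.py is not reproduced in this recipe, so the following Python sketch only illustrates how such arguments are commonly turned into a cluster description in distributed TensorFlow. The function names and the example addresses are hypothetical.

```python
def parse_cluster(ps_hosts, worker_hosts):
    """Split comma-separated host:port lists into a job-name -> addresses map."""
    def split(hosts):
        return [h.strip() for h in hosts.split(",") if h.strip()]

    return {"ps": split(ps_hosts), "worker": split(worker_hosts)}


def task_address(cluster, job_name, task_index):
    """Return the address of the task identified by job name and index."""
    return cluster[job_name][task_index]


# Made-up addresses standing in for the values Batch AI would supply.
cluster = parse_cluster("10.0.0.4:2222", "10.0.0.4:2223,10.0.0.5:2223")
print(cluster)
# {'ps': ['10.0.0.4:2222'], 'worker': ['10.0.0.4:2223', '10.0.0.5:2223']}
print(task_address(cluster, "worker", 1))
# 10.0.0.5:2223
```

A dictionary of this shape, mapping job names to address lists, is what tf.train.ClusterSpec accepts, which is why one parameter server and two workers can all be started from the same pair of host lists.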