Setting up TensorBoard for analyzing TPU performance

Analyzing application performance is critical, and TensorBoard helps you visualize and analyze performance on Cloud TPU. With TensorBoard, you can monitor your application and improve its performance by applying the suggestions that TensorBoard provides.

After setting up Cloud TPU, install the latest version of the Cloud TPU profiler, which creates the capture_tpu_profile script. The steps to run TensorBoard are as follows:

  1. Open a new Cloud Shell to start TensorBoard.
  2. Run the following commands to bring up your Cloud TPU and create environment variables for your Cloud Storage bucket and model directory. The model directory variable (MODEL_DIR) holds the Cloud Storage directory where checkpoints, summaries, and the TensorBoard output are stored during model training:
$ ctpu up --name=[Your TPU Name] --zone=[Your TPU Zone]

$ export STORAGE_BUCKET=gs://[YOUR-STORAGE-BUCKET-NAME]
$ export MODEL_DIR=${STORAGE_BUCKET}/[MODEL-DIRECTORY]
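
Because TensorBoard and the profiler both read from MODEL_DIR, it can help to confirm that both variables hold Cloud Storage paths before continuing. The following is a minimal sketch only: check_gcs_path is a hypothetical helper (not part of the Cloud TPU tools), and the bucket and directory names are placeholders.

```shell
# check_gcs_path is a hypothetical helper: it fails early if a value
# is not a gs:// path. $1 is the variable name, $2 is its value.
check_gcs_path() {
  case "$2" in
    gs://?*) echo "$1 looks OK: $2" ;;
    *) echo "$1 must start with gs:// (got: '$2')" >&2; return 1 ;;
  esac
}

# Placeholder values for illustration only; substitute your own.
STORAGE_BUCKET=gs://example-bucket
MODEL_DIR=${STORAGE_BUCKET}/example-model

check_gcs_path STORAGE_BUCKET "${STORAGE_BUCKET}"
check_gcs_path MODEL_DIR "${MODEL_DIR}"
```

The check only inspects the form of the path; it does not verify that the bucket exists or that you have access to it.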

TensorBoard traces can be viewed in two ways:

  • The static trace viewer
  • The streaming trace viewer
The static trace viewer displays up to one million events per TPU. If you need more events than that, use the streaming trace viewer.

Let's check how to enable the static trace viewer.

  1. Run the following command in the same Cloud Shell in which you set the environment variables in the preceding step:
$ tensorboard --logdir=${MODEL_DIR} &
  2. In the same Cloud Shell, click Web Preview at the top and open port 8080 to view the TensorBoard output. To capture a profile from the command line, run the following command:
$ capture_tpu_profile --tpu=[YOUR TPU NAME] --logdir=${MODEL_DIR}
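
When the capture command is scripted, printing it before running it makes it easier to review the TPU name and log directory. The following is a sketch under stated assumptions: build_capture_cmd is a hypothetical helper, the values are placeholders, and the --duration_ms flag (trace duration in milliseconds) is assumed to be supported by your installed profiler version.

```shell
# build_capture_cmd is a hypothetical helper, not part of the Cloud TPU tools.
# It prints the capture command instead of running it.
# $1: TPU name, $2: log directory, $3: trace duration in milliseconds
# (--duration_ms is assumed to exist in the installed profiler version).
build_capture_cmd() {
  printf 'capture_tpu_profile --tpu=%s --logdir=%s --duration_ms=%s\n' "$1" "$2" "$3"
}

# Placeholder values for illustration only.
build_capture_cmd example-tpu gs://example-bucket/example-model 5000
```

Once the printed command looks right, it can be copied into the Cloud Shell and run there.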

TensorBoard provides the following features for visualizing and analyzing performance:

  • Graph visualization and the Profiler, which together help you improve the performance of your application.
  • The XLA structure graph and the TPU compatibility graph, which are useful for analysis.
  • Profiler tools such as the overview page, input pipeline analyzer, XLA Op profile, trace viewer (Chrome browser only), memory viewer, pod viewer, and streaming trace viewer (Chrome browser only), which help you analyze performance and tune your applications.