Distributed computing in TensorFlow

In this section, you will learn how to distribute computation in TensorFlow. Knowing how to do this is important because it lets you:

  • Run more experiments in parallel (for example, a grid search over hyperparameters)
  • Distribute the training of a model over multiple GPUs (possibly on multiple servers) to reduce training time

One famous use case is the paper in which Facebook showed how to train an ImageNet model in 1 hour (instead of weeks): a ResNet-50 was trained on ImageNet using 256 GPUs spread across 32 servers, with a minibatch size of 8,192 images.
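
To make the multi-GPU case concrete, here is a minimal sketch of single-machine data parallelism using the tf.distribute.MirroredStrategy API from TensorFlow 2.x. The tiny model and the random stand-in data are illustrative assumptions, not part of the original example, and the section may use a different distribution mechanism:

```python
import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU on this
# machine and aggregates gradients across the replicas after each step.
# With no GPU available, it falls back to a single CPU replica.
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

# Variables (the model and optimizer state) must be created inside the
# strategy scope so that they are mirrored across the devices.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# Random data standing in for a real dataset such as MNIST or ImageNet.
x = tf.random.normal((1024, 784))
y = tf.random.uniform((1024,), maxval=10, dtype=tf.int64)

# The global batch of 256 examples is split evenly across the replicas
# at every training step.
model.fit(x, y, batch_size=256, epochs=1)
```

Scaling beyond one machine follows the same pattern, but with a multi-worker strategy and a cluster configuration describing the participating servers.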
