Synchronous/asynchronous SGD

As mentioned before, in data parallelism each worker holds a copy of the same model, grabs some data from the training set, and calculates its own gradient. Because every worker must end up with the same model, the gradients need to be synchronized in some way before the model is updated.

In synchronous SGD, every worker calculates a gradient and then waits until all the other workers have finished calculating theirs. Only then is the model updated and distributed to all the workers again.
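The synchronous scheme just described can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the book's implementation: the workers are simulated one after another in a single process, the model is a single weight `w` fitted to `y = w * x`, the gradients are combined by averaging (one common choice), and the names `gradient` and `synchronous_sgd` are hypothetical.

```python
import random


def gradient(w, batch):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def synchronous_sgd(data, num_workers=4, lr=0.01, steps=200, batch_size=8, seed=0):
    """Simulate synchronous SGD; workers run sequentially here (assumption)."""
    rng = random.Random(seed)
    w = 0.0  # the shared model: every worker starts each step from this value
    for _ in range(steps):
        # Each worker draws its own mini-batch and computes its own gradient
        # against the same copy of the model.
        grads = [
            gradient(w, rng.sample(data, batch_size))
            for _ in range(num_workers)
        ]
        # Synchronization point: the update happens only after every
        # worker's gradient is available. Averaging is an assumed choice.
        avg_grad = sum(grads) / num_workers
        # Update once; the new w is what all workers use in the next step.
        w -= lr * avg_grad
    return w


# Noise-free toy data generated from y = 3x, so w should approach 3.
data = [(x, 3.0 * x) for x in range(1, 11)]
w = synchronous_sgd(data)
```

In a real distributed setup the list comprehension would be replaced by workers running in parallel, and the wait for all gradients would be an actual barrier, so each step is only as fast as the slowest worker.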
