Nesterov momentum

In Nesterov momentum, we change where we compute the gradient. We first make a big jump in the direction of the previously accumulated gradient. Then we measure the gradient at this new, look-ahead position and make a correction accordingly.
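
One common way to write this down is sketched here. The symbols are assumptions made for illustration and do not come from the text: \theta_t for the parameters, v_t for the accumulated velocity, \mu for the momentum coefficient, \eta for the learning rate, and f for the loss.

```latex
% Ordinary momentum: gradient measured at the current position
v_{t+1} = \mu v_t - \eta \nabla f(\theta_t)

% Nesterov momentum: gradient measured after the jump along the accumulated velocity
v_{t+1} = \mu v_t - \eta \nabla f(\theta_t + \mu v_t)

% Both variants then move the parameters by the new velocity
\theta_{t+1} = \theta_t + v_{t+1}
```

The only difference between the two velocity updates is the point at which the gradient is evaluated.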

This correction keeps the ordinary momentum update from moving too quickly and overshooting, which produces fewer oscillations as gradient descent tries to converge.
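
The jump-then-correct idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `nesterov_momentum`, the toy quadratic loss, and the values chosen for the learning rate `lr` and momentum coefficient `mu` are all hypothetical.

```python
import numpy as np

def nesterov_momentum(grad, x0, lr=0.1, mu=0.9, steps=100):
    """Minimize a function from its gradient using Nesterov momentum.

    grad, x0, lr, mu and steps are illustrative names, not taken from the text.
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)               # accumulated velocity
    for _ in range(steps):
        lookahead = x + mu * v         # big jump along the accumulated direction
        g = grad(lookahead)            # gradient measured at the new position
        v = mu * v - lr * g            # correct the velocity with that gradient
        x = x + v                      # apply the corrected step
    return x

# Toy loss (an assumption): f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2), minimum at the origin.
def grad_f(x):
    return np.array([1.0, 10.0]) * x

x_final = nesterov_momentum(grad_f, [5.0, 5.0], lr=0.05, mu=0.9, steps=300)
print(x_final)
```

Replacing `grad(lookahead)` with `grad(x)` turns the loop into ordinary momentum, which makes it easy to compare how the two variants behave on the same loss.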
