Advanced gradient descent algorithms

Now that we have an understanding of SGD and backpropagation, let's look at a number of advanced optimization methods, built on SGD, that offer some kind of advantage, usually an improvement in training time (that is, the time it takes to minimize the cost function to the point where our network converges).

These improved methods introduce a general notion of velocity as an optimization parameter. Quoting from Wibisono and Wilson, in the opening to their paper on Accelerated Methods in Optimization:

"In convex optimization, there is an acceleration phenomenon in which we can boost the convergence rate of certain gradient-based algorithms."

In brief, many of these advanced algorithms rely on a similar principle: carried by their momentum, essentially a moving average of our gradients, they can pass through local optima quickly.
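
To make this concrete, here is a minimal sketch of an SGD-with-momentum update, where the velocity term accumulates a moving average of past gradients and the parameters follow the velocity rather than the raw gradient. The function name, the `beta` coefficient, and the dictionary-of-arrays layout are illustrative assumptions, not code from this book:

```python
import numpy as np

def sgd_momentum_step(params, grads, velocity, lr=0.01, beta=0.9):
    """Apply one SGD-with-momentum update in place.

    velocity is an exponentially weighted moving average of past gradients;
    the parameter step follows the velocity rather than the current gradient.
    """
    for key in params:
        # Blend the gradient history (scaled by beta) with the new gradient
        velocity[key] = beta * velocity[key] - lr * grads[key]
        # Move the parameters along the accumulated velocity
        params[key] += velocity[key]
    return params, velocity

# Hypothetical usage with a single weight matrix
params = {"W": np.random.randn(3, 3)}
velocity = {"W": np.zeros_like(params["W"])}
grads = {"W": np.ones((3, 3))}
params, velocity = sgd_momentum_step(params, grads, velocity)
```

With `beta = 0`, this reduces to plain SGD; larger values of `beta` let the accumulated velocity carry the update through small local bumps in the cost surface.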
