Momentum

When thinking about how to optimize gradient descent, we can certainly use intuition from real life to inform our methods. One example of this is momentum. If we imagine that most error surfaces are really shaped like a bowl, with the desired minimum in the middle, then starting from the highest point on the rim, it could take us a long time to get to the bottom.

If we think about some real-life physics, the steeper the side of the bowl, the faster a ball would roll down it as it gained momentum. Taking this as inspiration, we get what we can consider the momentum variant of SGD: we try to accelerate the descent down the gradient by giving the update more momentum whenever the gradient keeps pointing in the same direction, and by reducing the momentum whenever the gradient changes direction.

While we don't want to get bogged down in heavy maths, there is a simple formula for the momentum update. It is as follows:

v = momentum * m - lr * g

Here, m is the previous weight update, g is the current gradient with respect to the parameter p, lr is the learning rate of our solver, and momentum is a constant.

So, if we want to understand exactly how to update our network parameters, we can adjust the formula in the following way:

p_new = p + v = p + momentum * m - lr * g
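To see the arithmetic in action, here is a minimal sketch of a single momentum update in plain Go; the parameter, gradient, and hyperparameter values are made up purely for illustration:

package main

import "fmt"

func main() {
	lr := 0.001     // learning rate
	momentum := 0.9 // momentum constant

	p := 1.0  // current parameter value
	g := 0.5  // current gradient with respect to p
	m := -0.1 // previous weight update (illustrative)

	// v = momentum * m - lr * g
	v := momentum*m - lr*g

	// p_new = p + v
	p += v

	fmt.Printf("update v = %.4f, new parameter p = %.4f\n", v, p)
}

Because the previous update m and the current gradient g both push p downward, the momentum term dominates and the step is much larger than lr * g alone would give.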

What does this mean in practice? Let's look at some code.

Firstly, in Gorgonia, the basic interface for all optimization methods or solvers looks like this:

type Solver interface {
	Step([]ValueGrad) error
}

We then have the following function type, which provides construction options for a Solver:

type SolverOpt func(s Solver)

The primary option to set is, of course, the momentum constant itself; the SolverOpt for this is WithMomentum. Other solver options that apply include WithL1Reg, WithL2Reg, WithBatchSize, WithClip, and WithLearnRate.
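For example, several of these options can be combined when constructing a solver; the specific values below are arbitrary, chosen only to show the pattern:

solver := NewMomentum(
	WithLearnRate(0.01), // override the default learning rate of 0.001
	WithMomentum(0.85),  // override the default momentum constant of 0.9
	WithClip(5.0),       // clip gradients to keep individual updates bounded
)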

Let's use our code example from the beginning of this chapter, but, instead of vanilla SGD, let's use the momentum solver in its most basic form, as follows:

vm := NewTapeMachine(g, BindDualValues(m.learnables()...))
solver := NewMomentum()

That's it! But that doesn't tell us much, just that Gorgonia is, like any good machine learning library, flexible and modular enough that we can simply swap out our solvers (and measure relative performance!).

So, let's take a look at the function we are calling, as shown in the following code:

func NewMomentum(opts ...SolverOpt) *Momentum {
	s := &Momentum{
		eta:      0.001,
		momentum: 0.9,
	}
	for _, opt := range opts {
		opt(s)
	}
	return s
}

We can see here the momentum constant we referenced in the original formula for this method, together with eta, which is our learning rate. This is all we need to do to apply the momentum solver to our model!
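To close the loop, here is a sketch of how the solver is driven during training. The model m, its learnables() method, and the epochs count are assumptions carried over from the chapter's earlier example; NodesToValueGrads is the Gorgonia helper that pairs each learnable node with its gradient for Step:

for i := 0; i < epochs; i++ {
	// Run the forward and backward passes to populate the gradients.
	if err := vm.RunAll(); err != nil {
		log.Fatalf("failed at epoch %d: %v", i, err)
	}
	// Step applies v = momentum * m - lr * g to every learnable parameter.
	if err := solver.Step(NodesToValueGrads(m.learnables())); err != nil {
		log.Fatal(err)
	}
	vm.Reset() // rewind the tape machine for the next epoch
}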
