RMSprop

We can also think about optimization in a different way: what if we adjust the learning rate based on feature importance? We could decrease the learning rate when we are updating parameters tied to common features and increase it when we are updating parameters tied to rarer ones. This also means that we can spend less time tuning the learning rate by hand. Several variations of this idea have been proposed, but the most popular by far is called RMSprop.

RMSprop is a modified form of SGD that, while unpublished, is described in Geoffrey Hinton's Coursera course, Neural Networks for Machine Learning. RMSprop sounds fancy, but it could just as easily be called adaptive gradient descent. The basic idea is that you modify your learning rate based on certain conditions.

These conditions can be stated simply as follows:

  • If the gradients for a weight are small but consistent, then increase the learning rate
  • If the gradients for a weight are large or inconsistent, then decrease the learning rate

RMSprop's specific way of doing this is to divide the learning rate for each weight by a decaying average of the squared magnitudes of its recent gradients (the root mean square that gives the method its name).
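
To make this concrete, the following is a minimal sketch in plain Go of the RMSprop update for a single weight. The gradient values are made up for illustration, and the constants mirror the defaults shown later in this section; only the two lines inside the loop are the method itself.

package main

import (
    "fmt"
    "math"
)

func main() {
    const (
        decay = 0.999 // decay factor for the running average
        eps   = 1e-8  // smoothing term, keeps the divisor away from zero
        eta   = 0.001 // base learning rate
    )

    w := 1.0     // the weight being trained
    cache := 0.0 // decaying average of squared gradients

    // Made-up gradients standing in for whatever backpropagation would produce.
    for _, g := range []float64{0.5, 0.4, 0.6, 0.5} {
        cache = decay*cache + (1-decay)*g*g
        // The effective step divides the learning rate by the root of the
        // running mean square of recent gradients.
        w -= eta * g / (math.Sqrt(cache) + eps)
    }
    fmt.Printf("updated weight: %v\n", w)
}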

Gorgonia supports RMSprop natively. As with the momentum example, you simply swap out your solver. Here is how you define it, together with a number of SolverOpt arguments you might want to pass in:

solver = NewRMSPropSolver(WithLearnRate(stepSize), WithL2Reg(l2Reg), WithClip(clip))

Inspecting the underlying function, we see the following options and their associated defaults for decay factor, smoothing factor, and learning rate, respectively:

func NewRMSPropSolver(opts ...SolverOpt) *RMSPropSolver {
    s := &RMSPropSolver{
        decay: 0.999, // decay factor for the running average of squared gradients
        eps:   1e-8,  // smoothing factor
        eta:   0.001, // learning rate
    }

    for _, opt := range opts {
        opt(s)
    }
    return s
}
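
To see the solver in action, here is a minimal, self-contained sketch of how it typically slots into a Gorgonia training loop: run the tape machine, call Step on the learnable nodes, then reset the machine. The toy objective (pulling a two-element weight vector toward a fixed target) and all of the variable names are illustrative assumptions, not code from Gorgonia or from earlier examples.

package main

import (
    "fmt"
    "log"

    G "gorgonia.org/gorgonia"
    "gorgonia.org/tensor"
)

func main() {
    g := G.NewGraph()

    // One learnable node: a 2-element weight vector initialised to zero.
    w := G.NewVector(g, G.Float64, G.WithName("w"), G.WithShape(2), G.WithInit(G.Zeroes()))

    // A fixed target the weights should be driven toward (illustrative values).
    target := G.NodeFromAny(g,
        tensor.New(tensor.WithShape(2), tensor.WithBacking([]float64{3, -2})),
        G.WithName("target"))

    // loss = mean((w - target)^2)
    loss := G.Must(G.Mean(G.Must(G.Square(G.Must(G.Sub(w, target))))))

    // Build the symbolic gradient of the loss with respect to w.
    if _, err := G.Grad(loss, w); err != nil {
        log.Fatal(err)
    }

    vm := G.NewTapeMachine(g, G.BindDualValues(w))
    defer vm.Close()

    solver := G.NewRMSPropSolver(G.WithLearnRate(0.1))

    for i := 0; i < 300; i++ {
        if err := vm.RunAll(); err != nil {
            log.Fatal(err)
        }
        // Apply the RMSprop update to every learnable node.
        if err := solver.Step(G.NodesToValueGrads(G.Nodes{w})); err != nil {
            log.Fatal(err)
        }
        vm.Reset()
    }
    fmt.Println(w.Value()) // should end up close to [3, -2]
}

The same three calls (RunAll, Step, Reset) appear in any larger model; only the set of learnable nodes passed to NodesToValueGrads changes.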