Decayed epsilon greedy

The value of epsilon is key to how well the epsilon-greedy algorithm works on a given problem. Rather than setting this value once at the start and then lowering it by hand, we can make epsilon a function of time t. For example, epsilon can be set to 1 / log(t + 0.00001). As t grows, log(t) grows without bound, so epsilon keeps shrinking toward zero. This works because, as epsilon shrinks over time, we become more confident about which action is optimal, so less exploration is required. With the natural logarithm and integer time steps starting at t = 1, the expression exceeds 1 at t = 1 and t = 2 (and it is negative at t = 0), so epsilon has to be capped at 1 to remain a valid probability.
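
A minimal sketch of decayed epsilon-greedy follows. It assumes Bernoulli arms (each pull pays 1 with some fixed probability, else 0), the natural logarithm, time steps counted from t = 1, and epsilon capped at 1 as noted above. The function names and the arm probabilities used in the usage line are hypothetical, chosen only for illustration.

```python
import math
import random


def decayed_epsilon(t):
    """epsilon_t = 1 / log(t + 0.00001), natural log assumed.

    The raw value is above 1 at t = 1 and t = 2, so it is capped at 1
    to remain a valid probability. t is assumed to start at 1.
    """
    return min(1.0, 1.0 / math.log(t + 0.00001))


def run_decayed_epsilon_greedy(true_probs, steps, seed=0):
    """Play a Bernoulli bandit with decayed epsilon-greedy.

    true_probs: hypothetical success probability of each arm.
    Returns (pull count per arm, estimated value per arm).
    """
    rng = random.Random(seed)
    n = len(true_probs)
    counts = [0] * n
    values = [0.0] * n  # running mean reward observed for each arm
    for t in range(1, steps + 1):
        if rng.random() < decayed_epsilon(t):
            arm = rng.randrange(n)  # explore: pick any arm uniformly at random
        else:
            arm = max(range(n), key=lambda a: values[a])  # exploit: best estimate so far
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
    return counts, values


if __name__ == "__main__":
    counts, values = run_decayed_epsilon_greedy([0.2, 0.5, 0.8], steps=5000)
    print("pulls per arm:", counts)
    print("estimated values:", [round(v, 3) for v in values])
```

Early on epsilon is close to 1, so almost every step explores; as t grows the exploit branch dominates and most pulls go to the arm with the highest estimated value.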

The problem with selecting exploratory actions at random is that, even after enough time steps to know that some arm is bad, the algorithm keeps choosing that arm with probability epsilon/n, where n is the number of arms. In effect, we keep exploring an action we already know is bad, which is not efficient. One way around this is to focus exploration on arms that still have strong potential to be optimal.
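
The epsilon/n arithmetic can be checked directly. The sketch below assumes that exploration picks uniformly among n arms, and computes the expected number of exploratory pulls that one particular arm (for example, one already known to be bad) receives over a horizon of steps. It compares a fixed epsilon of 0.1 with the capped 1 / log(t + 0.00001) schedule (natural log assumed); the function names and the value 0.1 are hypothetical choices for illustration.

```python
import math


def expected_exploration_pulls(epsilon_at, n_arms, horizon):
    """Expected exploratory pulls of ONE given arm over steps t = 1..horizon.

    At step t the random branch fires with probability epsilon_at(t) and then
    picks uniformly among n_arms, so any single arm, good or bad, is explored
    with probability epsilon_at(t) / n_arms.
    """
    return sum(epsilon_at(t) / n_arms for t in range(1, horizon + 1))


def fixed_epsilon(t):
    return 0.1  # hypothetical constant exploration rate


def decayed_epsilon(t):
    return min(1.0, 1.0 / math.log(t + 0.00001))  # capped at 1, natural log


if __name__ == "__main__":
    for horizon in (10_000, 100_000):
        print(horizon,
              round(expected_exploration_pulls(fixed_epsilon, 4, horizon), 1),
              round(expected_exploration_pulls(decayed_epsilon, 4, horizon), 1))
```

With a fixed epsilon the wasted pulls on a bad arm grow linearly with the horizon; with the decayed schedule they grow more slowly but still without bound, because the sum of 1 / log(t) diverges.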