Implementing an epsilon decay schedule

We can decay (that is, decrease) the ε value linearly (as in the following left-hand graph), exponentially (as in the following right-hand graph), or with some other decay schedule. Linear and exponential schedules are the most commonly used decay schedules for the exploration parameter ε:

In the preceding graphs, you can see how the epsilon (exploration) value varies under the two schedules (linear on the left, exponential on the right). Both schedules start from an epsilon_max (start) value of 1. The epsilon_min (final) value is 0.01 in the linear case and exp(-10000/2000), which is approximately 0.0067, in the exponential case. Both schedules hold epsilon constant at epsilon_min after 10,000 episodes.
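The exponential schedule described above can be sketched in the same callable style as the linear schedule used in this chapter. This is a minimal sketch under stated assumptions, not code from the book's repository: the class name ExponentialDecaySchedule and the decay_constant parameter are hypothetical names chosen for illustration, and the curve is assumed to be initial_value * exp(-step / decay_constant), held constant once max_steps is reached.

```python
import math


class ExponentialDecaySchedule:
    """Hypothetical exponential counterpart to a linear decay schedule."""

    def __init__(self, initial_value, decay_constant, max_steps):
        assert decay_constant > 0, "decay_constant should be > 0"
        assert max_steps > 0, "max_steps should be > 0"
        self.initial_value = initial_value
        self.decay_constant = decay_constant
        self.max_steps = max_steps
        # Value the schedule settles at once max_steps is reached
        self.final_value = initial_value * math.exp(-max_steps / decay_constant)

    def __call__(self, step_num):
        # Clamp the step so the value stays constant after max_steps
        step = min(step_num, self.max_steps)
        return self.initial_value * math.exp(-step / self.decay_constant)


# Assumed settings matching the description: start at 1, decay constant of
# 2000, constant after 10,000 episodes
exp_sched = ExponentialDecaySchedule(initial_value=1.0,
                                     decay_constant=2000,
                                     max_steps=10000)
for episode in (0, 2000, 10000, 20000):
    print(episode, exp_sched(episode))
```

With these settings, the value at and beyond episode 10,000 equals exp(-10000/2000), matching the final value quoted for the exponential case.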

The following code implements the LinearDecaySchedule, which we will use for our Deep_Q_Learning agent implementation to play Atari games:

#!/usr/bin/env python

class LinearDecaySchedule(object):
    def __init__(self, initial_value, final_value, max_steps):
        assert initial_value > final_value, "initial_value should be > final_value"
        self.initial_value = initial_value
        self.final_value = final_value
        # Amount by which the value decreases at every step
        self.decay_factor = (initial_value - final_value) / max_steps

    def __call__(self, step_num):
        # Decay linearly from initial_value, never dropping below final_value
        current_value = self.initial_value - self.decay_factor * step_num
        if current_value < self.final_value:
            current_value = self.final_value
        return current_value

if __name__ == "__main__":
    import matplotlib.pyplot as plt
    epsilon_initial = 1.0
    epsilon_final = 0.05
    MAX_NUM_EPISODES = 10000
    MAX_STEPS_PER_EPISODE = 300
    total_steps = MAX_NUM_EPISODES * MAX_STEPS_PER_EPISODE
    linear_sched = LinearDecaySchedule(initial_value=epsilon_initial,
                                       final_value=epsilon_final,
                                       max_steps=total_steps)
    epsilon = [linear_sched(step) for step in range(total_steps)]
    plt.plot(epsilon)
    plt.show()

The preceding script is available at ch6/utils/decay_schedule.py in this book's code repository. If you run the script, the code under the if __name__ == "__main__": guard creates a linear decay schedule for epsilon and plots its value. You can experiment with different values of MAX_NUM_EPISODES, MAX_STEPS_PER_EPISODE, epsilon_initial, and epsilon_final to see how the epsilon values vary with the number of steps. In the next section, we will implement the get_action(...) method, which implements the ε-greedy action selection policy.
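To check the schedule without plotting, the same linear rule can be written as a single clamped expression and evaluated at a few steps. The following is a small sketch for experimentation only: the function name linear_epsilon is a hypothetical helper, not part of the book's code, and the default arguments simply reuse the values from the script above (1.0, 0.05, and 10000 * 300 steps).

```python
def linear_epsilon(step_num, initial_value=1.0, final_value=0.05,
                   max_steps=10000 * 300):
    """Linearly decayed value at step_num, never below final_value."""
    decayed = initial_value - (initial_value - final_value) * step_num / max_steps
    return max(final_value, decayed)


# Spot-check the start, the midpoint, the end, and a step past the end
for step in (0, 1500000, 3000000, 5000000):
    print(step, round(linear_epsilon(step), 4))
```

Halfway through the schedule the value sits midway between the initial and final values, and any step beyond max_steps returns final_value.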
