Implementing an epsilon decay schedule

We can decay (or decrease) the ε value linearly (as in the following left-hand graph), exponentially (as in the following right-hand graph), or using some other decay schedule. Linear and exponential schedules are the most commonly used decay schedules for the exploration parameter ε:

In the preceding graphs, you can see how the epsilon (exploration) value varies under the different schedules (linear on the left graph, exponential on the right graph). The decay schedules shown in the preceding graphs use an epsilon_max (start) value of 1, and an epsilon_min (final) value of 0.01 in the linear case and exp(-10000/2000) in the exponential case; both schedules hold epsilon_min constant after 10,000 episodes.
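The listing in this section implements only the linear schedule. As an illustration of the exponential curve described above, here is a minimal sketch of an exponential counterpart. It assumes the form ε(t) = max(ε_min, ε_max · exp(-t / k)) with a decay constant k, and it assumes the 2000 in exp(-10000/2000) plays the role of k, so that ε reaches its floor exactly at episode 10,000. The class name ExponentialDecaySchedule and the parameter decay_constant are hypothetical and not part of the book's code.

```python
import math


class ExponentialDecaySchedule(object):
    """Hypothetical exponential counterpart to LinearDecaySchedule.

    Assumed form: value(t) = max(final_value, initial_value * exp(-t / decay_constant)).
    """

    def __init__(self, initial_value, final_value, decay_constant):
        assert initial_value > final_value, "initial_value should be > final_value"
        assert decay_constant > 0, "decay_constant should be positive"
        self.initial_value = initial_value
        self.final_value = final_value
        self.decay_constant = decay_constant

    def __call__(self, step_num):
        # Exponential decay from initial_value, clamped from below at final_value
        current_value = self.initial_value * math.exp(-step_num / self.decay_constant)
        return max(current_value, self.final_value)


if __name__ == "__main__":
    # Assumed settings: start at 1, decay constant 2000, floor at exp(-10000/2000),
    # so the floor is reached at episode 10,000 and held afterwards
    exp_sched = ExponentialDecaySchedule(initial_value=1.0,
                                         final_value=math.exp(-10000 / 2000),
                                         decay_constant=2000)
    for episode in (0, 2000, 10000, 15000):
        print(episode, exp_sched(episode))
```

The call interface matches LinearDecaySchedule, so either schedule could be swapped in wherever epsilon is looked up by step (or episode) number.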

The following code implements the LinearDecaySchedule, which we will use for our Deep_Q_Learning agent implementation to play Atari games:

```python
#!/usr/bin/env python


class LinearDecaySchedule(object):
    """Linearly decays a value from initial_value to final_value over max_steps steps."""

    def __init__(self, initial_value, final_value, max_steps):
        assert initial_value > final_value, "initial_value should be > final_value"
        self.initial_value = initial_value
        self.final_value = final_value
        # Amount subtracted from the value at every step
        self.decay_factor = (initial_value - final_value) / max_steps

    def __call__(self, step_num):
        current_value = self.initial_value - self.decay_factor * step_num
        # Clamp at final_value once the schedule has run past max_steps
        if current_value < self.final_value:
            current_value = self.final_value
        return current_value


if __name__ == "__main__":
    import matplotlib.pyplot as plt

    epsilon_initial = 1.0
    epsilon_final = 0.05
    MAX_NUM_EPISODES = 10000
    MAX_STEPS_PER_EPISODE = 300
    linear_sched = LinearDecaySchedule(initial_value=epsilon_initial,
                                       final_value=epsilon_final,
                                       max_steps=MAX_NUM_EPISODES * MAX_STEPS_PER_EPISODE)
    # Evaluate the schedule at every step and plot the resulting epsilon curve
    epsilon = [linear_sched(step) for step in range(MAX_NUM_EPISODES * MAX_STEPS_PER_EPISODE)]
    plt.plot(epsilon)
    plt.show()
```
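Written as a formula, LinearDecaySchedule computes ε at step t as follows, where T is max_steps. Plugging in the script's values (epsilon_initial = 1.0, epsilon_final = 0.05, T = 10000 × 300) shows how small the per-step decrement is:

```latex
\begin{aligned}
\epsilon(t) &= \max\left(\epsilon_{\text{final}},\ \epsilon_{\text{initial}} - \frac{\epsilon_{\text{initial}} - \epsilon_{\text{final}}}{T}\,t\right) \\
\text{decay\_factor} &= \frac{1.0 - 0.05}{10000 \times 300} = \frac{0.95}{3\,000\,000} \approx 3.17 \times 10^{-7} \text{ per step} \\
\epsilon(300) &= 1.0 - 300 \times \frac{0.95}{3\,000\,000} = 1.0 - 9.5 \times 10^{-5} = 0.999905 \\
\epsilon(1\,500\,000) &= 1.0 - 0.475 = 0.525
\end{aligned}
```

Because max_steps is set to the total number of steps in the run, ε only approaches epsilon_final at the very end of training in this script. Passing a smaller max_steps would reach the floor earlier and hold it for the remaining steps, like the flat tails in the graphs described above.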

The preceding script is available at ch6/utils/decay_schedule.py in this book's code repository. If you run the script, its main block creates a linear decay schedule for epsilon and plots the resulting values. You can experiment with different values of MAX_NUM_EPISODES, MAX_STEPS_PER_EPISODE, epsilon_initial, and epsilon_final to see how the epsilon values vary with the number of steps. In the next section, we will implement the get_action(...) method, which implements the ε-greedy action selection policy.
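To show where the decayed ε ends up being used, here is a minimal, hypothetical sketch of ε-greedy selection over a list of estimated action values. It is not the book's get_action(...) method: the function name epsilon_greedy_action and its arguments are assumptions for illustration. In an agent, epsilon would typically be the value returned by the schedule for the current step, for example linear_sched(step_num).

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Hypothetical helper: pick an action index epsilon-greedily.

    q_values: list of estimated action values Q(s, a) for the current state.
    epsilon: exploration probability, e.g. the value returned by a decay schedule.
    rng: source of randomness (the random module or a random.Random instance).
    """
    if rng.random() < epsilon:
        # Explore: choose an action uniformly at random
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimated value (first on ties)
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    q = [0.1, 0.7, 0.2]
    rng = random.Random(0)
    # epsilon = 0.0: always greedy, so action 1 is chosen
    print(epsilon_greedy_action(q, 0.0, rng))
    # epsilon = 1.0: every choice is random, so all actions appear
    counts = [0, 0, 0]
    for _ in range(3000):
        counts[epsilon_greedy_action(q, 1.0, rng)] += 1
    print(counts)
```

As ε decays from 1.0 toward its final value, the agent shifts from mostly random actions to mostly greedy ones.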
