The epsilon-greedy algorithm

The greedy algorithm in RL is a pure exploitation strategy; it performs no exploration at all. A greedy algorithm always selects the action with the highest estimated action value, where each action value is estimated from past experience by averaging the rewards observed so far for that action.
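As a minimal sketch of this idea, assuming a simple multi-armed bandit setting (the names `n_actions`, `q_values`, `update_estimate`, and `greedy_action` are illustrative, not from the text), the sample-average estimate can be maintained incrementally and exploited greedily:

```python
import numpy as np

n_actions = 10                       # illustrative number of available actions
q_values = np.zeros(n_actions)       # running average reward per action
action_counts = np.zeros(n_actions)  # how many times each action was taken

def update_estimate(action, reward):
    """Incremental sample average: Q(a) <- Q(a) + (r - Q(a)) / N(a)."""
    action_counts[action] += 1
    q_values[action] += (reward - q_values[action]) / action_counts[action]

def greedy_action():
    """Pure exploitation: always pick the highest estimated action value."""
    return int(np.argmax(q_values))
```

The incremental form avoids storing all past rewards while producing exactly the same average.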

That said, a greedy algorithm can be a sensible choice if our action-value estimates closely match the true expected action values; if we knew the true reward distribution, we could simply select the best action every time. In practice we rarely do, which is why the epsilon-greedy algorithm, a simple combination of the greedy and random approaches, is used instead.

The epsilon parameter injects exploration into the greedy algorithm. Instead of always selecting the action with the highest estimated action value, the agent occasionally, with probability epsilon, selects a random action for the sake of exploration; the rest of the time, it behaves like the original greedy algorithm and selects the best-known action.
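A self-contained sketch of this selection rule might look as follows (the function name `epsilon_greedy_action` is illustrative, not from the text):

```python
import random
import numpy as np

def epsilon_greedy_action(q_values, epsilon=0.1):
    """With probability epsilon explore uniformly at random; otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore: any action, uniformly
    return int(np.argmax(q_values))             # exploit: best-known action

# Example: with epsilon = 0.1, roughly 1 in 10 selections is random.
action = epsilon_greedy_action(np.array([0.2, 0.5, 0.1]), epsilon=0.1)
```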

The epsilon in this algorithm is an adjustable parameter that determines the probability of taking a random action rather than the greedy one. It can also be adjusted over the course of training. Generally, epsilon is initialized to a large probability at the start of training: because the environment is unknown, a large epsilon encourages exploration. The value is then gradually reduced to a small constant (often 0.1), shifting the balance toward exploitation.
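One common way to implement this schedule is linear annealing, sketched below; the function name and the default values (`eps_start`, `eps_end`, `decay_steps`) are assumptions for illustration, not prescribed by the text:

```python
def decayed_epsilon(step, eps_start=1.0, eps_end=0.1, decay_steps=10_000):
    """Linearly anneal epsilon from eps_start down to eps_end over decay_steps."""
    fraction = min(step / decay_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)

# Early in training epsilon is near 1.0 (mostly exploration)...
print(decayed_epsilon(0))       # 1.0
# ...and it settles at 0.1 once decay_steps have elapsed (mostly exploitation).
print(decayed_epsilon(20_000))  # 0.1
```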

Due to its simplicity, epsilon-greedy has become the de facto exploration technique in many recent RL algorithms.
Despite this widespread use, the method is far from optimal: when it explores, it picks among actions uniformly at random, distinguishing only the most rewarding action from all the rest and ignoring how promising or how uncertain the other actions' estimates are.