Double DQN

Deep Q learning is pretty cool, right? It generalizes its learning well enough to play a wide range of Atari games. But the problem with DQN is that it tends to overestimate Q values. This is because of the max operator in the Q learning equation: the max operator uses the same value for both selecting and evaluating an action. What do I mean by that? Let's suppose we are in a state s and we have five actions, a1 to a5, and that a3 is the best action. When we estimate Q values for all these actions in state s, the estimates will contain some noise and differ from the actual values. Due to this noise, action a2 might get a higher estimated value than the optimal action a3. If we then select the best action as the one with the maximum estimated value, we will end up selecting the suboptimal action a2 instead of the optimal action a3.
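To see how noise alone leads to overestimation, here is a minimal sketch (the Q values and noise scale are made up purely for illustration). It repeatedly adds zero-mean noise to a fixed set of true Q values and checks how often the max over the noisy estimates exceeds the true maximum:

```python
import numpy as np

# Hypothetical true Q values for five actions a1..a5 in state s;
# a3 (index 2) is the best action.
true_q = np.array([1.0, 1.2, 1.5, 0.8, 1.1])

np.random.seed(0)
overestimates = 0
for _ in range(1000):
    # Estimated Q values = true values + zero-mean noise
    estimated_q = true_q + np.random.normal(0.0, 0.5, size=5)
    # Taking the max over noisy estimates tends to exceed the true maximum
    if estimated_q.max() > true_q.max():
        overestimates += 1

print(f"Noisy max exceeded the true max in {overestimates}/1000 trials")
```

Even though the noise has zero mean, the max operator systematically picks whichever estimate happens to be inflated, so the resulting target is biased upward.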

We can solve this problem by having two separate Q functions, each learning independently. One Q function is used to select an action and the other Q function is used to evaluate an action. We can implement this by just tweaking the target function of DQN. Recall the target function of DQN:

$$y = r + \gamma \max_{a'} Q(s', a'; \theta)$$

We can modify our target function as follows:

$$y = r + \gamma \, Q\big(s', \arg\max_{a'} Q(s', a'; \theta); \theta'\big)$$

In the preceding equation, we have two Q functions, each with different weights. The Q function with weights $\theta$ is used to select the action and the other Q function with weights $\theta'$ is used to evaluate the action. We can also switch the roles of these two Q functions.
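The following is a minimal sketch of how the two targets differ, assuming we already have the Q-value estimates for the next state s' from an online network (weights theta) and a separate target network (weights theta'); the function names and the example numbers are hypothetical:

```python
import numpy as np

def dqn_target(reward, next_q_online, gamma=0.99):
    # Standard DQN: the same Q values both select and evaluate the action
    return reward + gamma * np.max(next_q_online)

def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99):
    # Double DQN: the online network (weights theta) selects the action...
    best_action = np.argmax(next_q_online)
    # ...and the target network (weights theta') evaluates that action
    return reward + gamma * next_q_target[best_action]

# Hypothetical Q-value estimates for the next state s' from the two networks
next_q_online = np.array([1.1, 2.3, 1.8, 0.9, 1.5])   # Q(s', a; theta)
next_q_target = np.array([1.0, 1.7, 2.0, 1.1, 1.4])   # Q(s', a; theta')

print("DQN target:       ", dqn_target(1.0, next_q_online))
print("Double DQN target:", double_dqn_target(1.0, next_q_online, next_q_target))
```

Because the action is chosen by one set of weights and scored by the other, a value that is overestimated by the online network is unlikely to also be overestimated by the target network, which reduces the upward bias in the target.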
