Deep Q learning from demonstrations

We have learned a lot about DQN. We started off with vanilla DQN and then saw various improvements such as double DQN, the dueling network architecture, and prioritized experience replay. We also learned to build a DQN to play Atari games. We stored the agent's interactions with the environment in the experience buffer and made the agent learn from those experiences. The problem was that the agent needed a lot of training time before its performance improved. That is fine when learning in a simulated environment, but it causes a lot of problems when the agent has to learn in a real-world environment, where poor early behavior can be costly. To overcome this, researchers from Google's DeepMind introduced an improvement on DQN called deep Q learning from demonstrations (DQfD).

If we already have some demonstration data, we can add those demonstrations directly to the experience replay buffer. For example, consider an agent learning to play Atari games. If we already have demonstration data that shows the agent which actions lead to good rewards in which states, the agent can make direct use of this data for learning. Even a small amount of demonstration data improves the agent's performance and reduces the training time. Because the demonstration data is added to the prioritized experience replay buffer, the buffer itself controls how much the agent learns from the demonstrations and how much from its own interactions: every transition, whatever its origin, is sampled according to its priority.
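The mixing of demonstration and self-generated experience described above can be sketched with a toy buffer. This is only a minimal illustration under stated assumptions, not the implementation from the paper: the class name `DemoReplayBuffer`, the parameters `demo_bonus` and `agent_bonus`, and the choice to keep demonstrations permanently while agent transitions are evicted first-in, first-out are all hypothetical choices made for this example.

```python
import random


class DemoReplayBuffer:
    """Toy priority-weighted buffer mixing demonstration and agent transitions.

    Hypothetical sketch: demonstrations are never evicted and receive a
    larger priority bonus, so they keep being sampled alongside the
    agent's own experience.
    """

    def __init__(self, demonstrations, agent_capacity=1000,
                 demo_bonus=1.0, agent_bonus=0.001):
        self.demos = list(demonstrations)
        # Every demonstration starts with the same (high) priority.
        self.demo_priorities = [1.0 + demo_bonus] * len(self.demos)
        self.agent = []
        self.agent_priorities = []
        self.agent_capacity = agent_capacity
        self.agent_bonus = agent_bonus

    def add(self, transition, td_error=1.0):
        """Store a transition generated by the agent itself."""
        if len(self.agent) >= self.agent_capacity:
            # Evict the oldest agent transition; demonstrations are untouched.
            self.agent.pop(0)
            self.agent_priorities.pop(0)
        self.agent.append(transition)
        self.agent_priorities.append(abs(td_error) + self.agent_bonus)

    def sample(self, batch_size, rng=random):
        """Return (transition, is_demo) pairs drawn in proportion to priority."""
        items = ([(t, True) for t in self.demos]
                 + [(t, False) for t in self.agent])
        weights = self.demo_priorities + self.agent_priorities
        return rng.choices(items, weights=weights, k=batch_size)
```

The `is_demo` flag returned by `sample` lets the training step treat demonstration transitions differently, for example by applying an extra supervised loss only to them.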

The loss function in DQfD is the sum of several losses. We compute the TD loss as usual, along with a supervised loss on the demonstration data that pushes the agent to imitate the demonstrator's actions. To prevent the agent from overfitting to the demonstration data, we also compute an L2 regularization loss over the network weights. The authors of the paper evaluated DQfD on various environments, and it learned faster and performed better than prioritized dueling double DQN.
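To make the sum of losses concrete, here is a minimal NumPy sketch for a single transition. It rests on several assumptions: the supervised term is written as a large-margin classification loss, the weighting coefficients `lambda_sup` and `lambda_l2` and the `margin` value are illustrative, all function names are hypothetical, and the n-step return term used in the paper is left out for brevity.

```python
import numpy as np


def margin_loss(q_values, expert_action, margin=0.8):
    """Supervised loss: zero only if the expert's action has the highest
    Q value by at least `margin` over every other action."""
    q_values = np.asarray(q_values, dtype=float)
    penalties = np.full_like(q_values, margin)
    penalties[expert_action] = 0.0
    return float(np.max(q_values + penalties) - q_values[expert_action])


def td_loss(q_sa, reward, next_q_target_max, gamma=0.99, done=False):
    """Squared one-step TD error against a target-network estimate."""
    target = reward + (0.0 if done else gamma * next_q_target_max)
    return float((target - q_sa) ** 2)


def l2_loss(weights):
    """Sum of squared network weights."""
    return float(sum(np.sum(np.asarray(w) ** 2) for w in weights))


def dqfd_loss(q_values, action, reward, next_q_target_max, weights,
              is_demo, gamma=0.99, done=False,
              lambda_sup=1.0, lambda_l2=1e-5):
    """Combined loss for one transition (illustrative coefficients)."""
    loss = td_loss(q_values[action], reward, next_q_target_max, gamma, done)
    if is_demo:
        # The supervised term applies only to demonstration transitions.
        loss += lambda_sup * margin_loss(q_values, action)
    loss += lambda_l2 * l2_loss(weights)
    return loss
```

For transitions the agent generated itself, `is_demo` is false and only the TD and regularization terms remain, so the demonstrations guide the agent without constraining what it learns from its own experience.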

You can check out this video to see how DQfD learned to play the Private Eye game: https://youtu.be/4IFZvqBHsFY.
