Experience replay

We know that in RL environments, the agent makes a transition from a state s to the next state s' by performing some action a and receiving a reward r. We save this transition information, (s, a, r, s'), in a buffer called a replay buffer or experience replay. These transitions are called the agent's experience.
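As a minimal sketch, a single transition can be represented as a simple named tuple; the field names used here (including the `done` flag marking terminal states) are illustrative assumptions, not prescribed by the text:

```python
from collections import namedtuple

# A single transition (s, a, r, s'); `done` marks whether s' is terminal.
Transition = namedtuple('Transition', ['state', 'action', 'reward', 'next_state', 'done'])

# Example transition with placeholder values
t = Transition(state=0, action=1, reward=-1.0, next_state=2, done=False)
```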

The key idea of experience replay is that we train our deep Q network with transitions sampled from the replay buffer instead of training only on the most recent transitions. Consecutive experiences are highly correlated, so selecting a random batch of training samples from the replay buffer reduces this correlation and helps the agent learn from a wide range of experiences.

Also, neural networks will overfit on correlated experience, so selecting a random batch of experiences from the replay buffer reduces overfitting. We can use uniform sampling to sample the experience. We can think of the replay buffer as a queue rather than a list: it stores only a fixed number of recent experiences, so when new information comes in, the oldest is deleted, as shown in the sketch below:
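The following is a minimal sketch of such a buffer, assuming a Python `deque` with a fixed capacity; the class and method names (`ReplayBuffer`, `store`, `sample`) and the default capacity are illustrative choices, not from the original text:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer that stores the agent's most recent transitions."""

    def __init__(self, capacity=10000):
        # A deque with maxlen behaves like a queue: when the buffer is full,
        # appending a new transition automatically drops the oldest one.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        # Save the transition (s, a, r, s', done) as the agent's experience.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniformly sample a random batch of transitions, which breaks the
        # correlation between consecutive experiences.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

In use, the agent would call `store` after every environment step and, once the buffer holds at least `batch_size` transitions, call `sample` to draw random minibatches for training the deep Q network.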
