Replay memory

Now, we build the experience replay buffer, which is used to store all of the agent's experiences. We then sample a minibatch of experiences from the replay buffer to train the network:

import numpy as np

class ReplayMemoryFast:

First, we define the __init__ method and initialize the buffer size:


    def __init__(self, memory_size, minibatch_size):

        # max number of samples to store
        self.memory_size = memory_size

        # minibatch size
        self.minibatch_size = minibatch_size

        # preallocate the buffer and initialize the write index and the current size
        self.experience = [None] * self.memory_size
        self.current_index = 0
        self.size = 0

Next, we define the store function for storing the experiences:

    def store(self, observation, action, reward, newobservation, is_terminal):

We store the experience as a tuple of (current state, action, reward, next state, whether it is a terminal state):

        self.experience[self.current_index] = (observation, action, reward, newobservation, is_terminal)
        self.current_index += 1
        self.size = min(self.size + 1, self.memory_size)

If the index exceeds the memory size, we wrap it around to the beginning of the buffer by subtracting the memory size:

        if self.current_index >= self.memory_size:
            self.current_index -= self.memory_size
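
To see how this wrap-around behaves, the following is a small standalone sketch (not part of the original class) that traces the write index for a hypothetical memory_size of 3; once the buffer is full, the oldest slots are simply overwritten:

# hypothetical illustration of the circular write index with memory_size = 3
memory_size = 3
current_index = 0
for step in range(5):
    print("writing to slot", current_index)
    current_index += 1
    if current_index >= memory_size:
        current_index -= memory_size

# prints slots 0, 1, 2, 0, 1 -- older experiences get overwritten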

Next, we define a sample function for sampling a minibatch of experience:

    def sample(self):
        # we cannot sample a minibatch until enough experiences are stored
        if self.size < self.minibatch_size:
            return []

        # first, we randomly sample some indices
        samples_index = np.floor(np.random.random((self.minibatch_size,)) * self.size)

        # select the experiences at the sampled indices
        samples = [self.experience[int(i)] for i in samples_index]

        return samples
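
With the class complete, a minimal usage sketch looks like the following; the buffer size, state shape, action, and reward here are illustrative values chosen for this example, not values from the original code:

import numpy as np

# create a hypothetical buffer that holds up to 10,000 experiences
replay_memory = ReplayMemoryFast(memory_size=10000, minibatch_size=32)

# store some dummy transitions: (state, action, reward, next state, is_terminal)
for step in range(100):
    state = np.random.rand(4)
    next_state = np.random.rand(4)
    replay_memory.store(state, 0, 1.0, next_state, False)

# sample a minibatch of 32 experiences for training the network
minibatch = replay_memory.sample()
print(len(minibatch))   # 32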