Implementing the experience memory

Let's implement an experience memory class to store the experiences collected by the agent. Before that, let's cement our understanding of what we mean by an experience. In reinforcement learning, problems are represented using Markov Decision Processes (MDPs), which we discussed in Chapter 2, Reinforcement Learning and Deep Reinforcement Learning. It is efficient to represent one experience as a data structure that consists of the observation at time step t, the action taken following that observation, the reward received for that action, and the next observation (or state) that the environment transitioned to as a result of the agent's action. It is also useful to include a "done" Boolean value that signifies whether this particular next observation marked the end of the episode. Let's use Python's namedtuple from the collections module to represent such a data structure, as shown in the following code snippet:

from collections import namedtuple

Experience = namedtuple("Experience",
                        ['obs', 'action', 'reward', 'next_obs', 'done'])

The namedtuple data structure makes it convenient to access the elements using named attributes (such as obs, action, and so on) instead of a numerical index (such as 0, 1, and so on).
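As a quick illustration, here is how an Experience can be created and accessed; the observation values used here are just placeholder lists, not actual environment observations:

```python
from collections import namedtuple

Experience = namedtuple("Experience",
                        ['obs', 'action', 'reward', 'next_obs', 'done'])

# One transition with placeholder values for illustration
e = Experience(obs=[0.1, 0.2], action=1, reward=1.0,
               next_obs=[0.2, 0.3], done=False)

print(e.action)  # Access by name: 1
print(e[1])      # Access by numerical index also works: 1
```

Both access styles return the same element, but the named form makes the replay code that consumes these tuples much more readable.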

We can now move on to implementing the experience memory class using the experience data structure we just created. To figure out what methods we need to implement, let's think about how we will be using the memory later.

First, we want to be able to store the new experiences that the agent collects in the experience memory. Then, when we want to replay experiences to update the Q-function, we want to sample or retrieve them in batches. So, essentially, we will need a method that stores new experiences and a method that samples a single experience or a batch of experiences.

Let's dive into the experience memory implementation, starting with the initialization method where we initialize the memory with the desired capacity, as follows:

class ExperienceMemory(object):
    """
    A cyclic/ring buffer based Experience Memory implementation
    """
    def __init__(self, capacity=int(1e6)):
        """
        :param capacity: Total capacity (Max number of Experiences)
        """
        self.capacity = capacity
        self.mem_idx = 0  # Index of the write head
        self.memory = []

The mem_idx member variable will be used to point to the current writing head or the index location where we will be storing new experiences when they arrive.

A "cyclic buffer" is also known by other names that you may have heard of: "circular buffer", "ring buffer", and "circular queue". They all represent the same underlying data structure that uses a ring-like fixed-size data representation.

Next, we'll look at the store method's implementation: 

    def store(self, experience):
        """
        :param experience: The Experience object to be stored into the memory
        """
        if len(self.memory) < self.capacity:
            self.memory.append(experience)
        else:
            # The memory is full; overwrite the oldest experience
            self.memory[self.mem_idx % self.capacity] = experience
        self.mem_idx += 1

Simple enough, right? We write each new experience at mem_idx % self.capacity, so once the memory is full, the write head wraps around and the oldest experiences are overwritten, giving us the cyclic behavior we wanted.
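To see the wraparound in action, here is a minimal, self-contained sketch that uses a tiny capacity of 3 and plain integers in place of Experience objects:

```python
# Minimal sketch of the cyclic store behavior (integers stand in
# for Experience objects, and the tiny capacity is for illustration)
class TinyMemory:
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.mem_idx = 0  # Write head
        self.memory = []

    def store(self, experience):
        if len(self.memory) < self.capacity:
            self.memory.append(experience)
        else:
            # Full: overwrite the oldest slot
            self.memory[self.mem_idx % self.capacity] = experience
        self.mem_idx += 1

mem = TinyMemory(capacity=3)
for i in range(5):    # Store 5 items into a 3-slot memory
    mem.store(i)
print(mem.memory)     # [3, 4, 2] -- items 0 and 1 were overwritten
```

Items 0 and 1 were the oldest, so the write head wrapped around and replaced them first, while the list itself never grew beyond the capacity.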

The next code is our sample method implementation:

import random  # Placed at the top of the module in the full implementation

    def sample(self, batch_size):
        """
        :param batch_size: Number of Experiences to sample
        :return: A list of batch_size Experiences sampled uniformly at
                 random from the memory
        """
        assert batch_size <= len(self.memory), \
            "Sample batch_size is more than available exp in mem"
        return random.sample(self.memory, batch_size)

In the preceding code, we make use of Python's random library to sample experiences uniformly at random from the experience memory. We will also implement a simple get_size helper method, which we will use to find out how many experiences are already stored in the experience memory:

    def get_size(self):
        """
        :return: Number of Experiences stored in the memory
        """
        return len(self.memory)

The full implementation of the experience memory class is available at ch6/utils/experience_memory.py, in this book's code repository.
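Putting the pieces together, here is a hedged, end-to-end usage sketch of the class as described in this section; the transitions stored below use dummy integer observations and zero rewards purely for illustration:

```python
import random
from collections import namedtuple

Experience = namedtuple("Experience",
                        ['obs', 'action', 'reward', 'next_obs', 'done'])

class ExperienceMemory(object):
    """A cyclic/ring buffer based Experience Memory implementation."""
    def __init__(self, capacity=int(1e6)):
        self.capacity = capacity
        self.mem_idx = 0  # Write head
        self.memory = []

    def store(self, experience):
        if len(self.memory) < self.capacity:
            self.memory.append(experience)
        else:
            # Full: overwrite the oldest experience
            self.memory[self.mem_idx % self.capacity] = experience
        self.mem_idx += 1

    def sample(self, batch_size):
        assert batch_size <= len(self.memory), \
            "Sample batch_size is more than available exp in mem"
        return random.sample(self.memory, batch_size)

    def get_size(self):
        return len(self.memory)

# Fill a small memory with 100 dummy transitions
memory = ExperienceMemory(capacity=50)
for t in range(100):
    memory.store(Experience(obs=t, action=0, reward=0.0,
                            next_obs=t + 1, done=False))

print(memory.get_size())  # 50 -- capped at capacity
batch = memory.sample(32)
print(len(batch))         # 32
```

Note that after storing 100 transitions, only the most recent 50 remain, and sample returns a batch ready to be replayed through the Q-function update.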

Next, we'll look at how we can replay experiences sampled from the experience memory to update the agent's Q-function.
