Implementing a Q-learning agent using Python and NumPy

Let's begin building our Q-learning agent by implementing the Q_Learner class. Its main methods are the following:

  • __init__(self, env)
  • discretize(self, obs)
  • get_action(self, obs)
  • learn(self, obs, action, reward, next_obs)

You will later find that these methods are common to almost all the agents we implement in this book, which makes them easy to grasp: they will recur again and again, with some modifications.
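As a rough preview of how these four methods fit together, here is a minimal sketch of such a class. The hyperparameter values, the number of bins, and the assumption of a bounded, continuous observation space with a discrete action space (as in Gym's CartPole) are illustrative choices for this sketch, not the book's final implementation:

```python
import numpy as np

class Q_Learner:
    def __init__(self, env):
        self.obs_low = env.observation_space.low
        self.obs_high = env.observation_space.high
        self.obs_dims = env.observation_space.shape[0]
        self.num_bins = 30  # assumed number of bins per observation dimension
        self.bin_width = (self.obs_high - self.obs_low) / self.num_bins
        self.num_actions = env.action_space.n
        # Tabular Q-values: one entry per (discretized state, action) pair
        self.Q = np.zeros((self.num_bins + 1,) * self.obs_dims
                          + (self.num_actions,))
        self.alpha = 0.05    # learning rate (assumed value)
        self.gamma = 0.98    # discount factor (assumed value)
        self.epsilon = 1.0   # exploration rate; decay schedule omitted here

    def discretize(self, obs):
        # Map a continuous observation to a tuple of integer bin indices
        return tuple(((obs - self.obs_low) / self.bin_width).astype(int))

    def get_action(self, obs):
        # Epsilon-greedy action selection over the discretized state
        discretized_obs = self.discretize(obs)
        if np.random.random() > self.epsilon:
            return int(np.argmax(self.Q[discretized_obs]))
        return int(np.random.choice(self.num_actions))

    def learn(self, obs, action, reward, next_obs):
        # One-step Q-learning (TD) update of the tabular Q-values
        discretized_obs = self.discretize(obs)
        discretized_next = self.discretize(next_obs)
        td_target = reward + self.gamma * np.max(self.Q[discretized_next])
        td_error = td_target - self.Q[discretized_obs][action]
        self.Q[discretized_obs][action] += self.alpha * td_error
```

The env argument is assumed to expose a Gym-style observation_space (with low, high, and shape) and a discrete action_space; we will flesh out each of these methods in the pages that follow.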

The discretize() method is not necessary for agent implementations in general, but when the state space is large and continuous, it is often better to discretize it into countable bins or ranges of values to simplify the representation. This also reduces the number of values the Q-learning algorithm needs to learn: it now has to learn only a finite set of values, which can be represented concisely in tabular form, or with n-dimensional arrays, instead of by complex functions. Moreover, Q-learning, used for optimal control, is guaranteed to converge for tabular representations of the Q-values.
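To make the binning idea concrete, here is a small standalone sketch (separate from the class method) of how NumPy can map a continuous value to an integer bin index; the bounds and bin count are arbitrary example values:

```python
import numpy as np

# Assumed example: a 1-D observation bounded in [-2.0, 2.0], split into 10 bins
low, high, num_bins = -2.0, 2.0, 10
bin_edges = np.linspace(low, high, num_bins + 1)

def discretize(value):
    # np.digitize returns the (1-based) index of the bin the value falls into;
    # subtract 1 and clip so out-of-range values map to the outermost bins
    return int(np.clip(np.digitize(value, bin_edges) - 1, 0, num_bins - 1))

print(discretize(-2.0))  # 0 (first bin)
print(discretize(0.0))   # 5
print(discretize(1.99))  # 9 (last bin)
```

Every continuous observation in the range now maps to one of only 10 values, which is what lets the Q-values fit in a finite table.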
