Q-Learning

Q-learning is a value-based, off-policy reinforcement learning technique. Its goal is to learn an optimal policy that tells an agent which action to take in each state of the environment.

To implement Q-learning, you first need to understand what a Q function is.

A Q function takes a state and an action as input and returns the total expected reward. It is written Q(s, a). For an agent in state s, an optimal Q function indicates how good a choice it is to pick action a.
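As a small illustration, a Q function over a handful of states and actions can be stored as a table. The sketch below assumes a made-up problem with 3 states and 2 actions; the numbers and the helper names `q_value` and `greedy_action` are hypothetical and only show how Q(s, a) is looked up and used to pick an action.

```python
# Hypothetical toy problem: 3 states, 2 actions. Values are illustrative only.
Q_TABLE = [
    [0.0, 1.5],  # state 0: Q(0, 0), Q(0, 1)
    [2.0, 0.5],  # state 1
    [0.3, 0.3],  # state 2
]

def q_value(state, action):
    """Return Q(s, a), the estimated total expected reward for `action` in `state`."""
    return Q_TABLE[state][action]

def greedy_action(state):
    """Return the action with the highest Q value in `state` (ties go to the lowest index)."""
    row = Q_TABLE[state]
    return row.index(max(row))
```

With this table, `greedy_action(0)` picks action 1 because Q(0, 1) = 1.5 is larger than Q(0, 0) = 0.0.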

For a single state s and an action a, Q(s, a) can be expressed in terms of the Q value of the next state s', given by

$$Q(s, a) = r + \gamma \max_{a'} Q(s', a')$$

This is known as the Bellman equation. Here r is the reward the agent receives for taking action a in state s, and γ (between 0 and 1) is the discount factor. The equation tells us that the maximum reward is the sum of the reward the agent receives now and the discounted maximum future reward for the next state s'.
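To make the arithmetic concrete, here is a minimal sketch that computes the Bellman value r + γ · max over a' of Q(s', a') for one transition. The function name `bellman_target`, the default discount of 0.9, and the `done` flag for terminal states are assumptions made for this example.

```python
def bellman_target(reward, next_q_values, gamma=0.9, done=False):
    """Reward now plus the discounted best Q value of the next state.

    `next_q_values` holds Q(s', a') for every action a' available in s'.
    If the episode ended (`done`), there is no future reward to add.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

For example, with a reward of 1.0, next-state Q values of 0.5 and 2.0, and γ = 0.9, the result is 1.0 + 0.9 × 2.0 = 2.8.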

Below is the pseudocode for the Q-learning algorithm from the book Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

Figure 15.4: Pseudocode for Q learning
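A minimal runnable sketch of tabular Q-learning is given here as one possible reading of the algorithm, not as a transcription of the book's figure. The environment is a hypothetical 5-state chain invented for this example (move left or right, reward 1.0 on reaching the last state), and the hyperparameters `alpha`, `gamma`, and `epsilon` are arbitrary choices.

```python
import random

N_STATES = 5      # states 0..4; state 4 is terminal (hypothetical toy chain)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right

def step(state, action):
    """Toy environment: reward 1.0 only when the terminal state is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Initialise Q(s, a) to zero for every state-action pair.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else act greedily
            # (ties between equally good actions are broken at random).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bellman target; a terminal next state contributes no future reward.
            target = reward if done else reward + gamma * max(q[next_state])
            # Move Q(s, a) a fraction alpha of the way toward the target.
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q
```

Calling `q_learning()` returns the learned table; in this toy chain the "move right" entry ends up larger than the "move left" entry in every non-terminal state, so the greedy policy walks straight to the goal.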

Link to the book Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto: http://incompleteideas.net/book/ebook/the-book.html
