Q-Learning

Q-learning is a value-based reinforcement learning technique. Its goal is to learn an optimal policy that tells an agent which action to take in each state of the environment.

To implement Q-learning, you need to understand what a Q function is.

The Q function takes a state and a corresponding action as input and returns the total expected reward. It is written Q(s, a). In state s, an optimal Q function tells the agent how good a choice action a is.
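For small problems, one common way to represent Q(s, a) is a lookup table with one row per state and one entry per action. The snippet below is a minimal sketch of that idea; the table size, the values in it, and the helper names `q_value` and `greedy_action` are made up for illustration and do not come from the book.

```python
# Hypothetical Q-table for 3 states and 2 actions; the numbers are made up.
# q_table[s][a] holds the current estimate of Q(s, a).
q_table = [
    [0.0, 0.5],  # state 0
    [1.2, 0.3],  # state 1
    [0.0, 0.0],  # state 2
]


def q_value(state, action):
    """Look up the current estimate of Q(state, action)."""
    return q_table[state][action]


def greedy_action(state):
    """Return the action with the highest Q value in this state.

    Ties are resolved in favour of the lowest action index.
    """
    row = q_table[state]
    return max(range(len(row)), key=lambda a: row[a])


print(q_value(1, 0))
print(greedy_action(0))
```

An agent that trusts its Q-table completely would call `greedy_action` in every state; how the table entries are learned is a separate question.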

For a single state s and an action a, Q(s, a) can be expressed in terms of the Q value of the next state s':

$$Q(s, a) = r + \gamma \max_{a'} Q(s', a')$$

Here r is the reward, \gamma is the discount factor, and a' ranges over the actions available in s'. This is known as the Bellman equation. It tells us that the maximum expected reward is the sum of the reward the agent received for entering the current state s and the discounted maximum future reward for the next state s'.
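As a small worked example of this relationship (reward plus discounted maximum future reward), the sketch below computes the right-hand side for a single transition. The function name `bellman_target`, the discount factor `gamma`, and all the numbers are assumptions chosen for illustration.

```python
def bellman_target(reward, next_q_values, gamma):
    """Reward plus the discounted maximum Q value of the next state.

    next_q_values is the list of Q(s', a') for every action a' in s'.
    """
    return reward + gamma * max(next_q_values)


# Made-up numbers: reward 1.0, two actions in s' worth 0.5 and 2.0,
# discount factor 0.9. The best next action is worth 2.0, so the
# result is 1.0 + 0.9 * 2.0, i.e. roughly 2.8.
target = bellman_target(reward=1.0, next_q_values=[0.5, 2.0], gamma=0.9)
print(target)
```

A discount factor closer to 0 makes the agent care mostly about the immediate reward, while a value closer to 1 gives future rewards more weight.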

Below is the pseudocode for the Q-learning algorithm from the book Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

Figure 15.4: Pseudocode for Q-learning
Link to the book Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto: http://incompleteideas.net/book/ebook/the-book.html
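To make the algorithm concrete, here is a minimal Python sketch of tabular Q-learning with epsilon-greedy action selection. It is not a transcription of the figure: the corridor environment, the `step` function, and the hyperparameter values (`alpha`, `gamma`, `epsilon`, `episodes`) are all assumptions made up for this illustration. The update uses the standard incremental form, in which a learning rate `alpha` moves Q(s, a) a fraction of the way toward the reward plus the discounted maximum Q value of the next state.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells.
# The agent starts in cell 0 and the episode ends in cell 4.
N_STATES = 5
ACTIONS = [0, 1]  # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(GOAL, state + 1)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row)
    # Break ties at random so the untrained agent does not get stuck.
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Initialise Q(s, a) to zero for every state-action pair.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        done = False
        while not done:
            action = epsilon_greedy(q[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # A terminal state has no future reward.
            best_next = 0.0 if done else max(q[next_state])
            td_target = reward + gamma * best_next
            q[state][action] += alpha * (td_target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy action for every non-terminal cell.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(greedy_policy)
```

On this toy corridor the learned greedy policy should be to move right in every non-terminal cell, since that is the only way to reach the rewarded goal. Real problems usually need more care with the exploration rate and the learning rate than this sketch shows.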