Chapter 5

Monte Carlo methods apply only to episodic tasks, whereas TD learning can be applied to both episodic and non-episodic tasks.
The difference between the TD target (the reward plus the discounted value estimate of the next state) and the current predicted value is called the TD error.
Refer to the section TD prediction and TD control.
Refer to the section Solving taxi problem using Q learning.
In Q learning, we select the action to take using an epsilon-greedy policy, but when updating the Q value we use the maximum Q value over the actions available in the next state. In SARSA, we select the action using the epsilon-greedy policy and, when updating the Q value, we also pick the next action using the epsilon-greedy policy.
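
The difference between the two update rules can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the dictionary-based Q table, the function names (epsilon_greedy, q_learning_update, sarsa_update), and the parameter names alpha and gamma are all assumptions made for the example. Each update function returns the TD error it used.

```python
import random


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    q is a hypothetical Q table: a dict mapping (state, action) to a value.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))


def q_learning_update(q, s, a, r, s_next, actions, alpha, gamma):
    """Off-policy update: the target uses the maximum Q value in s_next."""
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    td_error = r + gamma * best_next - q.get((s, a), 0.0)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * td_error
    return td_error


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy update: the target uses the Q value of a_next, the action
    actually chosen in s_next (for example by epsilon_greedy)."""
    td_error = r + gamma * q.get((s_next, a_next), 0.0) - q.get((s, a), 0.0)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * td_error
    return td_error
```

The only difference is which next-state value enters the target: Q learning always uses the best available action, while SARSA uses whatever action the epsilon-greedy policy picked, so an exploratory next action changes the SARSA update but not the Q learning one.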
