Chapter 5

  1. Monte Carlo methods are applied only for episodic tasks whereas TD learning can be applied to both episodic and nonepisodic tasks
  2. The difference between the actual value and the predicted value is called TD error
  3. Refer section TD prediction and TD control
  4. Refer section Solving taxi problem using Q learning
  5. In Q learning, we take action using an epsilon-greedy policy and, while updating the Q value, we simply pick up the maximum action. In SARSA, we take the action using the epsilon-greedy policy and also, while updating the Q value, we pick up the action using the epsilon-greedy policy.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.226.185.196