Summary

We started with policy gradient methods, which optimize the policy directly without requiring a Q function. We learned about policy gradients by solving the Lunar Lander game, and we looked at DDPG, which combines the benefits of policy gradients and Q functions.

Then we looked at policy optimization algorithms such as TRPO, which ensures monotonic policy improvement by enforcing a constraint that the KL divergence between the old and new policies is not greater than a fixed threshold (commonly written δ).
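To make the KL constraint concrete, here is a minimal sketch in plain Python. It assumes a discrete action space and checks whether a candidate policy stays inside the trust region around the old policy. The function names, the example probabilities, and the threshold value `delta=0.01` are illustrative assumptions, not values taken from the chapter. This shows only the constraint check, not the full TRPO update.

```python
import math

def kl_divergence(p_old, p_new):
    """KL(p_old || p_new) for two discrete action distributions.

    Assumes both lists sum to 1 and p_new has no zero entries
    wherever p_old is positive.
    """
    return sum(po * math.log(po / pn)
               for po, pn in zip(p_old, p_new) if po > 0)

def within_trust_region(p_old, p_new, delta=0.01):
    """True if the new policy satisfies the KL constraint.

    delta is a hypothetical threshold chosen for illustration.
    """
    return kl_divergence(p_old, p_new) <= delta

# Hypothetical action probabilities for a single state.
old_policy = [0.5, 0.3, 0.2]
small_step = [0.48, 0.31, 0.21]   # close to the old policy
large_step = [0.1, 0.1, 0.8]      # far from the old policy

print(within_trust_region(old_policy, small_step))
print(within_trust_region(old_policy, large_step))
```

In this sketch the small update passes the check and the large one fails it, which is the behavior the constraint is meant to enforce: updates that move the policy too far from the old one are rejected.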

We also looked at proximal policy optimization, which changes the constraint into a penalty by penalizing large policy updates. In the next chapter, Chapter 12, Capstone Project – Car Racing Using DQN, we will see how to build an agent to win a car racing game.
