Policy Gradients and Optimization

In the last three chapters, we have learned about various deep reinforcement learning algorithms, such as the Deep Q Network (DQN), the Deep Recurrent Q Network (DRQN), and the Asynchronous Advantage Actor Critic (A3C) network. In all of these algorithms, our goal is to find the optimal policy, that is, the policy that maximizes the rewards. So far, we have used the Q function to find the optimal policy, as the Q function tells us which action is the best one to perform in a given state. Do you think we can find the optimal policy directly, without using the Q function? Yes, we can. In policy gradient methods, we parameterize the policy itself and adjust its parameters to increase the expected reward, so we can find the optimal policy without using the Q function.
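To make the idea concrete before we go into detail, here is a minimal sketch of a policy gradient update on a toy problem. It is not the implementation used later in this chapter: the two-armed bandit, the function names (softmax, train_bandit_policy), and the parameter values (arm_means, steps, lr, seed) are all assumptions chosen for illustration. The sketch uses a REINFORCE-style update and only the Python standard library. Notice that no Q function appears anywhere; the policy parameters are adjusted directly from sampled rewards.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit_policy(arm_means, steps=2000, lr=0.1, seed=0):
    """REINFORCE-style training of a softmax policy on a toy bandit.

    arm_means, steps, lr, and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)  # policy parameters, one per action
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current policy
        action = rng.choices(range(len(theta)), weights=probs)[0]
        # Observe a noisy reward for the chosen arm
        reward = rng.gauss(arm_means[action], 0.1)
        # For a softmax policy, d/d theta[k] of log pi(action) is
        # 1 - probs[k] for the chosen action and -probs[k] otherwise
        for k in range(len(theta)):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * reward * grad_log
    return softmax(theta)


# Arm 1 pays more on average, so the policy should learn to prefer it
probs = train_bandit_policy([0.0, 1.0])
print(probs)
```

After training, most of the probability mass should sit on the better-paying arm. Real problems add states, neural network policies, and variance-reduction tricks such as baselines, but the core update stays the same: increase the log-probability of actions in proportion to the reward that followed them.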

In this chapter, we will learn about policy gradients in detail. We will also look at different types of policy gradient methods, such as deep deterministic policy gradients (DDPG), followed by state-of-the-art policy optimization methods such as trust region policy optimization (TRPO) and proximal policy optimization (PPO).

In this chapter, you will learn the following:

  • Policy gradients
  • Lunar lander using policy gradients
  • Deep deterministic policy gradients
  • Swinging a pendulum using the deep deterministic policy gradient (DDPG)
  • Trust region policy optimization
  • Proximal policy optimization