Policy Gradients and Optimization

In the last three chapters, we learned about several deep reinforcement learning algorithms, such as the Deep Q Network (DQN), the Deep Recurrent Q Network (DRQN), and the Asynchronous Advantage Actor Critic (A3C) network. In all of these algorithms, the goal is to find the optimal policy, that is, the policy that maximizes the expected reward. So far, we have relied largely on the Q function to find that policy, because the Q function tells us which action is best to perform in a given state. Can we find the optimal policy directly, without using the Q function? Yes, we can. Policy gradient methods parameterize the policy itself and adjust its parameters in the direction that increases the expected reward, so no Q function is needed.
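To make the idea concrete before we go into the details, here is a minimal sketch of a policy learned directly, with no Q function. It assumes a toy three-action problem with made-up average rewards (a bandit, not one of the environments used in this chapter), a softmax policy over three parameters, and a REINFORCE-style update with a running-average baseline. The names softmax, train_policy, and mean_rewards are illustrative only.

```python
import numpy as np


def softmax(logits):
    """Turn raw action preferences into a probability distribution."""
    z = logits - np.max(logits)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def train_policy(mean_rewards, steps=5000, lr=0.1, seed=0):
    """REINFORCE-style update on a toy bandit (assumed setup, no Q function)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(mean_rewards))  # policy parameters, one per action
    baseline = 0.0                       # running average of observed rewards
    for t in range(1, steps + 1):
        probs = softmax(theta)
        action = rng.choice(len(theta), p=probs)  # sample from the policy
        reward = mean_rewards[action] + rng.normal(0.0, 0.1)  # noisy reward

        # Gradient of log pi(action) for a softmax policy: one_hot - probs
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0

        # Move the parameters toward actions that beat the baseline
        theta += lr * (reward - baseline) * grad_log_pi
        baseline += (reward - baseline) / t
    return softmax(theta)


# Hypothetical average rewards; the third action is the best one
probs = train_policy(np.array([0.2, 0.5, 1.0]))
print(probs)
```

Notice that the policy's parameters are updated from sampled rewards alone. At no point do we estimate the value of a state-action pair.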

In this chapter, we will learn about policy gradients in detail. We will also look at a policy gradient variant, deep deterministic policy gradients (DDPG), followed by two more recent policy optimization methods: trust region policy optimization (TRPO) and proximal policy optimization (PPO).

The chapter covers the following topics:

Policy gradients
Lunar lander using policy gradients
Deep deterministic policy gradients (DDPG)
Swinging a pendulum using DDPG
Trust region policy optimization
Proximal policy optimization
