Distributional RL

The distributional RL method (https://arxiv.org/abs/1707.06887) learns to approximate the full distribution of returns rather than only the expected (average) return. To model such distributions, it places probability masses on a discrete support, that is, on a fixed set of candidate return values. In essence, rather than modeling a single action-value for each action given the state, it seeks a distribution over action-values for each action given the state. Without going too far into the details (which would require a lot of background information), we will look at one of the key contributions of this method to RL in general: the formulation of the Distributional Bellman equation. As you may recall from the previous chapters of this book, the action-value function, using a one-step Bellman backup, can be written as follows:

$$Q(s, a) = \mathbb{E}\left[ R(s, a) + \gamma \, Q(S', A') \right]$$

Here, $R(s, a)$ is the reward, $\gamma$ is the discount factor, and $(S', A')$ is the next state-action pair.

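In the case of the Distributional Bellman equation, the scalar quantity $Q(s, a)$ is replaced by a random variable $Z(s, a)$, whose expectation is $Q(s, a)$. This gives us the following equation, where $\overset{D}{=}$ denotes equality in distribution:

$$Z(s, a) \overset{D}{=} R(s, a) + \gamma \, Z(S', A')$$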

Because the quantity is no longer a scalar, the update needs to be handled with more care than simply adding the discounted value of the next state-action value to the one-step reward. The update step of the Distributional Bellman equation can be understood easily with the help of the following diagram (stages from left to right):

In the previous illustration, the distribution of the next state-action value $Z(s', a')$ is depicted in red on the left. It is first scaled by the discount factor $\gamma$ (middle) and then shifted by the reward $r$ to yield the Distributional Bellman update. Because the atoms of the resulting target distribution generally no longer coincide with the fixed support, the target is projected onto the support of the current distribution. The model is then trained by minimizing the cross-entropy loss between the projected target distribution and the current predicted distribution $Z(s, a)$.
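The scaling and shifting stages described above can be sketched in a few lines of plain Python. The support bounds, atom count, probabilities, reward, and discount factor below are hypothetical toy values, and the helper names (`make_support`, `expected_value`, `scale_and_shift`) are illustrative rather than taken from the paper or from any library. The sketch is meant to show that the backup moves the atom locations while leaving the probabilities untouched, and that the mean of the resulting distribution agrees with the scalar Bellman backup:

```python
# Hypothetical toy settings, chosen only for illustration.
V_MIN, V_MAX, N_ATOMS = 0.0, 10.0, 5


def make_support(v_min, v_max, n_atoms):
    """Return evenly spaced atom locations z_0, ..., z_{N-1}."""
    delta = (v_max - v_min) / (n_atoms - 1)
    return [v_min + i * delta for i in range(n_atoms)]


def expected_value(atoms, probs):
    """Mean of a categorical distribution: sum_i z_i * p_i."""
    return sum(z * p for z, p in zip(atoms, probs))


def scale_and_shift(atoms, reward, gamma):
    """Move every atom z to reward + gamma * z.

    Only the locations change; the probability attached to each
    atom stays the same.
    """
    return [reward + gamma * z for z in atoms]


support = make_support(V_MIN, V_MAX, N_ATOMS)

# Assumed probabilities of the next state-action value distribution.
next_probs = [0.1, 0.2, 0.4, 0.2, 0.1]
reward, gamma = 1.0, 0.9

target_atoms = scale_and_shift(support, reward, gamma)

q_next = expected_value(support, next_probs)         # mean before the backup
q_target = expected_value(target_atoms, next_probs)  # mean after the backup

print("support:     ", support)
print("target atoms:", target_atoms)
print("scalar backup r + gamma * Q:", reward + gamma * q_next)
print("mean of shifted distribution:", q_target)
```

With these toy values, most of the moved atoms fall between the original atom locations, which is why the projection step is needed before the two distributions can be compared atom by atom.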

With this background, you can briefly glance through the pseudocode of the C51 algorithm from the Distributional RL paper, which is integrated into the Rainbow agent:
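The core of that algorithm, a categorical projection followed by a cross-entropy loss, can be sketched in plain Python for a single transition. This is a minimal sketch under stated assumptions, not the paper's pseudocode verbatim and not the Rainbow agent's implementation: the function names (`categorical_projection`, `c51_loss`), the `done` flag, the small `eps` constant, and all numeric values are hypothetical, and a real agent would obtain the probabilities from a neural network and process batches. The sketch also adds mass directly to an atom when the target lands exactly on it, a common guard that is an assumption here.

```python
import math


def categorical_projection(next_probs, reward, gamma, support, done=False):
    """Project the target atoms reward + gamma * z_j onto the fixed support."""
    n = len(support)
    v_min, v_max = support[0], support[-1]
    delta = (v_max - v_min) / (n - 1)
    m = [0.0] * n  # projected target probabilities
    for p_j, z_j in zip(next_probs, support):
        # Scale and shift the atom, then clip it to the support range.
        tz = reward if done else reward + gamma * z_j
        tz = min(max(tz, v_min), v_max)
        # Fractional index of the target atom on the support.
        b = (tz - v_min) / delta
        b = min(max(b, 0.0), float(n - 1))  # guard against rounding error
        lower, upper = math.floor(b), math.ceil(b)
        if lower == upper:
            # Target lands exactly on an atom: give it all the mass.
            m[lower] += p_j
        else:
            # Split the mass between the two neighbouring atoms.
            m[lower] += p_j * (upper - b)
            m[upper] += p_j * (b - lower)
    return m


def c51_loss(probs_sa, next_probs_all, reward, gamma, support, done=False):
    """Cross-entropy between the projected target and the prediction."""
    # Greedy next action under the expected value of each distribution.
    q_next = [sum(z * p for z, p in zip(support, probs))
              for probs in next_probs_all]
    a_star = max(range(len(q_next)), key=q_next.__getitem__)
    m = categorical_projection(next_probs_all[a_star], reward, gamma,
                               support, done)
    eps = 1e-12  # assumed constant to avoid log(0)
    loss = -sum(m_i * math.log(p_i + eps) for m_i, p_i in zip(m, probs_sa))
    return loss, m, a_star


# Hypothetical toy inputs: 5 atoms, 2 actions in the next state.
support = [0.0, 2.5, 5.0, 7.5, 10.0]
next_probs_all = [[0.1, 0.2, 0.4, 0.2, 0.1],
                  [0.6, 0.2, 0.1, 0.1, 0.0]]
probs_sa = [0.2] * 5  # current prediction for the taken action

loss, m, a_star = c51_loss(probs_sa, next_probs_all, reward=1.0,
                           gamma=0.9, support=support)
print("greedy next action:", a_star)
print("projected target:", m)
print("cross-entropy loss:", loss)
```

The projected target `m` still sums to one, and as long as no atom is clipped it keeps the same mean as the shifted distribution, so the expected value remains consistent with the scalar Bellman backup.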
