Architecture of DARQN

The architecture of DARQN is shown in the following diagram: 

It consists of three layers: convolutional, attention, and LSTM recurrent layers. The game screen is fed as an image to the convolutional network, which processes it and produces feature maps. The feature maps are then fed into the attention layer, which transforms them into vectors and returns their linear combination, called the context vector. The context vector, along with the previous hidden state, is then passed to the LSTM layer. The LSTM layer produces two outputs: the Q value, used to decide what action to perform in a state, and a hidden state that helps the attention network decide which region of the image to focus on at the next time step, so that better context vectors can be generated. 
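The flow above can be sketched in NumPy with toy dimensions and random weights standing in for trained parameters (the layer sizes, weight names, and the simplified single-step LSTM here are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

L, D = 49, 32   # 7x7 spatial locations from the conv layers, D features each (toy sizes)
H, A = 64, 4    # LSTM hidden size, number of actions

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def soft_attention(feature_maps, h_prev, W_f, W_h, w):
    """Score each spatial location from its features and the previous
    hidden state, then return the softmax-weighted average of the
    feature vectors: the context vector."""
    scores = np.tanh(feature_maps @ W_f + h_prev @ W_h) @ w   # shape (L,)
    weights = softmax(scores)
    context = weights @ feature_maps                           # shape (D,)
    return context, weights

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One standard LSTM step on the context vector."""
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c_prev + i * np.tanh(g)
    h = o * np.tanh(c)
    return h, c

# Random weights stand in for trained parameters.
W_f, W_h, w = rng.normal(size=(D, D)), rng.normal(size=(H, D)), rng.normal(size=D)
W, U, b = rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
W_q = rng.normal(size=(A, H))

feature_maps = rng.normal(size=(L, D))   # stand-in for conv-layer output
h, c = np.zeros(H), np.zeros(H)

# One time step: attention -> LSTM -> Q values.
context, weights = soft_attention(feature_maps, h, W_f, W_h, w)
h, c = lstm_step(context, h, c, W, U, b)
q_values = W_q @ h                        # one Q value per action
action = int(q_values.argmax())
```

Note how the previous hidden state `h_prev` enters the attention scores: this is the feedback path by which the LSTM steers where the attention network looks at the next time step.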

The attention is of two types:

  • Soft attention: We know that the feature maps produced by the convolutional layer are fed as input to the attention layer, which then produces the context vector. With soft attention, the context vector is simply a weighted average of all the outputs (feature maps) produced by the convolutional layer, with the weights chosen according to the relative importance of each feature.
  • Hard attention: With hard attention, we focus on only one particular location of the image at each time step t, according to a location selection policy π. This policy is represented by a neural network whose weights are the policy parameters and whose output is the location selection probability. In practice, however, hard attention does not perform much better than soft attention. 
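The difference between the two attention types can be shown side by side. In this sketch (toy sizes; the scores stand in for whatever the attention network computes), soft attention averages over all locations, while hard attention samples a single location from the same distribution:

```python
import numpy as np

rng = np.random.default_rng(1)

L, D = 49, 32                            # spatial locations, features per location
feature_maps = rng.normal(size=(L, D))   # stand-in for conv-layer output
scores = rng.normal(size=L)              # stand-in attention scores

# Normalize scores into a probability distribution over locations.
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Soft attention: context vector is the weighted average of ALL locations.
soft_context = weights @ feature_maps    # shape (D,)

# Hard attention: sample ONE location from the same distribution
# (the location selection policy) and use only its features.
loc = rng.choice(L, p=weights)
hard_context = feature_maps[loc]         # shape (D,)
```

Because hard attention samples a discrete location, it is not differentiable and must be trained with policy-gradient methods, whereas the soft variant trains end to end with ordinary backpropagation.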
