Target network

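In our loss function, we calculate the squared difference between the target value and the predicted value:

$$\text{Loss} = \left(r + \gamma \max_{a'} Q(s', a'; \theta) - Q(s, a; \theta)\right)^2$$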

We use the same Q function to calculate both the target value and the predicted value: in the preceding equation, the same weights $\theta$ appear in the target Q and in the predicted Q. Because one network produces both values, every gradient update that moves the prediction also moves the target. The network ends up chasing a moving target, and training can oscillate or diverge.

To avoid this problem, we use a separate network, called a target network, for calculating the target value. So, our loss function becomes:

$$\text{Loss} = \left(r + \gamma \max_{a'} Q(s', a'; \theta') - Q(s, a; \theta)\right)^2$$

You may notice that the parameter of the target Q is $\theta'$ instead of $\theta$. Our actual Q network, which is used for predicting Q values, learns the weights $\theta$ by gradient descent. The target network is frozen for several time steps, after which its weights $\theta'$ are updated by copying the weights from the actual Q network. Holding the target fixed between these updates stabilizes training.
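The freeze-then-copy schedule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a full DQN: it assumes a linear Q function with one weight vector per action instead of a neural network, and the names `q_values`, `td_target`, `train_step`, and `train`, as well as the values of `GAMMA`, `LR`, and `SYNC_EVERY`, are hypothetical choices made for the example.

```python
import copy
import random

GAMMA = 0.9      # discount factor (assumed value)
LR = 0.1         # learning rate (assumed value)
SYNC_EVERY = 50  # steps between target-network updates (assumed value)


def q_values(weights, state):
    """Linear Q function: one weight vector (row) per action."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def td_target(reward, next_state, done, target_weights):
    """Target value, computed with the frozen target weights (theta')."""
    if done:
        return reward
    return reward + GAMMA * max(q_values(target_weights, next_state))


def train_step(weights, target_weights, transition):
    """One gradient-descent step on (target - Q(s, a; theta))^2.

    Only the actual Q network weights (theta) are changed here;
    target_weights is read but never modified.
    """
    state, action, reward, next_state, done = transition
    target = td_target(reward, next_state, done, target_weights)
    error = target - q_values(weights, state)[action]
    # Gradient of the squared error w.r.t. the chosen action's weights is
    # -2 * error * state; the constant factor 2 is folded into LR.
    weights[action] = [w + LR * error * x
                       for w, x in zip(weights[action], state)]
    return error ** 2


def train(transitions, n_actions, n_features, steps, seed=0):
    """Train on sampled transitions, syncing theta' every SYNC_EVERY steps."""
    rng = random.Random(seed)
    weights = [[0.0] * n_features for _ in range(n_actions)]
    target_weights = copy.deepcopy(weights)  # theta' starts as a copy of theta
    sync_count = 0
    for step in range(1, steps + 1):
        train_step(weights, target_weights, rng.choice(transitions))
        if step % SYNC_EVERY == 0:
            # Copy the actual Q network weights into the target network.
            target_weights = copy.deepcopy(weights)
            sync_count += 1
    return weights, target_weights, sync_count
```

Between two syncs, `td_target` keeps returning values based on the old `target_weights`, so the quantity the Q network regresses toward does not shift with each gradient step. With a real neural network the copy step would use the framework's own weight-copying mechanism rather than `copy.deepcopy`.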
