In TD prediction, we estimated the value function. In TD control, we optimize the value function. For TD control, we use two kinds of control algorithms:
- Off-policy learning algorithm: Q-learning
- On-policy learning algorithm: SARSA
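
To make the off-policy/on-policy distinction concrete, here is a minimal sketch of the two update rules in their standard tabular form. The function names, the dictionary-based Q table keyed by `(state, action)`, and the default values of `alpha` (learning rate) and `gamma` (discount factor) are assumptions chosen for illustration, not something fixed by the text above.

```python
from collections import defaultdict


def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action value in next_state,
    regardless of which action the agent actually takes next."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


def sarsa_update(Q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the value of the action the current
    policy actually selected in next_state."""
    td_target = reward + gamma * Q[(next_state, next_action)]
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


# Hypothetical two-state example: values for state 1 are already known.
Q = defaultdict(float)
Q[(1, "left")] = 1.0
Q[(1, "right")] = 2.0

q_val = q_learning_update(Q, 0, "left", 1.0, 1, ["left", "right"], alpha=0.5)
s_val = sarsa_update(Q, 0, "right", 1.0, 1, "left", alpha=0.5)
```

The only difference between the two functions is the bootstrap term: Q-learning uses the maximum over next actions, while SARSA uses the value of the next action actually chosen, so in this example the two updates produce different values whenever the chosen next action is not the greedy one.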