SARSA Learning

SARSA (State-Action-Reward-State-Action) learning, like Q learning, is a value-based reinforcement learning technique: it learns action values (Q values) from which a policy is derived. Its goal is to learn an optimal policy that tells an agent which action to take in each state it may encounter.

SARSA and Q learning are very similar, except that Q learning is an off-policy algorithm and SARSA is an on-policy algorithm. The Q value learned by SARSA is not based on the greedy action in the next state, as in Q learning, but on the action actually performed under the current policy.
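To make the on-policy versus off-policy distinction concrete, here is a minimal sketch of the two learning targets. The function names, the tiny Q table, and the parameter values are illustrative assumptions, not taken from the book.

```python
import numpy as np

def sarsa_target(Q, reward, next_state, next_action, gamma):
    """On-policy target: uses the action the current policy actually chose."""
    return reward + gamma * Q[next_state, next_action]

def q_learning_target(Q, reward, next_state, gamma):
    """Off-policy target: uses the greedy (highest-valued) action in next_state."""
    return reward + gamma * np.max(Q[next_state])

# Hypothetical Q table with 2 states (rows) and 2 actions (columns)
Q = np.array([[0.0, 0.0],
              [1.0, 5.0]])

# Suppose the policy explored and picked the non-greedy action 0 in state 1
on_policy = sarsa_target(Q, reward=1.0, next_state=1, next_action=0, gamma=0.9)
off_policy = q_learning_target(Q, reward=1.0, next_state=1, gamma=0.9)
print(on_policy, off_policy)
```

When the next action happens to be the greedy one, the two targets coincide; they differ only on exploratory steps, which is where SARSA's estimates reflect the cost of exploration.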

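For a single state s and an action a, Q(s, a) can be expressed in terms of the Q value of the next state s' and the next action a'. In the standard SARSA update rule, with learning rate \alpha, discount factor \gamma, and reward r, this is given by

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma Q(s', a') - Q(s, a) \right]$$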

Below is the pseudocode for the SARSA learning algorithm, from the book Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

Figure 15.14: Pseudocode for SARSA learning
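As a rough illustration of how such pseudocode might translate into code, the following is a minimal tabular SARSA sketch with an epsilon-greedy policy. The five-cell corridor environment, the `step` and `epsilon_greedy` helpers, and all hyperparameter values are hypothetical choices made for this example, not the book's implementation.

```python
import numpy as np

# Hypothetical corridor: states 0..4, action 0 = left, action 1 = right
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(state, action):
    """Toy environment: move one cell; reward 1.0 only on reaching GOAL."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def epsilon_greedy(Q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    best = np.flatnonzero(Q[state] == Q[state].max())  # break ties at random
    return int(rng.choice(best))

def sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(Q, state, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            # The next action comes from the same policy being learned (on-policy)
            next_action = epsilon_greedy(Q, next_state, epsilon, rng)
            # Terminal states have value 0, so the bootstrap term is dropped
            target = reward + (0.0 if done else gamma * Q[next_state, next_action])
            Q[state, action] += alpha * (target - Q[state, action])
            state, action = next_state, next_action
    return Q

Q = sarsa()
greedy_policy = np.argmax(Q[:GOAL], axis=1)  # greedy action per non-terminal state
print(greedy_policy)
```

Note that `next_action` is selected before the update and then reused as the action taken on the following step; that reuse is what makes the quintuple (state, action, reward, next state, next action) the unit of learning.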

 
