SARSA Learning

SARSA (State-Action-Reward-State-Action) learning, like Q-learning, is a value-based temporal-difference reinforcement learning technique. Its goal is to learn a good policy that tells an agent which action to take under the various circumstances it may encounter.

SARSA and Q-learning are very similar; the key difference is that Q-learning is an off-policy algorithm, whereas SARSA is an on-policy algorithm. The Q value learned by SARSA is not based on the greedy action in the next state, as in Q-learning, but on the action actually taken under the current policy.
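To make the on-policy versus off-policy distinction concrete, here is a minimal sketch of the two bootstrapped targets. The function names `sarsa_target` and `q_learning_target`, the small table `Q`, and the numbers used are illustrative assumptions, not taken from the book.

```python
# Hypothetical Q table: Q[state][action], two states with two actions each.
Q = [
    [0.0, 0.0],
    [1.0, 5.0],
]

def sarsa_target(Q, reward, next_state, next_action, gamma):
    """On-policy target: uses the action actually chosen in the next state."""
    return reward + gamma * Q[next_state][next_action]

def q_learning_target(Q, reward, next_state, gamma):
    """Off-policy target: uses the greedy (maximum) value in the next state."""
    return reward + gamma * max(Q[next_state])

# Suppose the current policy explored and picked action 0 in state 1,
# even though action 1 has the higher estimated value.
on_policy = sarsa_target(Q, reward=1.0, next_state=1, next_action=0, gamma=0.9)
off_policy = q_learning_target(Q, reward=1.0, next_state=1, gamma=0.9)
print(on_policy, off_policy)
```

When the next action happens to be the greedy one, the two targets coincide; they differ only on exploratory steps.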

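For a single state s and action a, Q(s, a) can be expressed in terms of the Q value of the next state s' and next action a'. In the standard formulation, the update is given by

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \, Q(s', a') - Q(s, a) \right]$$

where r is the reward received after taking action a in state s, \alpha is the learning rate, and \gamma is the discount factor.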

Below is the pseudocode for the SARSA learning algorithm, from the book Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

Figure 15.14: Pseudocode for SARSA learning
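As a rough illustration of how the algorithm might look in code, the following is a minimal tabular SARSA sketch in Python. The toy corridor environment (`step`), the helper `epsilon_greedy`, and all hyperparameter values are assumptions chosen for illustration; they do not come from the book.

```python
import random

N_STATES = 5      # assumed toy corridor: states 0..4, state 4 is the terminal goal
ACTIONS = [0, 1]  # 0 = move left, 1 = move right

def step(state, action):
    """Hypothetical deterministic environment: reward 1.0 on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def epsilon_greedy(Q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[state])
    # Break ties randomly so an all-zero table does not favour one action.
    return rng.choice([a for a in ACTIONS if Q[state][a] == best])

def sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(Q, state, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            # On-policy: the next action comes from the same policy being learned.
            next_action = epsilon_greedy(Q, next_state, epsilon, rng)
            # A terminal state contributes no future value.
            future = 0.0 if done else gamma * Q[next_state][next_action]
            Q[state][action] += alpha * (reward + future - Q[state][action])
            state, action = next_state, next_action
    return Q

if __name__ == "__main__":
    Q = sarsa()
    for s in range(N_STATES - 1):
        print(s, Q[s])
```

The next action is selected before the update and then reused as the action actually executed on the following step, which is what makes the method on-policy. In this toy setting, one would expect the learned values to favour moving right in every non-terminal state.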
