SARSA

SARSA (short for State-Action-Reward-State-Action, so you can already guess how it works from the name) works like this:

  1. The agent starts in state 1
  2. It then performs action 1 and gets reward 1
  3. Next, it moves on to state 2, performs action 2, and gets reward 2
  4. Then, the agent goes back and updates the value of action 1 in state 1, using reward 1 plus the discounted value of action 2 in state 2

As you can see, the difference between the two algorithms lies in how the future reward is estimated: Q-learning uses the highest action value available from state 2, while SARSA uses the value of the action that is actually taken.
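To make that contrast concrete, here is a minimal sketch of the two future terms, assuming a tabular action-value function stored as a NumPy array; the sizes and variable names (`n_states`, `state2`, `action2`, and so on) are hypothetical, for illustration only:

```python
import numpy as np

# Toy sizes; all names here are hypothetical, for illustration only
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))   # tabular action-value function Q[state, action]

state2, action2 = 3, 1                # the next state, and the action actually taken there

# Q-learning's future term: the best action value available from state 2
q_learning_future = np.max(Q[state2])

# SARSA's future term: the value of the action that was actually taken in state 2
sarsa_future = Q[state2, action2]
```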

Here is the mathematical intuition for SARSA:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]

Here \alpha is the learning rate, \gamma is the discount factor, and Q(s_{t+1}, a_{t+1}) is the value of the action actually taken in the next state (the final "A" in State-Action-Reward-State-Action), in place of Q-learning's maximum over all actions.
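The update maps directly into code. Below is a minimal sketch of a full SARSA loop; the toy cyclic environment in `step`, the epsilon-greedy policy, and all hyperparameter values are assumptions for illustration, not part of the original text:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate
Q = np.zeros((n_states, n_actions))     # tabular action-value function

def epsilon_greedy(state):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def step(state, action):
    """Hypothetical toy environment: cycle through states, reward on wrap-around."""
    next_state = (state + 1) % n_states
    reward = 1.0 if next_state == 0 else 0.0
    return next_state, reward

state = 0
action = epsilon_greedy(state)                    # action 1
for _ in range(100):
    next_state, reward = step(state, action)      # reward 1, state 2
    next_action = epsilon_greedy(next_state)      # action 2, actually taken
    # SARSA update: bootstrap from the action actually taken in the next state
    td_target = reward + gamma * Q[next_state, next_action]
    Q[state, action] += alpha * (td_target - Q[state, action])
    state, action = next_state, next_action
```

Swapping the `td_target` line for `reward + gamma * np.max(Q[next_state])` would turn this same loop into Q-learning.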
