The simple approach

One simple approach is to always choose, for a given state, the action with the highest Q value I've computed so far, and if there's a tie, just choose among the tied actions at random. So, initially all of my Q values might be 0, which means I'll just pick actions at random at first.

As I start to gain information about better Q values for given states and actions, I'll start to use those as I go. But that ends up being pretty inefficient: if I lock myself into this rigid algorithm of always choosing the best Q value computed thus far, I can miss a lot of paths that I never get around to exploring.
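The greedy selection rule described above can be sketched in a few lines. This is a minimal illustration, not code from any particular library; the `q_table` dictionary, the state name `"s0"`, and the action list are hypothetical examples.

```python
import random

def greedy_action(q_table, state, actions):
    """Pick the action with the highest known Q value for `state`,
    breaking ties uniformly at random among the tied actions."""
    # Unknown (state, action) pairs default to a Q value of 0.0,
    # so before any learning every action ties and the choice is random.
    q_values = [q_table.get((state, a), 0.0) for a in actions]
    best = max(q_values)
    # Collect every action that ties for the best Q value.
    candidates = [a for a, q in zip(actions, q_values) if q == best]
    return random.choice(candidates)

actions = ["up", "down", "left", "right"]

# With an empty table, all Q values are 0: a purely random pick.
print(greedy_action({}, "s0", actions))

# Once some Q values are learned, the best-known action wins;
# here "down" and "left" tie at 1.5, so one of them is chosen at random.
q_table = {("s0", "up"): 0.2, ("s0", "down"): 1.5, ("s0", "left"): 1.5}
print(greedy_action(q_table, "s0", actions))
```

Because this rule only ever exploits the best value seen so far, the agent can settle on the first rewarding path it stumbles into and never explore alternatives, which is exactly the inefficiency noted above.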
