Markov Decision Processes

This world that we've framed up happens to be a Markov Decision Process (MDP), which has the following properties:

  • It has a finite set of states, S
  • It has a finite set of actions, A
  • P_a(s, s') is the probability that taking action a in state s will transition to state s'
  • R_a(s, s') is the immediate reward for transitioning from s to s' under action a
  • γ is the discount factor, which is how much we discount future rewards over present rewards (more on this later)
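
Here's a minimal sketch of those pieces in Python, using a made-up three-state toy world. The state names, action names, and all the numbers are illustrative, not taken from the text.

    # Finite set of states S, finite set of actions A, and the discount factor.
    states = ["s0", "s1", "s2"]
    actions = ["left", "right"]
    gamma = 0.9

    # P[s][a] maps each possible next state s' to the probability of landing
    # there when taking action a in state s.
    P = {
        "s0": {"left": {"s0": 1.0}, "right": {"s1": 0.8, "s0": 0.2}},
        "s1": {"left": {"s0": 1.0}, "right": {"s2": 0.8, "s1": 0.2}},
        "s2": {"left": {"s1": 1.0}, "right": {"s2": 1.0}},
    }

    # R[s][a][s'] is the immediate reward for the transition s -> s' under action a.
    R = {
        "s0": {"left": {"s0": 0.0}, "right": {"s1": 0.0, "s0": 0.0}},
        "s1": {"left": {"s0": 0.0}, "right": {"s2": 1.0, "s1": 0.0}},
        "s2": {"left": {"s1": 0.0}, "right": {"s2": 0.0}},
    }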

Once we have a policy function π(s) that determines which action to take for each state, the MDP has been solved and becomes a Markov chain.

And good news, it's totally possible to solve an MDP perfectly, with one caveat. That caveat is that all the rewards and probabilities for the MDP have to be known. It turns out this caveat is rather important, because most of the time an agent can't know all the rewards and state-transition probabilities: its environment is chaotic, or at least non-deterministic.
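
To make "solve an MDP perfectly" concrete, here's a sketch of value iteration over the toy P, R, and gamma defined above. The function name, the stopping threshold, and the helper structure are assumptions for illustration; the point is just that with full knowledge of rewards and transition probabilities, a simple sweep converges to an optimal policy π.

    def value_iteration(states, actions, P, R, gamma, tol=1e-6):
        # Start with a value of 0 for every state.
        V = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for s in states:
                # Expected discounted return of each action from state s.
                q = {
                    a: sum(p * (R[s][a][s2] + gamma * V[s2])
                           for s2, p in P[s][a].items())
                    for a in actions
                }
                best = max(q.values())
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            # Stop once no state's value changes by more than the tolerance.
            if delta < tol:
                break
        # The greedy policy: for each state, pick the action with the best value.
        policy = {
            s: max(actions,
                   key=lambda a: sum(p * (R[s][a][s2] + gamma * V[s2])
                                     for s2, p in P[s][a].items()))
            for s in states
        }
        return V, policy

    V, policy = value_iteration(states, actions, P, R, gamma)
    print(policy)  # maps each state to the action the solved MDP would take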
