Markov Decision Process

A Markov Decision Process (MDP) is an extension of the Markov chain. It provides a mathematical framework for modeling decision-making situations. Almost all Reinforcement Learning problems can be modeled as an MDP.

An MDP is represented by five important elements:

A set of states ($S$) the agent can be in.
A set of actions ($A$) that the agent can perform to move from one state to another.
A transition probability ($P_{ss'}^a$), which is the probability of moving from state $s$ to state $s'$ by performing action $a$.
A reward probability ($R_{ss'}^a$), which is the probability of a reward acquired by the agent for moving from state $s$ to state $s'$ by performing action $a$.
A discount factor ($\gamma$), which controls the relative importance of immediate and future rewards. We will discuss this in detail in the upcoming sections.
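
To make the five elements concrete, here is a minimal sketch of how an MDP could be held in plain Python data structures. Everything in it is an assumption made for illustration: the `MDP` class, its `validate` and `expected_reward` helpers, the two-state "cool"/"hot" toy problem, and all the numbers are hypothetical and do not come from any library. Rewards are stored as one fixed value per transition, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    """Hypothetical container for the five elements of an MDP."""
    states: tuple        # set of states
    actions: tuple       # set of actions
    transitions: dict    # transitions[(s, a)] -> {next_state: probability}
    rewards: dict        # rewards[(s, a, next_state)] -> reward value
    gamma: float         # discount factor

    def validate(self):
        """Check that each (state, action) row is a probability distribution."""
        for (s, a), row in self.transitions.items():
            if s not in self.states or a not in self.actions:
                raise ValueError(f"unknown state or action: {(s, a)}")
            if any(s2 not in self.states for s2 in row):
                raise ValueError(f"unknown next state in row {(s, a)}")
            if abs(sum(row.values()) - 1.0) > 1e-9:
                raise ValueError(f"probabilities for {(s, a)} do not sum to 1")
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must lie in [0, 1]")
        return True

    def expected_reward(self, s, a):
        """Average the reward over all next states reachable from (s, a)."""
        return sum(
            p * self.rewards[(s, a, s2)]
            for s2, p in self.transitions[(s, a)].items()
        )


# Toy problem (assumed for illustration): an engine that is "cool" or "hot".
mdp = MDP(
    states=("cool", "hot"),
    actions=("slow", "fast"),
    transitions={
        ("cool", "slow"): {"cool": 1.0},
        ("cool", "fast"): {"cool": 0.5, "hot": 0.5},
        ("hot", "slow"): {"cool": 0.5, "hot": 0.5},
        ("hot", "fast"): {"hot": 1.0},
    },
    rewards={
        ("cool", "slow", "cool"): 1.0,
        ("cool", "fast", "cool"): 2.0,
        ("cool", "fast", "hot"): 2.0,
        ("hot", "slow", "cool"): 1.0,
        ("hot", "slow", "hot"): 1.0,
        ("hot", "fast", "hot"): -10.0,
    },
    gamma=0.9,
)

mdp.validate()
```

Keying the transition table by `(state, action)` keeps each row a small distribution over next states, so the requirement that the probabilities sum to 1 is easy to check. The discount factor is only stored here; how it weights future rewards is left to the later sections.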