Monte Carlo methods

MC methods for estimating the value function and discovering good policies do not require a model of the environment. They can learn from the agent's experience alone, or from samples of state sequences, actions, and rewards obtained from interactions between the agent and the environment. The experience can be acquired by the agent during the learning process itself or drawn from a previously collected dataset. The possibility of gaining experience during learning (online learning) is interesting because it allows good behavior to be obtained even without a priori knowledge of the dynamics of the environment. Learning from an already collected experience dataset is also interesting because, when combined with online learning, it makes it possible to improve a policy automatically on the basis of experience gathered by others.

To solve reinforcement learning problems, MC methods estimate the value function on the basis of the total rewards obtained, on average, over past episodes. This assumes that the experience is divided into episodes and that every episode consists of a finite number of transitions, because in MC methods the estimation of new values and the modification of the policy take place only once an episode is complete. MC methods estimate policy and value function iteratively, but each iteration cycle corresponds to completing an episode: the new estimates of policy and value function are produced episode by episode.
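
As a concrete illustration of this episode-by-episode averaging, the following sketch shows a first-visit Monte Carlo prediction routine in Python. It is not taken from the text above: the names `policy` and `generate_episode`, and the assumption that an episode is a list of (state, action, reward) tuples, are conventions chosen for this example only.

```python
from collections import defaultdict

def first_visit_mc_prediction(policy, generate_episode, num_episodes, gamma=1.0):
    """Estimate V(s) for a fixed policy by averaging the returns observed
    after the first visit to each state, one complete episode at a time.

    generate_episode(policy) is assumed to return a finished episode as a
    list of (state, action, reward) tuples."""
    returns_sum = defaultdict(float)    # total return accumulated per state
    returns_count = defaultdict(int)    # number of first visits per state
    V = defaultdict(float)              # current value estimates

    for _ in range(num_episodes):
        episode = generate_episode(policy)

        # Walk the episode backwards to compute the return G after each step.
        G = 0.0
        step_returns = []
        for state, action, reward in reversed(episode):
            G = gamma * G + reward
            step_returns.append((state, G))
        step_returns.reverse()          # restore chronological order

        # Update the estimate only at the first visit to each state.
        seen = set()
        for state, G in step_returns:
            if state not in seen:
                seen.add(state)
                returns_sum[state] += G
                returns_count[state] += 1
                V[state] = returns_sum[state] / returns_count[state]
    return V
```

Note that no update happens until the episode has terminated, which is why the method requires episodic tasks with a finite number of transitions.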

The term MC is usually used for estimation methods whose operation involves a random component; here, MC refers to reinforcement learning methods based on averaging total rewards. Unlike DP methods, which compute values for each state, MC methods compute values for each state-action pair, because in the absence of a model, state values alone are not sufficient to decide which action is best to perform in a given state.
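
The following short sketch makes that last point concrete; it is an illustration under assumed conventions (the function names, the `actions` list, and the transition-model format `transitions[(state, action)] -> list of (probability, next_state, reward)` are not from the text). Acting greedily with respect to Q(s, a) needs no model, whereas acting greedily with respect to V(s) requires a one-step lookahead through the transition dynamics.

```python
def greedy_from_q(Q, state, actions):
    # Model-free: the best action can be read directly from the action values.
    return max(actions, key=lambda a: Q[(state, a)])

def greedy_from_v(V, state, actions, transitions, gamma=1.0):
    # Model-based: choosing from state values alone requires the transition
    # model, assumed here as transitions[(state, action)] ->
    # list of (probability, next_state, reward) tuples.
    def one_step_lookahead(action):
        return sum(p * (r + gamma * V[next_state])
                   for p, next_state, r in transitions[(state, action)])
    return max(actions, key=one_step_lookahead)
```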
