Summary

This concludes the first of two chapters dedicated to reinforcement learning. In this chapter, we learned to balance exploration (learning) and exploitation (executing) by:

  • Managing and reducing the confidence interval across the arms
  • Applying the simple epsilon-greedy selection for exploring underplayed arms
  • Leveraging the concept of probability matching through Thompson sampling for context-free bandits
  • Using Upper Confidence Bounds to model the confidence interval as a function of the number of plays (the three selection strategies are sketched in code after this list)
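
As a recap, the following Python sketch illustrates the three selection strategies above for a context-free bandit. It is a minimal sketch under simplifying assumptions (Bernoulli-style rewards, running-mean value estimates); the names EpsilonGreedyBandit, ucb1_arm, and thompson_arm are illustrative and do not refer to the chapter's implementation.

import math
import random

class EpsilonGreedyBandit:
    """Epsilon-greedy selection: explore an underplayed arm with
    probability epsilon, otherwise exploit the best current estimate."""
    def __init__(self, num_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * num_arms      # plays per arm
        self.values = [0.0] * num_arms    # running mean reward per arm

    def select_arm(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental update of the mean reward for the played arm
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def ucb1_arm(counts, values):
    """Upper Confidence Bound: estimated value plus a confidence term
    that shrinks as the number of plays of the arm grows."""
    for arm, c in enumerate(counts):
        if c == 0:          # play every arm once before applying the bound
            return arm
    total = sum(counts)
    scores = [v + math.sqrt(2.0 * math.log(total) / c)
              for v, c in zip(values, counts)]
    return max(range(len(scores)), key=lambda a: scores[a])

def thompson_arm(successes, failures):
    """Thompson sampling (probability matching) for a Bernoulli bandit:
    sample each arm's Beta posterior and play the largest draw."""
    draws = [random.betavariate(s + 1, f + 1)
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda a: draws[a])

In a play loop, the actor would select an arm with one of these strategies, observe the reward from the environment, and feed it back through an update such as the incremental mean above.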

The K-armed bandit is a viable approach for simple problems in which the interaction between the actor (the player) and the environment (the bandit) involves a single state and an immediate reward.

The next chapter introduces alternatives to multi-armed bandits for more complex, multi-state problems, using action values and the Markov decision process.
