
This concludes the first of two chapters dedicated to reinforcement learning. In this chapter, we learned to balance exploration (learning) and exploitation (executing) by:

  • Managing and reducing the confidence interval across the arms
  • Applying the simple epsilon-greedy selection for exploring underplayed arms
  • Leveraging the concept of probability matching through Thompson sampling for context-free bandits
  • Using Upper Confidence Bounds to model the confidence interval as a function of the number of plays
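
As a recap, the three selection strategies listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's implementation: it assumes Bernoulli (0/1) rewards, a uniform Beta(1, 1) prior for Thompson sampling, and the UCB1 form of the upper confidence bound. The names `epsilon_greedy`, `ucb1`, `thompson` and `simulate` are hypothetical helpers introduced only for this example.

```python
import math
import random


def epsilon_greedy(counts, rewards, rng, epsilon=0.1):
    """With probability epsilon explore a random arm, otherwise exploit the best mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    means = [r / c if c > 0 else 0.0 for r, c in zip(rewards, counts)]
    return max(range(len(counts)), key=lambda i: means[i])


def ucb1(counts, rewards, rng=None):
    """Play each arm once, then pick the highest mean plus confidence bonus (UCB1)."""
    for arm, plays in enumerate(counts):
        if plays == 0:
            return arm
    total = sum(counts)

    def upper_bound(i):
        # The bonus shrinks as an arm is played more often.
        return rewards[i] / counts[i] + math.sqrt(2.0 * math.log(total) / counts[i])

    return max(range(len(counts)), key=upper_bound)


def thompson(counts, rewards, rng):
    """Draw a success rate per arm from its Beta posterior and play the largest draw."""
    draws = [
        rng.betavariate(1 + rewards[i], 1 + counts[i] - rewards[i])
        for i in range(len(counts))
    ]
    return max(range(len(draws)), key=lambda i: draws[i])


def simulate(select, probs, plays, seed=0):
    """Run one bandit session; probs holds the assumed (hidden) success rate of each arm."""
    rng = random.Random(seed)
    counts = [0] * len(probs)   # number of plays per arm
    rewards = [0] * len(probs)  # number of successes per arm
    for _ in range(plays):
        arm = select(counts, rewards, rng)
        counts[arm] += 1
        if rng.random() < probs[arm]:
            rewards[arm] += 1
    return counts


if __name__ == "__main__":
    for name, policy in [("epsilon-greedy", epsilon_greedy),
                         ("UCB1", ucb1),
                         ("Thompson", thompson)]:
        print(name, simulate(policy, probs=[0.1, 0.9], plays=2000))
```

All three policies share the same interface (play counts, accumulated rewards, random generator), so the single `simulate` loop can compare how quickly each one concentrates its plays on the better arm.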

The K-armed bandit is a viable model for simple problems in which the interaction between the actor (player) and the environment (bandit) relies on a single state and an immediate reward.

The next chapter introduces alternatives to multiarmed bandits for more complex, multi-state problems: action-value methods and the Markov decision process.
