This concludes the first of two chapters dedicated to reinforcement learning. In this chapter, we learned how to balance exploration (learning) and exploitation (executing).
The K-armed bandit is a viable model for simple problems in which the interaction between the actor (player) and the environment (bandit) involves a single state and an immediate reward.
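As a reminder of what the single-state, immediate-reward setting looks like in practice, here is a minimal sketch of an epsilon-greedy player. It is only one possible exploration strategy, not necessarily the one used earlier in the chapter. The function name `epsilon_greedy`, its parameters, and the Bernoulli (0 or 1) rewards are assumptions made for illustration.

```python
import random


def epsilon_greedy(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Play a K-armed Bernoulli bandit with an epsilon-greedy policy.

    reward_probs: hypothetical per-arm probabilities of a reward of 1.
    Returns the number of pulls per arm and the estimated mean reward per arm.
    """
    rng = random.Random(seed)
    k = len(reward_probs)
    counts = [0] * k        # number of times each arm was pulled
    estimates = [0.0] * k   # running mean reward of each arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploitation: pick the arm with the best current estimate.
            arm = max(range(k), key=lambda a: estimates[a])

        # Single state, immediate reward: the outcome depends only on the arm.
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return counts, estimates


if __name__ == "__main__":
    counts, estimates = epsilon_greedy([0.1, 0.3, 0.9])
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 3) for e in estimates])
```

The `epsilon` parameter sets the trade-off directly: a larger value spends more pulls on learning about the arms, a smaller value spends more pulls on the arm currently believed to be best.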
The next chapter introduces alternatives to the multi-armed bandit for more complex, multi-state problems, using action values and the Markov decision process.