So, here's an even fancier term for what we just did: dynamic programming. Wow! That sounds like artificial intelligence, computers programming themselves, Terminator 2, Skynet stuff. But no, it's just what we just did. If you look up the definition of dynamic programming, it is a method for solving a complex problem by breaking it down into a collection of simpler subproblems, solving each of those subproblems just once, and storing their solutions, ideally using a memory-based data structure.
The next time the same subproblem occurs, instead of recomputing its solution, one simply looks up the previously computed solution, thereby saving computation time at the expense of a (hopefully) modest expenditure in storage space:
- A method for solving a complex problem: same as creating an intelligent Pac-Man; that's a pretty complicated end result.
- By breaking it down into a collection of simpler subproblems: for example, finding the optimal action to take for a given state that Pac-Man might be in. There are many different states Pac-Man could find himself in, but each of those states represents a simpler subproblem, where there's a limited set of choices I can make and one right answer for the best move to make.
- Storing their solutions: Those solutions being the Q values that I associated with each possible action at each state.
- Ideally, using a memory-based data structure: Well, of course I need to store those Q values and associate them with states somehow, right?
- The next time the same subproblem occurs: The next time Pac-Man is in a given state that I have a set of Q values for.
- Instead of recomputing its solution, one simply looks up the previously computed solution: The Q value I already have from the exploration stage.
- Thereby saving computation time at the expense of a modest expenditure in storage space: That's exactly what we just did with reinforcement learning.
We have a complicated exploration phase that finds the optimal rewards associated with each action for a given state. Once we have that table of the right action to take for a given state, we can very quickly use that to make our Pac-Man move in an optimal manner in a whole new maze that he hasn't seen before. So, reinforcement learning is also a form of dynamic programming. Wow!
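To make that "store the solution, then look it up" idea concrete, here's a minimal sketch in Python. The state names, actions, and Q values here are made up purely for illustration; they just show how a stored table turns a solved subproblem into a cheap lookup:

```python
# The "memory-based data structure" from the definition: a table
# mapping (state, action) pairs to learned Q values.
q_table = {}

def best_action(state, actions):
    """Look up the stored Q values for this state and pick the best action.

    If we've already solved this subproblem (explored this state),
    this is a cheap lookup instead of a recomputation.
    """
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))

# Pretend the exploration phase already filled in these (made-up) values:
q_table[("ghost_nearby", "flee")] = 1.0
q_table[("ghost_nearby", "eat_pellet")] = -10.0

print(best_action("ghost_nearby", ["flee", "eat_pellet"]))
```

Once the table exists, answering "what should Pac-Man do here?" costs a dictionary lookup and a comparison, no matter how expensive the exploration that produced those values was.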
To recap, you can make an intelligent Pac-Man agent by just having it semi-randomly explore different choices of movement given different conditions, where those choices are actions and those conditions are states. We keep track of the reward or penalty associated with each action at each state as we go, and we can even propagate those rewards backward over multiple steps, with discounting, to make it even better.
Then we store those Q values that we end up associating with each state, and we can use them to inform future choices. So we can go into a whole new maze and have a really smart Pac-Man that can avoid the ghosts and eat them up pretty effectively, all on its own. It's a pretty simple concept, but very powerful. You can also say that you understand a bunch of fancy terms, because they all describe the same thing. Q-learning, reinforcement learning, Markov decision processes, dynamic programming: all tied up in the same concept.
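That whole recap fits in a few lines of Python. This is not an actual Pac-Man maze; it's a made-up five-cell corridor where the agent earns a reward for reaching the rightmost cell. But it runs the same loop: semi-random exploration, one stored Q value per state/action pair, and a discounted update that propagates rewards back one step at a time:

```python
import random

# A made-up 5-cell corridor, not a real maze: states 0..4, with a
# reward of 1 for reaching the rightmost cell (state 4).
N_STATES = 5
ACTIONS = ["left", "right"]
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor

# One stored Q value per (state, action) pair.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Move one cell; the episode ends with reward 1 at the last cell."""
    nxt = max(0, state - 1) if action == "left" else min(N_STATES - 1, state + 1)
    done = (nxt == N_STATES - 1)
    return nxt, (1.0 if done else 0.0), done

random.seed(0)
for _ in range(300):  # exploration phase: wander randomly, learn as we go
    state, done, steps = 0, False, 0
    while not done and steps < 100:
        action = random.choice(ACTIONS)
        nxt, reward, done = step(state, action)
        # Q-learning update: blend in the reward plus the discounted
        # value of the best action available from the next state.
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt
        steps += 1

# Exploitation phase: just look up the stored Q values. The greedy
# policy ends up pointing "right" from every non-terminal state.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Note that nothing in the exploration loop knows the corridor's layout ahead of time; the discounting term `GAMMA * best_next` is what drags the reward at the far end back toward the starting cell, one update at a time.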
I don't know, I think it's pretty cool that you can actually make a sort of artificially intelligent Pac-Man through such a simple technique, and it really does work! If you want to look at it in more detail, following are a few examples with actual source code you can look at, and potentially play with, starting with the Python Markov Decision Process Toolbox: http://pymdptoolbox.readthedocs.org/en/latest/api/mdp.html.
The Python Markov decision process toolbox wraps it all up in the terminology we talked about. There's also a working example of the cat-and-mouse game you can look at, which is similar. And there is actually a Pac-Man example you can look at online as well, which ties in more directly with what we were talking about. Feel free to explore these links and learn even more about it.
And so that's reinforcement learning. More generally, it's a useful technique for building an agent that can navigate its way through a set of possible states, with a set of actions associated with each state. We've talked about it mostly in the context of a maze game, but you can think more broadly: whenever you have a situation where you need to predict the behavior of something given a set of current conditions and a set of actions it can take, reinforcement learning and Q-learning might be a way of doing it. So, keep that in mind!