Overview of reinforcement learning

As mentioned in Chapter 1, Machine Learning Basics, reinforcement learning trains a machine toward a specific outcome, with the sole purpose of maximizing efficiency and/or performance. The algorithm is rewarded for making correct decisions and penalized for making incorrect ones, as shown in the following diagram:

Continual training is used to constantly improve performance. The focus here is on performance, which means finding a balance between exploring unseen data and exploiting what the algorithm has already learned. The algorithm applies an action to its environment, receives a reward or a penalty based on what it has done, and then repeats the process.
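The action, reward, repeat cycle described above can be sketched in a few lines of Python. This is a minimal illustration rather than the application's code: the LineWorld class, the run_episode function, and the reward values are hypothetical names and numbers chosen for the sketch, and none of it uses the Accord.NET API.

```python
class LineWorld:
    """Hypothetical 1-D environment: cells 0..size-1, goal at the last cell."""

    def __init__(self, size=5):
        self.size = size
        self.position = 0

    def reset(self):
        """Put the agent back at the start and return the initial state."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (-1 = left, +1 = right).

        Returns (new_state, reward, done). Reward values are illustrative.
        """
        target = self.position + action
        if target < 0 or target >= self.size:
            # Moving off the map counts as hitting a wall: penalty, no movement.
            return self.position, -1.0, False
        self.position = target
        done = self.position == self.size - 1
        return self.position, (1.0 if done else 0.0), done


def run_episode(env, policy, max_steps=100):
    """Run one episode: act, receive reward or penalty, repeat until done."""
    state = env.reset()
    total_reward = 0.0
    for steps_taken in range(1, max_steps + 1):
        action = policy(state)                   # the agent chooses an action
        state, reward, done = env.step(action)   # the environment responds
        total_reward += reward
        if done:
            return total_reward, steps_taken
    return total_reward, max_steps


if __name__ == "__main__":
    # A fixed "always move right" policy reaches the goal in 4 steps.
    print(run_episode(LineWorld(), lambda state: +1))
```

A learning algorithm would replace the fixed policy with one that is updated from the rewards it observes; the loop itself stays the same.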

We're going to dive right into the application in this chapter, using the open-source Accord.NET machine learning framework to show how reinforcement learning can help an autonomous object get from its starting location, depicted by a black block, to a desired end point, depicted by a red block.

The concept is similar, although far less complex, to what autonomous vehicles do to get you from point A to point B. Our example will allow you to use maps of varying complexity, meaning that various obstacles may lie between your autonomous object and the desired location. Let's look at our application:

Here, you can see that we have a very basic map loaded, one with no obstacles, only exterior confining walls. The black block (start) is our autonomous object and the red block (stop) is our destination. Our goal in this application is to navigate within the walls to reach our desired location. If our next move puts us onto a white block, our algorithm will be rewarded. If our next move puts us into a wall, it will be penalized. From this, our autonomous object should be able to get to its destination. The question is: how fast can it learn? In this example, there are absolutely no obstacles in its path, so there should be no issues solving the problem in the fewest moves possible.
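To make the reward rule concrete, here is a small Python sketch of such a map. Everything in it is an assumption made for illustration: the text encoding of the map ('#' for a wall, '.' for a white block, 'S' for start, 'E' for end), the numeric reward values, and the restriction to four-directional moves are not taken from the application or from Accord.NET.

```python
# Hypothetical map encoding: '#' wall, '.' white block, 'S' start, 'E' end.
OPEN_MAP = [
    "#######",
    "#S....#",
    "#.....#",
    "#.....#",
    "#....E#",
    "#######",
]

# Assumed four-directional moves as (row delta, column delta).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

# Illustrative reward values only.
WALL_PENALTY = -1.0
FREE_REWARD = 0.1
GOAL_REWARD = 1.0


def find(grid, symbol):
    """Return the (row, column) of the first cell containing symbol."""
    for r, row in enumerate(grid):
        c = row.find(symbol)
        if c != -1:
            return (r, c)
    raise ValueError(f"symbol {symbol!r} not found in map")


def step(grid, pos, move):
    """Try one move from pos. Returns (new_pos, reward, reached_goal)."""
    dr, dc = MOVES[move]
    r, c = pos[0] + dr, pos[1] + dc
    cell = grid[r][c]
    if cell == "#":
        return pos, WALL_PENALTY, False    # hit a wall: penalized, stay put
    if cell == "E":
        return (r, c), GOAL_REWARD, True   # reached the destination
    return (r, c), FREE_REWARD, False      # moved onto a free block: rewarded


def min_moves_without_obstacles(grid):
    """Fewest moves on an obstacle-free map (Manhattan distance)."""
    (sr, sc), (er, ec) = find(grid, "S"), find(grid, "E")
    return abs(sr - er) + abs(sc - ec)


if __name__ == "__main__":
    start = find(OPEN_MAP, "S")
    print(step(OPEN_MAP, start, "up"))     # wall above the start cell
    print(step(OPEN_MAP, start, "right"))  # free block to the right
    print(min_moves_without_obstacles(OPEN_MAP))
```

With no obstacles and four-directional moves, the fewest possible moves is simply the row distance plus the column distance between start and end, which gives a baseline for judging how quickly the agent learns.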

The following is another example of a somewhat more complicated map for our environment:
