Detecting objects with the YOLO algorithm

In this section, we're going to see how the YOLO algorithm works. YOLO stands for You Only Look Once. The name comes from the fact that a single execution of the neural network is enough to produce all the predictions, which is possible because of the convolutional sliding windows.

YOLO addresses the problem of bounding box accuracy. Recall that in the previous section, we had this image:

With the help of the convolutional sliding window, we were able to obtain the predictions for all the windows with one execution. So, for each of these windows, we can tell whether the selected pixels represent a car.

Now the problem is that even if we can do that, each window has a fixed size and position, which makes it a poor bounding box. Look at the image carefully and notice that none of the cars sits in a well-fitting bounding box.

In this image, almost 90% of the window contains non-car information, which isn't good. Here's where YOLO steps in. It lets you specify the bounding boxes freely, so that each box can span several windows:

Let's assume that the dimensions of the bounding boxes are specified relative to the width and height of the image. For the previous image, the height of a box could then be 0.2, or 20% of the height of the image, and the width could be 0.1, or 10% of the width of the image. We also define the center of each bounding box in the same relative coordinates: the top-left corner of the image is (0, 0), the bottom-right corner is (1, 1), and the coordinates of the center depend on where the box is located.
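To make the relative coordinates concrete, here is a minimal sketch that converts a box given in pixels into a center, width, and height expressed as fractions of the image. The function name `to_relative_box` and the pixel values are assumptions chosen for illustration; they were picked so that the result matches the 0.1 width and 0.2 height used above.

```python
def to_relative_box(x_min, y_min, x_max, y_max, image_width, image_height):
    """Convert a pixel-space box to (center_x, center_y, width, height).

    All returned values are fractions of the image size, so the top-left
    corner of the image is (0, 0) and the bottom-right corner is (1, 1).
    """
    center_x = (x_min + x_max) / 2 / image_width
    center_y = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return center_x, center_y, width, height


# Hypothetical car box in a 1000 x 500 pixel image.
box = to_relative_box(100, 200, 200, 300, image_width=1000, image_height=500)
print(box)
```

Because the values are relative, the same label remains valid if the image is resized.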

Once the neural network has been trained on data labeled in this way, it will give you back the same structure: it will predict the center, width, and height of the bounding boxes.

There's one problem: we need to connect the bounding boxes to the windows, because what the convolutional sliding window will give you in the end is just the windows. YOLO solves this problem by assigning the bounding boxes' height, width, and center to the window that holds the center of the bounding box.

For example, in such cases, only the windows depicted in the following screenshot will be responsible for detecting an object and having the bounding box specification:

Whichever window contains the center of the bounding box will be chosen.
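The assignment rule can be sketched in a few lines. This is only an illustration, assuming that the windows form a regular grid over the image and that the center is given in relative coordinates; the function name `responsible_window` and the 4 x 4 grid are hypothetical.

```python
def responsible_window(center_x, center_y, grid_rows, grid_cols):
    """Return (row, col) of the window that holds the box center.

    center_x and center_y are relative to the image, in the range [0, 1].
    The min() calls keep a center lying exactly on the right or bottom
    edge (value 1.0) inside the last window.
    """
    col = min(int(center_x * grid_cols), grid_cols - 1)
    row = min(int(center_y * grid_rows), grid_rows - 1)
    return row, col


# A box centered at (0.15, 0.5) on an assumed 4 x 4 grid of windows.
print(responsible_window(0.15, 0.5, grid_rows=4, grid_cols=4))  # (2, 0)
```

Every other window that the box overlaps is ignored for this object, however much of the object it covers.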

The output structure of the YOLO algorithm is similar to that of the convolutional sliding window algorithm:

The first value is P_c, the confidence that the window holds an object, which might be 60% for the bounding box that lies at the left end of the image. Next come the bounding box's properties, which are relative to the image, and then the class number. The structure for the other boxes can be written in the same way.
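As a rough sketch of this structure, the following code builds the label for one image as a grid with one vector per window. The ordering of the box properties inside the vector, the grid size, and the name `build_target` are assumptions made for illustration. In a training label, P_c is set to 1 for the window that holds a box center; a value such as 60% is what the trained network might predict.

```python
def build_target(boxes, grid_rows, grid_cols):
    """Build a per-window label: [P_c, center_x, center_y, height, width, class].

    boxes is a list of (center_x, center_y, width, height, class_id) tuples
    with coordinates relative to the image.
    """
    # Every window starts as "no object": P_c = 0 and all other values zeroed.
    target = [[[0.0] * 6 for _ in range(grid_cols)] for _ in range(grid_rows)]
    for center_x, center_y, width, height, class_id in boxes:
        # Only the window that holds the box center receives the box.
        col = min(int(center_x * grid_cols), grid_cols - 1)
        row = min(int(center_y * grid_rows), grid_rows - 1)
        target[row][col] = [1.0, center_x, center_y, height, width, float(class_id)]
    return target


# One car (hypothetical class 1) on an assumed 4 x 4 grid of windows.
target = build_target([(0.15, 0.5, 0.1, 0.2, 1)], grid_rows=4, grid_cols=4)
print(target[2][0])  # [1.0, 0.15, 0.5, 0.2, 0.1, 1.0]
print(target[0][0])  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```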

Some windows cover a significant part of an object and still have a P_c value of 0. This is because the YOLO algorithm treats the window that contains the center of the bounding box as the one that matters for that object.

The YOLO algorithm uses this method for training: only the windows that contain the center of an object's bounding box are optimized to predict that bounding box.
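One way to picture this is a loss that skips every window whose label has P_c = 0 when it measures the bounding box error. The sketch below is only an illustration: the squared-error formula, the vector layout [P_c, center_x, center_y, height, width, class], and the name `box_loss` are assumptions, not the exact loss used by YOLO.

```python
def box_loss(predicted, target):
    """Sum of squared box errors over windows whose label has P_c = 1.

    Both arguments are grids (lists of rows) of per-window vectors laid out
    as [P_c, center_x, center_y, height, width, class].
    """
    total = 0.0
    for predicted_row, target_row in zip(predicted, target):
        for prediction, label in zip(predicted_row, target_row):
            if label[0] != 1.0:
                # No box center in this window: its box values are ignored.
                continue
            # Positions 1 to 4 hold the four bounding box properties.
            total += sum((p - t) ** 2 for p, t in zip(prediction[1:5], label[1:5]))
    return total


# Assumed 1 x 2 grid: the first window holds a center, the second does not.
target = [[[1.0, 0.25, 0.5, 0.2, 0.1, 1.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]]
predicted = [[[0.6, 0.25, 0.5, 0.2, 0.3, 1.0], [0.1, 0.9, 0.9, 0.9, 0.9, 0.0]]]
loss = box_loss(predicted, target)
print(loss)  # about 0.04; only the width error of the first window counts
```

The second window predicts a wildly wrong box, yet it adds nothing to the result, because its label says that no box center falls inside it.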
