Object detection as classification – Sliding window

Object detection is a different problem from localization because an image can contain a variable number of objects. If we treated detection as a simple regression problem, as we did for localization, we would need a variable number of outputs, which is very tricky to handle. We therefore treat detection as a classification problem instead.

One very common and long-established approach is to do object detection using sliding windows. The idea is to slide a window of fixed size across the input image. The contents of the window at each location are then sent to a classifier, which tells us whether or not the window contains an object of interest.

For this purpose, one can first train a CNN classifier on small, closely cropped images of the objects we want to detect (cars, for example), resized to the same size as the window. At test time, the fixed-size window is slid across the whole image in which we want to detect objects. For each window, the CNN then predicts whether it contains an object (a car in this case) or not.
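
The test-time procedure described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the window size, stride, and threshold are assumed values, and `toy_classifier` is a hypothetical stand-in (mean brightness) for the trained CNN, used only so the example runs on its own.

```python
import numpy as np

def sliding_windows(image, window=(64, 64), stride=32):
    """Yield (x, y, crop) for every position of a fixed-size window."""
    win_h, win_w = window
    img_h, img_w = image.shape[:2]
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield x, y, image[y:y + win_h, x:x + win_w]

def detect(image, classifier, window=(64, 64), stride=32, threshold=0.5):
    """Return (x, y, w, h, score) for each window the classifier accepts."""
    detections = []
    for x, y, crop in sliding_windows(image, window, stride):
        score = classifier(crop)  # in practice: the CNN's object probability
        if score >= threshold:
            detections.append((x, y, window[1], window[0], score))
    return detections

def toy_classifier(crop):
    """Hypothetical stand-in for the CNN: scores a crop by its mean brightness."""
    return float(crop.mean())

# Synthetic image: a bright 64x64 "object" on a dark 128x128 background.
image = np.zeros((128, 128), dtype=np.float32)
image[32:96, 32:96] = 1.0

boxes = detect(image, toy_classifier, threshold=0.99)
```

In a real detector the classifier call would be a forward pass of the trained CNN, typically batched over many crops rather than run one window at a time.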

With only one size of sliding window, we can only detect objects of one size. To find larger or smaller objects, we can use larger and smaller windows at test time and resize their contents before sending them to the classifier. Alternatively, we can resize the whole input image to several scales and run a single fixed-size window across each of the resized images. Both methods work; in either case the idea is to produce what is called a ‘pyramid of scales’ so that we can detect objects of different sizes in an image.
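
The second variant (resizing the image and keeping one window size) might look like the sketch below. The scale step of 0.5, the 64x64 window, and the stride are assumptions chosen for illustration, and `resize_nearest` is a deliberately crude helper written here so the example has no dependencies beyond NumPy; a real pipeline would use a proper interpolation routine from an image library.

```python
import numpy as np

def resize_nearest(image, scale):
    """Nearest-neighbour resize (illustrative helper, not production quality)."""
    h, w = image.shape[:2]
    new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    return image[rows][:, cols]

def pyramid(image, scale_step=0.5, min_size=(64, 64)):
    """Yield (scale, resized_image) until the image is smaller than the window."""
    scale = 1.0
    while True:
        resized = image if scale == 1.0 else resize_nearest(image, scale)
        if resized.shape[0] < min_size[0] or resized.shape[1] < min_size[1]:
            break
        yield scale, resized
        scale *= scale_step

def pyramid_windows(image, window=(64, 64), stride=32, scale_step=0.5):
    """Yield (x, y, w, h) boxes, mapped back to original-image coordinates."""
    win_h, win_w = window
    for scale, resized in pyramid(image, scale_step, window):
        h, w = resized.shape[:2]
        for y in range(0, h - win_h + 1, stride):
            for x in range(0, w - win_w + 1, stride):
                # A fixed window on a shrunken image covers a larger
                # region of the original image.
                yield (int(x / scale), int(y / scale),
                       int(win_w / scale), int(win_h / scale))

image = np.zeros((256, 256), dtype=np.float32)
boxes = list(pyramid_windows(image))
```

Each crop taken from a resized image would be passed to the same classifier; dividing by the scale recovers where that window sits in the original image, so the 64x64 window at scale 0.5 corresponds to a 128x128 region.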

The big drawback of this method is that a huge number of windows from the various scales must pass through the CNN for prediction, which makes it very computationally expensive when a CNN is used as the classifier. Moreover, most of these windows contain no objects anyway.
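
To get a feel for the cost, the number of windows can be counted directly. The sketch below assumes a 640x480 image, a 64x64 window, a stride of 8 pixels, and a scale step of 0.75; all of these are illustrative choices rather than values from the text.

```python
def count_windows(img_h, img_w, win=64, stride=8):
    """Number of window positions at a single scale."""
    if img_h < win or img_w < win:
        return 0
    return ((img_h - win) // stride + 1) * ((img_w - win) // stride + 1)

def count_pyramid_windows(img_h, img_w, win=64, stride=8, scale_step=0.75):
    """Total window positions summed over every level of the pyramid."""
    total = 0
    scale = 1.0
    while int(img_h * scale) >= win and int(img_w * scale) >= win:
        total += count_windows(int(img_h * scale), int(img_w * scale), win, stride)
        scale *= scale_step
    return total

single_scale = count_windows(480, 640)         # 53 rows * 73 columns = 3869
all_scales = count_pyramid_windows(480, 640)   # larger still
```

Under these assumptions a single scale already needs 3,869 forward passes of the classifier for one image, and every extra pyramid level adds more.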

Many improvements have been made to overcome this problem. In the following sections we will go through various techniques and algorithms that have been created to tackle it, and see how newer ones have improved on what came before them.
