Using scene-specific knowledge and constraints to optimize the detection result

Once your cascade classifier object model is trained, you can use it to detect instances of the same object class in new input images. However, once you apply your object model, you will notice that there are still false positive detections and objects that are missed. This section covers techniques to improve your detection results, for example by removing most of the false positive detections with scene-specific knowledge.

Using the parameters of the detection command to influence your detection result

If you apply an object model to a given input image, there are several things to consider. Let's first take a look at the detection function and the parameters that can be used to filter your detection output. OpenCV 3 supplies three possible interfaces. We will discuss the benefits of each of them.

Interface 1:

void CascadeClassifier::detectMultiScale(InputArray image, vector<Rect>& objects, double scaleFactor=1.1, int minNeighbors=3, int flags=0, Size minSize=Size(), Size maxSize=Size())

The first interface is the most basic one. It allows you to quickly evaluate a trained model on a given test image. Several parameters of this basic interface allow you to manipulate the detection output. We will discuss these parameters in more detail and highlight some points of attention when selecting the correct values.

scaleFactor is the scale step used to downscale the original image in order to create the image pyramid, which allows us to perform multiscale detection using only a single-scale model. One downside is that this doesn't allow you to detect objects that are smaller than the trained model size. Using a value of 1.1 means that at each step, the dimensions are reduced by 10% compared to the previous step.

  • Increasing this value will make your detector run faster, since it has fewer scale levels to evaluate, but it carries the risk of losing detections that fall in between scale steps.
  • Decreasing the value will make your detector run slower, since more scale levels need to be evaluated, but it will increase the chance of detecting objects that were missed before. It will also yield more detections on an actual object, resulting in a higher certainty.
  • Keep in mind that adding scale levels also gives rise to more false positive detections, since those are bound to each layer of the image pyramid.
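To get a feel for this trade-off, you can compute how many pyramid levels a given scaleFactor produces. The following standalone sketch (the exact level count inside OpenCV may differ slightly) repeatedly downscales until the image no longer fits the model:

```cpp
#include <cassert>
#include <cmath>

// Number of pyramid levels evaluated for a given scale factor (> 1.0):
// the image is repeatedly downscaled by 'scaleFactor' until one of its
// dimensions drops below the corresponding model dimension.
int pyramidLevels(int imageW, int imageH, int modelW, int modelH,
                  double scaleFactor) {
    int levels = 0;
    double w = imageW, h = imageH;
    while (w >= modelW && h >= modelH) {
        ++levels;
        w /= scaleFactor;
        h /= scaleFactor;
    }
    return levels;
}
```

For a 640x480 image and a 24x24 model, a scaleFactor of 1.1 produces roughly three times as many levels to evaluate as a scaleFactor of 1.3, which is exactly the speed-versus-recall trade-off described above.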

A second interesting parameter to adapt to your needs is the minNeighbors parameter. Due to the sliding window approach, a single object gives rise to many overlapping detections; each detection overlapping another by more than 50% is merged with it, as a sort of nonmaxima suppression, and minNeighbors specifies how many merged detections a group must contain to be retained.

  • Setting this value to 0 means that you will get all detections generated by the windows that pass the complete cascade. However, due to the sliding window approach (with steps of 8 pixels), many detections will occur for a single object, because cascade classifiers are trained with some variance on the object parameters in order to generalize better over the object class.
  • Setting a higher value specifies how many overlapping windows, at minimum, must be combined by the nonmaxima suppression in order to keep a detection. This is interesting, since an actual object should yield far more detections than a false positive. So, increasing this value will reduce the number of false positive detections (which have a low number of overlapping detections) and keep the true detections (which have a large number of overlapping detections).
  • A downside is that, at a certain point, actual objects detected with a lower certainty, and thus fewer overlapping windows, will disappear, while some false positive detections might still remain.
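The merging behavior can be sketched in isolation. OpenCV implements this internally via groupRectangles, which uses a more elaborate similarity criterion and also averages the cluster members; this reduced version only shows the core idea (note that in OpenCV, minNeighbors = 0 skips grouping entirely, which this sketch does not model):

```cpp
#include <algorithm>
#include <vector>

struct Box { int x, y, w, h; };

// Overlap ratio (intersection over the smaller area) between two boxes.
static double overlap(const Box& a, const Box& b) {
    int x1 = std::max(a.x, b.x), y1 = std::max(a.y, b.y);
    int x2 = std::min(a.x + a.w, b.x + b.w);
    int y2 = std::min(a.y + a.h, b.y + b.h);
    int inter = std::max(0, x2 - x1) * std::max(0, y2 - y1);
    int smaller = std::min(a.w * a.h, b.w * b.h);
    return smaller > 0 ? double(inter) / smaller : 0.0;
}

// Keep one representative per cluster of boxes overlapping by more
// than 50%, but only if the cluster has at least minNeighbors members.
std::vector<Box> filterByNeighbors(const std::vector<Box>& raw,
                                   int minNeighbors) {
    std::vector<bool> used(raw.size(), false);
    std::vector<Box> kept;
    for (size_t i = 0; i < raw.size(); ++i) {
        if (used[i]) continue;
        int clusterSize = 1;
        for (size_t j = i + 1; j < raw.size(); ++j)
            if (!used[j] && overlap(raw[i], raw[j]) > 0.5) {
                used[j] = true;
                ++clusterSize;
            }
        if (clusterSize >= minNeighbors)
            kept.push_back(raw[i]);  // representative of the cluster
    }
    return kept;
}
```

With minNeighbors set to 3, an isolated window (a typical false positive) is dropped, while a dense cluster of windows around a real object survives.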

Use the minSize and maxSize parameters to effectively reduce the scale space pyramid. In an industrial setup with a fixed camera position, such as a conveyor belt setup, you can in most cases guarantee that objects will have certain dimensions. Defining a scale range in this way decreases the processing time for a single image a lot, by removing undesired scale levels. As an extra advantage, all false positive detections on those undesired scales will also disappear. If you leave these values blank, the algorithm will build the image pyramid starting at the input image dimensions and downscale in steps equal to the scale factor, until one of the dimensions is smaller than the largest object dimension. This forms the top of the image pyramid, which is also where, at detection time, the detection algorithm will start running its object detector.

Interface 2:

void CascadeClassifier::detectMultiScale(InputArray image, vector<Rect>& objects, vector<int>& numDetections, double scaleFactor=1.1, int minNeighbors=3, int flags=0, Size minSize=Size(), Size maxSize=Size())

The second interface brings a small addition: the numDetections parameter. This allows you to set minNeighbors to 1, still applying the merging of overlapping windows as nonmaxima suppression, while at the same time returning, for each detection, the number of overlapping windows that were merged. This value can be seen as a certainty score for the detection: the higher the value, the more certain the detection.
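After calling the second interface with minNeighbors = 1, numDetections[i] holds the certainty score of objects[i]. A small helper like the following (an illustrative sketch, not part of OpenCV) can then rank the detections so the most certain one comes first:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Given the numDetections output of the second interface, return the
// detection indices ordered from most to least certain, so that
// objects[order[0]] is the most reliable detection.
std::vector<std::size_t> orderByCertainty(const std::vector<int>& numDetections) {
    std::vector<std::size_t> order(numDetections.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) {
                  return numDetections[a] > numDetections[b];
              });
    return order;
}
```

This makes it straightforward to keep, for example, only the top-scoring detection in a scene where you know exactly one object instance is present.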

Interface 3:

void CascadeClassifier::detectMultiScale(InputArray image, std::vector<Rect>& objects, std::vector<int>& rejectLevels, std::vector<double>& levelWeights, double scaleFactor=1.1, int minNeighbors=3, int flags=0, Size minSize=Size(), Size maxSize=Size(), bool outputRejectLevels=false )

A downside of the previous interface is that 100 windows, each with a very small individual detection certainty, can simply outvote a single detection with a very high individual certainty. This is where the third interface brings a solution. It allows us to look at the individual score of each detection window (described by the threshold value of the last stage of the classifier). You can then collect all those values and threshold the certainty scores of the individual windows. When applying nonmaxima suppression in this case, the threshold values of all overlapping windows are combined.

Tip

Keep in mind that if you want to try out the third interface in OpenCV 3.0, you have to set the outputRejectLevels parameter to true. If you do not, the levelWeights vector, which holds the threshold scores, will not be filled.
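Once the third interface has been called with outputRejectLevels set to true, filtering on the individual scores is simple. The helper below is an illustrative sketch (not an OpenCV function); the threshold value itself is model specific and must be tuned on your own data:

```cpp
#include <cstddef>
#include <vector>

// After calling the third interface with outputRejectLevels = true,
// levelWeights[i] holds the stage score of detection i. Return the
// indices of detections whose score reaches the chosen threshold.
std::vector<std::size_t> confidentIndices(const std::vector<double>& levelWeights,
                                          double threshold) {
    std::vector<std::size_t> keep;
    for (std::size_t i = 0; i < levelWeights.size(); ++i)
        if (levelWeights[i] >= threshold)
            keep.push_back(i);
    return keep;
}
```

Thresholding on these per-window scores is exactly what prevents many weak windows from outvoting one strong detection.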

Note

Software illustrating the two most used interfaces for object detection can be found at https://github.com/OpenCVBlueprints/OpenCVBlueprints/tree/master/chapter_5/source_code/detect_simple and https://github.com/OpenCVBlueprints/OpenCVBlueprints/tree/master/chapter_5/source_code/detect_score. OpenCV detection interfaces change frequently, and it is possible that new interfaces are already available that are not discussed here.

Increasing object instance detection and reducing false positive detections

Once you have chosen the most appropriate way of retrieving the object detections for your application, you can evaluate the proper output of your algorithm. Two of the most common problems found after training an object detector are:

  • Object instances that are not detected.
  • Too many false positive detections.

The reason for the first problem can be explained by looking at the generic object model, which we trained on positive samples of the object class. It lets us conclude that the training either:

  • Did not contain enough positive training samples, making it impossible to generalize well over new object samples. In this case, it is important to add those false negatives as positive samples to the training set and retrain your model with the extra data. This principle is called "reinforced learning".
  • Overfitted the model to the training set, again reducing the generalization of the model. To avoid this, reduce the number of stages and thus the complexity of the model.

The second problem is quite normal and happens more often than not. It is impossible to supply enough negative samples and, at the same time, ensure that not a single negative window could still yield a positive detection on a first run. This is mainly because it is very hard for us humans to understand how the computer sees an object based on features. On the other hand, it is impossible to grasp every possible scenario (lighting conditions, interactions during the production process, dirt on the camera, and so on) at the very start when training an object detector. You should see the creation of a good and stable model as an iterative process.

Note

An approach to reduce the influence of lighting conditions is to triplicate the training set by generating an artificially darkened and an artificially brightened version of each sample. However, keep in mind the disadvantages of artificial data, as discussed in the beginning of this chapter.
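The darkening/brightening itself is a per-pixel shift with saturation. In OpenCV this is a one-liner (image.convertTo(dst, -1, 1, delta)); the sketch below spells out the pixel-level logic on a raw 8-bit buffer:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Shift every pixel of an 8-bit grayscale image by 'delta', saturating
// at [0, 255]. A positive delta brightens, a negative delta darkens.
std::vector<uint8_t> shiftBrightness(const std::vector<uint8_t>& pixels,
                                     int delta) {
    std::vector<uint8_t> out(pixels.size());
    std::transform(pixels.begin(), pixels.end(), out.begin(),
                   [delta](uint8_t p) {
                       return (uint8_t)std::min(255, std::max(0, int(p) + delta));
                   });
    return out;
}
```

Applying this once with a positive delta and once with a negative delta to every positive sample yields the triplicated training set described above.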

In order to reduce the number of false positive detections, we generally need to add more negative samples. However, it is important not to add randomly generated negative windows, since the extra knowledge they would bring to the model would, in most cases, be minimal. It is better to add meaningful negative windows that can increase the quality of the detector. This is known as hard negative mining, using a bootstrapping process. The principle is rather simple:

  1. Start by training a first object model based on your initial training set of positive and negative window samples.
  2. Now, collect a set of negative images, which are either specific to your application (if you want to train an object detector specific to your setup) or which are more general (if you want your object detector to work in versatile conditions).
  3. Run your detector on that set of negative images with a low certainty threshold and save all found detections. Cut them out of the supplied negative images and rescale them to the object model dimensions.
  4. Now, retrain your object model, adding all the found windows to your negative training set, so that the model is trained with this extra knowledge.

This will increase the accuracy of your model by an amount that depends on the quality of your negative images.

Tip

When adding the extra, useful negative samples you found, add them to the top of your background.txt file! This forces the OpenCV training interface to grab these more important negative samples first, before sampling the standard negative training images. Make sure they have exactly the required model size, so that each one can be used only once as a negative training sample.
