Chapter 3. Finding Objects via Feature Matching and Perspective Transforms

The goal of this chapter is to develop an app that is able to detect and track an object of interest in the video stream of a webcam, even if the object is viewed from different angles or distances or under partial occlusion.

In this chapter, we will cover the following topics:

  • Feature extraction
  • Feature matching
  • Feature tracking

In the previous chapter, you learned how to detect and track a simple object (the silhouette of a hand) in a very controlled environment. To be more specific, we instructed the user of our app to place the hand in the central region of the screen and made assumptions about the size and shape of the object (the hand). But what if we wanted to detect and track objects of arbitrary sizes, possibly viewed from a number of different angles or under partial occlusion?

For this, we will make use of feature descriptors, which are a way of capturing the important properties of our object of interest. We do this so that the object can be located even when it is embedded in a busy visual scene. We will again apply our algorithm to the live stream of a webcam, and do our best to keep the algorithm robust yet simple enough to run in real time.

Tasks performed by the app

The app will analyze each captured frame to perform the following tasks:

  • Feature extraction: We will describe an object of interest with Speeded-Up Robust Features (SURF), which is an algorithm used to find distinctive keypoints in an image that are both scale invariant and rotation invariant. These keypoints will help us make sure that we are tracking the right object over multiple frames. Because the appearance of the object might change from time to time, it is important to find keypoints that do not depend on the viewing distance or viewing angle of the object (hence the scale and rotation invariance).
  • Feature matching: We will try to establish correspondences between keypoints using the Fast Library for Approximate Nearest Neighbors (FLANN) to see whether a frame contains keypoints similar to those of our object of interest. If we find a good match, we will mark the object in that frame. A minimal code sketch of these first two steps follows this list.
  • Feature tracking: We will keep track of the located object of interest from frame to frame using various forms of early outlier detection and outlier rejection to speed up the algorithm.
  • Perspective transform: We will then reverse any translations and rotations that the object has undergone by warping the perspective so that the object appears upright in the center of the screen. This creates a cool effect in which the object seems frozen in a position while the entire surrounding scene rotates around it.
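
The first two steps might look roughly like the following sketch, assuming OpenCV's Python bindings (cv2). The file names, the hessian threshold of 400, and the ratio-test threshold of 0.7 are illustrative choices, not values prescribed by this chapter; in the actual app, the second image would come from the webcam stream rather than from disk:

    import cv2

    # Load the template of the object of interest and one camera frame as
    # grayscale images (placeholder file names).
    img_query = cv2.imread('template.png', cv2.IMREAD_GRAYSCALE)
    img_train = cv2.imread('frame.png', cv2.IMREAD_GRAYSCALE)

    # Feature extraction: detect SURF keypoints and compute their descriptors
    # (hessianThreshold=400 is an illustrative value).
    surf = cv2.SURF(400)  # on OpenCV 3: cv2.xfeatures2d.SURF_create(400)
    kp_query, desc_query = surf.detectAndCompute(img_query, None)
    kp_train, desc_train = surf.detectAndCompute(img_train, None)

    # Feature matching: FLANN with a kd-tree index, which suits the
    # floating-point SURF descriptors; for each query descriptor we ask
    # for its two nearest neighbors in the frame.
    FLANN_INDEX_KDTREE = 0
    index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
    search_params = dict(checks=50)
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(desc_query, desc_train, k=2)

    # Early outlier rejection: keep a match only if its best neighbor is
    # clearly better than the second best (Lowe's ratio test).
    good_matches = [m for m, n in matches if m.distance < 0.7 * n.distance]

The surviving matches are what get drawn as connecting lines between the two images and what the perspective transform in the last step is estimated from.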

An example of the first three steps is shown in the following image, which contains a template image of our object of interest on the left, and me holding a printout of the template image on the right. Matching features in the two frames are connected with blue lines, and the located object is outlined in green on the right:

[Figure: The template image of the object of interest (left) and a camera frame with matching features connected by lines and the located object outlined (right)]

The last step is transforming the located object so that it is projected onto the frontal plane (which should look roughly like the original template image, appearing close-up and roughly upright), while the entire scene seems to warp around it, as shown in the following figure:

[Figure: The scene warped so that the located object appears frontal and upright while the surrounding scene warps around it]
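
Continuing the earlier sketch (reusing kp_query, kp_train, good_matches, and the two images), the warp described above might be computed roughly as follows; the RANSAC reprojection threshold of 5.0 and the choice of the template size as the output size are illustrative, not requirements of the chapter:

    import numpy as np

    # Coordinates of the matched keypoints in the template (query) image and
    # in the camera frame (train) image, as (N, 1, 2) float32 arrays.
    src_pts = np.float32([kp_query[m.queryIdx].pt
                          for m in good_matches]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp_train[m.trainIdx].pt
                          for m in good_matches]).reshape(-1, 1, 2)

    # Estimate the homography H that maps template points onto the frame,
    # letting RANSAC reject the remaining outlier matches.
    H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)

    # H maps template coordinates to frame coordinates; its inverse maps the
    # frame back onto the template's (frontal) plane, so the located object
    # appears roughly upright and close-up while the rest of the scene warps
    # around it.
    h, w = img_query.shape[:2]
    img_frontal = cv2.warpPerspective(img_train, np.linalg.inv(H), (w, h))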

Note

Again, the GUI will be designed with wxPython 2.8, which can be obtained from http://www.wxpython.org/download.php. This chapter has been tested with OpenCV 2.4.9. Note that if you are using OpenCV 3, you may have to obtain the so-called extra modules from https://github.com/Itseez/opencv_contrib and install OpenCV 3 with the OPENCV_EXTRA_MODULES_PATH variable set in order to get SURF and FLANN installed. Also, note that you may have to obtain a license to use SURF in commercial applications.
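
To cope with both OpenCV versions mentioned above, the SURF detector might be instantiated along these lines (the hessian threshold of 400 is again only an illustrative value):

    import cv2

    try:
        # OpenCV 2.4.x: SURF is part of the main cv2 module
        surf = cv2.SURF(400)
    except AttributeError:
        # OpenCV 3.x: SURF lives in the opencv_contrib xfeatures2d module
        surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)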
