Feature matching

Once we have extracted features and their descriptors from two (or more) images, we can start asking whether some of these features show up in both (or all) images. For example, if we have descriptors for both our object of interest (self.desc_train) and the current video frame (desc_query), we can try to find regions of the current frame that look like our object of interest. This is done by the following method, which makes use of the Fast Library for Approximate Nearest Neighbors (FLANN):

good_matches = self._match_features(desc_query)

The process of finding frame-to-frame correspondences can be formulated as follows: for every element of one set of descriptors, search for the nearest neighbor in the other set of descriptors.

The first set of descriptors is usually called the train set, because in machine learning, these descriptors are used to train some model, such as the model of the object that we want to detect. In our case, the train set corresponds to the descriptor of the template image (our object of interest). Hence, we call our template image the train image (self.img_train).

The second set is usually called the query set, because we continually ask whether it contains our train image. In our case, the query set corresponds to the descriptor of each incoming frame. Hence, we call a frame the query image (img_query).

Features can be matched in a number of ways, for example, with the help of a brute-force matcher (cv2.BFMatcher), which, for each descriptor in the first set, finds the closest descriptor in the second set by trying every candidate (an exhaustive search).
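
For illustration, a brute-force match between the two descriptor sets might look roughly like this (a minimal sketch, assuming desc_train and desc_query hold the float descriptors extracted earlier, for example from SURF or SIFT; binary descriptors such as ORB would use cv2.NORM_HAMMING instead):

import cv2

# compare every descriptor in the first set against every descriptor in the
# second set (exhaustive search) and keep the single closest one per descriptor
bf = cv2.BFMatcher(cv2.NORM_L2)
matches = bf.match(desc_train, desc_query)

# sort the matches by descriptor distance, best matches first
matches = sorted(matches, key=lambda m: m.distance)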

Matching features across images with FLANN

The alternative is to use an approximate k-nearest neighbor (kNN) algorithm to find correspondences, which is based on the fast third-party library FLANN. A FLANN match is performed with the following code snippet, where we use kNN with k=2:

def _match_features(self, desc_frame):
    matches = self.flann.knnMatch(self.desc_train, desc_frame, k=2)

The result of flann.knnMatch, stored in the matches variable, is a list of correspondences between the two sets of descriptors: the train set, which corresponds to the pattern image of our object of interest, and the query set, which corresponds to the image in which we are searching for our object of interest.
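
The self.flann matcher itself is set up elsewhere in the class. For floating-point descriptors such as SURF or SIFT, it could be constructed roughly as follows (a minimal sketch; the kd-tree and search parameter values are illustrative assumptions, not necessarily the ones used here):

import cv2

# build a set of randomized kd-trees for the descriptors; each query then
# visits only a limited number of leaves ('checks'), which trades a little
# accuracy for a large speedup over exhaustive search
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params)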

The ratio test for outlier removal

The more correct matches we find (meaning that more pattern-to-image correspondences exist), the higher the chance that the pattern is present in the image. However, some matches might be false positives.

A well-known technique for removing outliers is called the ratio test. Since we performed kNN matching with k=2, the two nearest descriptors are returned for each match: the first match is the closest neighbor and the second match is the second-closest neighbor. Intuitively, a correct match will have its first neighbor much closer than its second neighbor, whereas for an incorrect match the two closest neighbors will lie at a similar distance. Therefore, we can find out how good a match is by comparing these two distances. The ratio test says that the match is good only if the distance ratio between the first match and the second match is smaller than a given number (usually around 0.5); in our case, this number is chosen to be 0.7. To remove all matches that do not satisfy this requirement, we filter the list of matches and store the good matches in the good_matches variable:

# discard bad matches, ratio test as per Lowe's paper
good_matches = [x[0] for x in matches
                if x[0].distance < 0.7 * x[1].distance]

Then we return the good matches to FeatureMatching.match so that they can be processed further:

return good_matches

Visualizing feature matches

In newer versions of OpenCV, we can easily draw matches using cv2.drawMatches or cv2.drawMatchesKnn.
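
For example, in a recent OpenCV version, the good matches found above could be drawn roughly like this (a minimal sketch; cv2.drawMatchesKnn expects a list of lists of DMatch objects, so each good match is wrapped in its own list, and passing None for the output image lets OpenCV allocate the side-by-side montage itself):

import cv2

img_matches = cv2.drawMatchesKnn(self.img_train, self.key_train,
                                 img_query, key_query,
                                 [[m] for m in good_matches], None)
cv2.imshow('imgFlann', img_matches)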

In older versions of OpenCV, we may need to write our own function. The goal is to draw both the object of interest and the current video frame (in which we expect the object to be embedded) next to each other:

def draw_good_matches(img1, kp1, img2, kp2, matches):
    # Create a new output image that concatenates the
    # two images together (a.k.a. a montage)
    rows1, cols1 = img1.shape[:2]
    rows2, cols2 = img2.shape[:2]
    out = np.zeros((max([rows1, rows2]), cols1+cols2, 3), dtype='uint8')

In order to draw colored lines on the image, we create a three-channel color image by copying each grayscale image into all three channels:

    # Place the first image to the left, copy 3x for RGB
    out[:rows1, :cols1, :] = np.dstack([img1, img1, img1])

    # Place the next image to the right of it, copy 3x for RGB
    out[:rows2, cols1:cols1 + cols2, :] = np.dstack([img2, img2,img2])

Then, for each pair of matching points between the two images, we draw small blue circles and connect the two circles with a line. For this, we have to iterate over the list of matches. Each keypoint stores its pixel coordinates in its pt attribute as an (x, y) tuple. Each match, m, stores indices into the two keypoint lists, where m.trainIdx points to the index in the first keypoint list (kp1) and m.queryIdx points to the index in the second keypoint list (kp2):

for m in matches:
    # Get the matching keypoints for each of the images
    c1, r1 = kp1[m.trainIdx].pt
    c2, r2 = kp2[m.queryIdx].pt

With the correct indices, we can now draw a circle at the correct location (with a radius of 4, the color blue, and a thickness of 1) and connect the circles with a line:

    radius = 4
    BLUE = (255, 0, 0)
    thickness = 1
    # Draw a small circle at both coordinates
    cv2.circle(out, (int(c1), int(r1)), radius, BLUE, thickness)
    cv2.circle(out, (int(c2) + cols1, int(r2)), radius, BLUE, thickness)

    # Draw a line in between the two points
    cv2.line(out, (int(c1), int(r1)), (int(c2) + cols1, int(r2)), BLUE, thickness)

Finally, once all the matches have been drawn (that is, outside the for loop), we return the montage:

return out

Then, the returned image can be drawn with this code:

cv2.imshow('imgFlann', draw_good_matches(self.img_train, self.key_train, img_query, key_query, good_matches))

The blue lines connect the features in the object (left) to the features in the scenery (right), as shown here:

Visualizing feature matches

This works fine in a simple example such as this, but what happens when there are other objects in the scene? Since our object contains some lettering that seems highly salient, what happens when there are other words present?

As it turns out, the algorithm works even under such conditions, as you can see in this screenshot:

Visualizing feature matches

Interestingly, the algorithm did not confuse the name of the author as seen on the left with the black-on-white lettering next to the book in the scene, even though they spell out the same name. This is because the algorithm found a description of the object that does not rely purely on the grayscale representation. On the other hand, an algorithm doing a pixel-wise comparison could have easily gotten confused.

Homography estimation

Since we are assuming that the object of our interest is planar (an image) and rigid, we can find the homography transformation between the feature points of the two images. The homography describes the perspective transformation required to bring all feature points of the object image (self.key_train) into the plane of the corresponding feature points in the current frame (key_frame). But first, we need to find the image coordinates of all keypoints that are good matches:

def _detect_corner_points(self, key_frame, good_matches):
    src_points = [self.key_train[good_matches[i].trainIdx].pt
        for i in xrange(len(good_matches))]
    dst_points = [key_frame[good_matches[i].queryIdx].pt
        for i in xrange(len(good_matches))]

To find the correct perspective transformation (a homography matrix H), the cv2.findHomography function will use the random sample consensus (RANSAC) method to probe different subsets of input points:

H, _ = cv2.findHomography(np.array(src_points), np.array(dst_points), cv2.RANSAC)

The homography matrix H can then help us transform any point in the pattern into the scenery, such as transforming a corner point in the training image to a corner point in the query image. In other words, this means that we can draw the outline of the book cover in the query image by transforming the corner points from the training image! For this, we take the list of corner points of the training image (src_corners) and see where they are projected in the query image by performing a perspective transform:

self.sh_train = self.img_train.shape[:2]  # rows, cols
src_corners = np.array([(0,0), (self.sh_train[1],0), (self.sh_train[1],self.sh_train[0]), (0,self.sh_train[0])], dtype=np.float32)
dst_corners = cv2.perspectiveTransform(src_corners[None, :, :], H)

The dst_corners return value is a list of image points. All we need to do is draw a line between each point in dst_corners and the next one, and we will have an outline in the scenery. But first, in order to draw the lines at the right image coordinates, we need to offset the x coordinate by the width of the pattern image (because we are showing the two images next to each other):

dst_corners = map(tuple, dst_corners[0])
dst_corners = [(np.int(dst_corners[i][0] + self.sh_train[1]),
                np.int(dst_corners[i][1]))
               for i in xrange(len(dst_corners))]

Then we can draw the lines from the ith point to the (i+1)-th point in the list (wrapping around to 0):

for i in xrange(0,len(dst_corners)):
    cv2.line(img_flann, dst_corners[i], dst_corners[(i+1) % 4],(0, 255, 0), 3)

Finally, we draw the outline of the book cover, like this:

Homography estimation

This works even when the object is only partially visible, as follows:

Homography estimation

Warping the image

We can also do the opposite—going from the probed scenery to the training pattern coordinates. This makes it possible for the book cover to be brought onto the frontal plane, as if we were looking at it directly from above. To achieve this, we can simply take the inverse of the homography matrix to get the inverse transformation:

Hinv = np.linalg.inv(H)

However, this would map the top-left corner of the book cover to the origin of our new image, which would cut off everything to the left of and above the book cover. Instead, we want to roughly center the book cover in the image. Thus, we need to calculate a new homography matrix. As input, we will have our pts_scene scenery points. As output, we want an image that has the same shape as the pattern image:

dst_size = img_in.shape[:2]  # rows, cols

The book cover should be roughly half of that size. We can come up with a scaling factor and a bias term so that every keypoint in the scenery image is mapped to the correct coordinate in the new image:

# map the pattern coordinates to half the output size and center them;
# src_size is the shape of the pattern image (rows, cols)
scale_row = 1./src_size[0]*dst_size[0]/2.
bias_row = dst_size[0]/4.
scale_col = 1./src_size[1]*dst_size[1]/2.
bias_col = dst_size[1]/4.

Next, we just need to apply this linear scaling to every keypoint in the list. The easiest way to do this is with list comprehensions:

src_points = [key_frame[good_matches[i].trainIdx].pt
    for i in xrange(len(good_matches))]
dst_points = [self.key_train[good_matches[i].queryIdx].pt
    for i in xrange(len(good_matches))]
dst_points = [[x*scale_row+bias_row, y*scale_col+bias_col]
    for x, y in dst_points]

Then we can find the homography matrix between these points (make sure that the list is converted to a NumPy array):

Hinv, _ = cv2.findHomography(np.array(src_points), np.array(dst_points), cv2.RANSAC)

After that, we can use the homography matrix to transform every pixel in the image (this is also called warping the image):

img_warp = cv2.warpPerspective(img_query, Hinv, (dst_size[1], dst_size[0]))  # dsize is (width, height)

The result looks like this (matching on the left and warped image on the right):

Warping the image

The image resulting from the perspective transformation might not be perfectly aligned with the frontoparallel plane, because after all, the homography matrix is only approximate. In most cases, however, our approach works just fine, such as in the example shown in the following figure:

Warping the image