Mean-shift tracking

It turns out that the salience detector discussed previously is already a great tracker of proto-objects by itself. One could simply apply the algorithm to every frame of a video sequence and get a good idea of the location of the objects. However, what gets lost is correspondence information. Imagine a video sequence of a busy scene, such as from a city center or a sports stadium. Although a saliency map could highlight all the proto-objects in every frame of a recorded video, the algorithm would have no way of knowing which proto-objects from the previous frame are still visible in the current frame. Also, the proto-objects map might contain some false positives, such as in the following example:


Note that the bounding boxes extracted from the proto-objects map made (at least) three mistakes in the preceding example: they missed highlighting a player (upper left), merged two players into the same bounding box, and highlighted some additional, arguably uninteresting (although visually salient) objects. In order to improve these results, we want to make use of a tracking algorithm.

To solve the correspondence problem, we could use the methods we have learned about previously, such as feature matching and optical flow. Or, we could use a different technique called mean-shift tracking.

Mean-shift is a simple yet very effective technique for tracking arbitrary objects. The intuition behind mean-shift is to consider the pixels in a small region of interest (say, a bounding box of an object we want to track) as sampled from an underlying probability density function that best describes a target.

Consider, for example, the following image:


Here, the small gray dots represent samples from a probability distribution. Assume that the closer the dots, the more similar they are to each other. Intuitively speaking, what mean-shift is trying to do is to find the densest region in this landscape and draw a circle around it. The algorithm might start out centering a circle over a region of the landscape that is not dense at all (dashed circle). Over time, it will slowly move towards the densest region (solid circle) and anchor on it. If we design the landscape to be more meaningful than dots (for example, by making the dots correspond to color histograms in the small neighborhoods of an image), we can use mean-shift tracking to find the objects of interest in the scene by finding the histogram that most closely matches the histogram of a target object.

Mean-shift has many applications (such as clustering, or finding the mode of probability density functions), but it is also particularly well-suited to target tracking. In OpenCV, the algorithm is implemented in cv2.meanShift, but it requires some pre-processing to function correctly. We can outline the procedure as follows:

  1. Fix a window around each data point: For example, a bounding box around an object or region of interest.
  2. Compute the mean of data within the window: In the context of tracking, this is usually implemented as a histogram of the pixel values in the region of interest. For best performance on color images, we will convert to HSV color space.
  3. Shift the window to the mean and repeat until convergence: This is handled transparently by cv2.meanShift. We can control the length and accuracy of the iterative method by specifying termination criteria.
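The three steps above can be sketched in plain Python on a toy 2D point set instead of image pixels. This is only an illustration of the iteration, not OpenCV's implementation; the names (`mean_shift`) and numbers (window radius, maximum of 100 iterations, 1-unit epsilon, mirroring the termination criteria mentioned above) are illustrative choices:

```python
import math

# samples: a sparse region around (0, 0) and a dense cluster near (5, 5)
points = [(0.0, 0.0), (1.0, 0.5),
          (4.8, 5.1), (5.0, 5.0), (5.2, 4.9), (5.1, 5.2), (4.9, 4.8)]

def mean_shift(center, points, radius=3.0, max_iter=100, eps=0.01):
    """Shift a circular window to the mean of the points inside it."""
    for _ in range(max_iter):
        inside = [p for p in points if math.dist(p, center) <= radius]
        if not inside:
            break
        mean = (sum(p[0] for p in inside) / len(inside),
                sum(p[1] for p in inside) / len(inside))
        shift = math.dist(mean, center)
        center = mean
        if shift < eps:     # converged: the window stopped moving
            break
    return center

# start between the two groups; the window drifts to the dense cluster
cx, cy = mean_shift((3.0, 3.0), points)
```

Starting midway between the sparse and the dense region, the window is pulled toward, and anchors on, the dense cluster near (5, 5).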

Automatically tracking all players on a soccer field

For the remainder of this chapter, our goal is to combine the saliency detector with mean-shift tracking to automatically track all the players on a soccer field. The proto-objects identified by the salience detector will serve as input to the mean-shift tracker. Specifically, we will focus on a video sequence from the Alfheim dataset, which can be freely obtained from http://home.ifi.uio.no/paalh/dataset/alfheim/.

The reason for combining the two algorithms (saliency map and mean-shift tracking), is to remove false positives and improve the accuracy of the overall tracking. This will be achieved in a two-step procedure:

  1. Have both the saliency detector and mean-shift tracking assemble a list of bounding boxes for all the proto-objects in a frame. The saliency detector will operate on the current frame, whereas the mean-shift tracker will try to find the proto-objects from the previous frame in the current frame.
  2. Keep only those bounding boxes for which both algorithms agree on the location and size. This will get rid of outliers that have been mislabeled as proto-objects by one of the two algorithms.

The hard work is done by the previously introduced MultiObjectTracker class and its advance_frame method. This method relies on a few private worker methods, which will be explained in detail next. The advance_frame method is called whenever a new frame arrives, and accepts a proto-objects map as input:

def advance_frame(self, frame, proto_objects_map):
    # work on a copy of the input frame, so that we can draw on it
    self.tracker = copy.deepcopy(frame)

The method then builds a list of all the candidate bounding boxes, combining the bounding boxes both from the saliency map of the current frame as well as the mean-shift tracking results from the previous to the current frame:

    # build a list of all bounding boxes
    box_all = []

    # append to the list all bounding boxes found from the
    # current proto-objects map
    box_all = self._append_boxes_from_saliency(proto_objects_map,
                                               box_all)

    # find all bounding boxes extrapolated from last frame
    # via mean-shift tracking
    box_all = self._append_boxes_from_meanshift(frame, box_all)

The method then attempts to merge the candidate bounding boxes in order to remove the duplicates. This can be achieved with cv2.groupRectangles, which will return a single bounding box if group_thresh+1 or more bounding boxes overlap in an image:

    # only keep those boxes that are both salient and in mean-shift
    if len(self.object_roi) == 0:
        # no previous frame: keep all boxes from saliency
        group_thresh = 0
    else:
        # previous frame + saliency
        group_thresh = 1
    box_grouped, _ = cv2.groupRectangles(box_all, group_thresh, 0.1)
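To make the `group_thresh` semantics concrete, here is a simplified pure-Python stand-in for rectangle grouping (hypothetical helpers `similar` and `group_rectangles`; this is not OpenCV's exact clustering algorithm): rectangles whose corners differ by at most `eps` times their size end up in one group, groups with more than `group_thresh` members are averaged into a single box, and smaller groups are dropped:

```python
def similar(a, b, eps=0.1):
    """True if the corners of two (x, y, w, h) boxes nearly coincide."""
    (ax, ay, aw, ah), (bx, by, bw, bh) = a, b
    tol = eps * (min(aw, bw) + min(ah, bh)) * 0.5
    return all(abs(p - q) <= tol for p, q in
               zip((ax, ay, ax + aw, ay + ah),
                   (bx, by, bx + bw, by + bh)))

def group_rectangles(boxes, group_thresh, eps=0.1):
    groups = []                       # each group: a list of similar boxes
    for box in boxes:
        for g in groups:
            if similar(g[0], box, eps):
                g.append(box)
                break
        else:
            groups.append([box])
    # average each surviving group into one box
    return [tuple(sum(c) // len(g) for c in zip(*g))
            for g in groups if len(g) > group_thresh]

# two near-identical detections are merged, a lone detection is dropped
boxes = [(10, 10, 40, 80), (12, 11, 40, 78), (200, 50, 30, 60)]
kept = group_rectangles(boxes, group_thresh=1)
```

With `group_thresh=1`, only the pair of overlapping boxes survives (merged into one); with `group_thresh=0`, singleton detections are kept as well, which is exactly why the code above uses 0 on the very first frame.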

In order to make mean-shift work, we will have to do some bookkeeping, which will be explained in detail in the following subsections:

    # update mean-shift bookkeeping for remaining boxes
    self._update_mean_shift_bookkeeping(frame, box_grouped)

Then we can draw the list of unique bounding boxes on the input image and return the image for plotting:

    for (x, y, w, h) in box_grouped:
        cv2.rectangle(self.tracker, (x, y), (x + w, y + h),
                      (0, 255, 0), 2)

    return self.tracker

Extracting bounding boxes for proto-objects

The first private worker method is relatively straightforward. It takes a proto-objects map as input as well as a (previously aggregated) list of bounding boxes. To this list, it adds all the bounding boxes found from the contours of the proto-objects:

def _append_boxes_from_saliency(self, proto_objects_map, box_all):
    # find the contours of all proto-objects in the map
    cnt_sal, _ = cv2.findContours(proto_objects_map, cv2.RETR_LIST,
                                  cv2.CHAIN_APPROX_SIMPLE)

However, it discards the bounding boxes that are smaller than some threshold, self.min_cnt_area, which is set in the constructor:

    for cnt in cnt_sal:
        # discard small contours
        if cv2.contourArea(cnt) < self.min_cnt_area:
            continue

The result is appended to the box_all list and passed up for further processing:

        # otherwise, add to the list of boxes found from the
        # saliency map
        box = cv2.boundingRect(cnt)
        box_all.append(box)

    return box_all

Setting up the necessary bookkeeping for mean-shift tracking

The second private worker method is concerned with setting up all the bookkeeping that is necessary to perform mean-shift tracking. The method accepts an input image and a list of bounding boxes for which to generate the bookkeeping information:

def _update_mean_shift_bookkeeping(self, frame, box_grouped):

Bookkeeping mainly consists of calculating a histogram of the HSV color values of each proto-object's bounding box. Thus the input BGR image (OpenCV's default channel order) is converted to HSV right away:

hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
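To see what the hue channel actually holds, here is a small stand-in using the standard library's `colorsys` instead of cv2 (the helper `bgr_to_opencv_hue` is illustrative, not an OpenCV function). Note that OpenCV stores hue in the range [0, 180) for 8-bit images (degrees divided by two), which is why the histograms below use 180 bins:

```python
import colorsys

def bgr_to_opencv_hue(b, g, r):
    """Hue of a single BGR pixel, on OpenCV's 0..179 scale."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return round(h * 180) % 180      # colorsys hue is 0..1 -> 0..179

hue_red = bgr_to_opencv_hue(0, 0, 255)     # pure red
hue_green = bgr_to_opencv_hue(0, 255, 0)   # pure green
```

Pure red maps to hue 0 and pure green to hue 60 on this scale, matching the ranges used with `cv2.calcHist` and `cv2.inRange` below.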

Then, every bounding box in box_grouped is parsed. We need to store both the location and size of the bounding box (self.object_box), as well as a histogram of the HSV color values (self.object_roi):

self.object_roi = []
self.object_box = []

The location and size of the bounding box is extracted from the list, and the region of interest is cut out of the HSV image:

for box in box_grouped:
    (x, y, w, h) = box
    hsv_roi = hsv[y:y + h, x:x + w]

We then calculate a histogram of all the hue (H) values in the region of interest. We also ignore the dim or weakly saturated areas of the bounding box by using a mask, and normalize the histogram at the end:

mask = cv2.inRange(hsv_roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
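The same histogram step can be sketched without cv2 (the helper `hue_histogram` is illustrative): bin the masked hue values into 180 bins, then min-max scale the counts to [0, 255], as `cv2.normalize(..., 0, 255, cv2.NORM_MINMAX)` does:

```python
def hue_histogram(hues, mask=None, bins=180):
    """180-bin hue histogram of masked pixels, scaled to 0..255."""
    hist = [0] * bins
    for i, h in enumerate(hues):
        if mask is None or mask[i]:
            hist[h] += 1
    lo, hi = min(hist), max(hist)
    span = (hi - lo) or 1            # avoid division by zero
    return [(v - lo) * 255 // span for v in hist]

# a mostly-green patch (hue ~60), with one masked-out dark pixel
hues = [60, 60, 61, 60, 0]
mask = [1, 1, 1, 1, 0]
hist = hue_histogram(hues, mask)
```

The dominant hue ends up at the top of the scale (255), which is what makes the back projection in the next subsection light up pixels of that color.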

We then store this information in the corresponding private member variables, so that it will be available in the very next frame of the process loop, where we will aim to locate the region of interest using the mean-shift algorithm:

self.object_roi.append(roi_hist)
self.object_box.append(box)

Tracking objects with the mean-shift algorithm

Finally, the third private worker method tracks the proto-objects by using the bookkeeping information stored earlier from the previous frame. Similar to _append_boxes_from_saliency, we build a list of all the bounding boxes aggregated from mean-shift and pass it up for further processing. The method accepts an input image and a previously aggregated list of bounding boxes:

def _append_boxes_from_meanshift(self, frame, box_all):
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

The method then parses all the previously stored proto-objects (from self.object_roi and self.object_box):

for i in range(len(self.object_roi)):
    roi_hist = copy.deepcopy(self.object_roi[i])
    box_old = copy.deepcopy(self.object_box[i])

In order to find the new, shifted location of a region of interest recorded in the previous image frame, we feed the back-projected region of interest to the mean-shift algorithm. The termination criteria (self.term_crit) stop the iteration either after a maximum number of iterations (100) or once the window shifts by less than a minimum number of pixels (1):

dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
ret, box_new = cv2.meanShift(dst, tuple(box_old), self.term_crit)
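Back projection itself is simple enough to sketch without cv2 (the helper `back_project` is illustrative): every pixel of the hue image is replaced by the histogram value of its hue, so pixels whose color is common in the tracked region light up, producing the probability map that mean-shift then climbs:

```python
def back_project(hue_image, roi_hist):
    """Replace each hue value with its (normalized) histogram count."""
    return [[roi_hist[h] for h in row] for row in hue_image]

# ROI histogram: hue 60 is dominant (255), hue 120 is rare (40)
roi_hist = [0] * 180
roi_hist[60], roi_hist[120] = 255, 40

hue_image = [[60, 120],
             [60,  60]]
dst = back_project(hue_image, roi_hist)
```

In the resulting map, the three hue-60 pixels score 255 and the lone hue-120 pixel only 40, so a mean-shift window placed nearby would be pulled toward the hue-60 pixels.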

But, before we append the newly detected, shifted bounding box to the list, we want to make sure that we are actually tracking objects that move. Objects that do not move are most likely false positives, such as line markings or other visually salient patches that are irrelevant to the task at hand.

In order to discard the irrelevant tracking results, we compare the location of the bounding box from the previous frame (box_old) and the corresponding bounding box from the current frame (box_new):

(xo, yo, wo, ho) = box_old
(xn, yn, wn, hn) = box_new

If their centers of mass did not shift at least sqrt(self.min_shift2) pixels, we do not include the bounding box in the list:

co = [xo + wo/2, yo + ho/2]
cn = [xn + wn/2, yn + hn/2]
if (co[0] - cn[0])**2 + (co[1] - cn[1])**2 >= self.min_shift2:
    box_all.append(box_new)
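The stationarity test above can be captured in a small helper (`has_moved` is an illustrative name, not part of the class): keep a track only if its center moved at least sqrt(min_shift2) pixels between frames:

```python
def has_moved(box_old, box_new, min_shift2):
    """True if the box center shifted at least sqrt(min_shift2) px."""
    (xo, yo, wo, ho) = box_old
    (xn, yn, wn, hn) = box_new
    dx = (xo + wo / 2) - (xn + wn / 2)
    dy = (yo + ho / 2) - (yn + hn / 2)
    return dx * dx + dy * dy >= min_shift2

# a box that shifted 5 px passes a 3-px threshold; a static one fails
moved = has_moved((10, 10, 20, 40), (15, 10, 20, 40), min_shift2=9)
static = has_moved((10, 10, 20, 40), (10, 10, 20, 40), min_shift2=9)
```

Comparing squared distances against min_shift2 avoids the square root, which is why the threshold is stored in squared form.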

The resulting list of bounding boxes is again passed up for further processing:

    return box_all