Background subtractors – KNN, MOG2, and GMG

OpenCV provides a class called BackgroundSubtractor, which is a handy way to perform foreground and background segmentation.

This works similarly to the GrabCut algorithm we analyzed in Chapter 3, Processing Images with OpenCV 3. However, BackgroundSubtractor is a fully fledged class with a plethora of methods that not only perform background subtraction, but also improve background detection over time through machine learning, and it lets you save the classifier to a file.

To familiarize ourselves with BackgroundSubtractor, let's look at a basic example:

import numpy as np
import cv2

cap = cv2.VideoCapture('/path/to/movie.flv')  # placeholder path; use 0 for a camera

mog = cv2.createBackgroundSubtractorMOG2()

while True:
    ret, frame = cap.read()
    # stop when the video ends or a frame cannot be read
    if not ret:
        break
    fgmask = mog.apply(frame)
    cv2.imshow('frame',fgmask)
    # exit when the Esc key is pressed
    if cv2.waitKey(30) & 0xff == 27:
        break

cap.release()
cv2.destroyAllWindows()

Let's go through this in order. First of all, let's talk about the background subtractor object. There are three background subtractors available in OpenCV 3: K-Nearest Neighbors (KNN), Mixture of Gaussians (MOG2), and GMG (named after its authors, Godbehere, Matsukawa, and Goldberg), each name corresponding to the algorithm used to compute the background subtraction.

You may remember that we already elaborated on the topic of foreground and background detection in Chapter 5, Depth Estimation and Segmentation, in particular when we talked about GrabCut and Watershed.

So why do we need the BackgroundSubtractor classes? The main reason is that the BackgroundSubtractor classes are specifically built with video analysis in mind, which means that they "learn" something about the environment with every frame. For example, with GMG, you can specify the number of frames used to initialize the video analysis, with the default being 120 (roughly 5 seconds at typical frame rates). What the BackgroundSubtractor classes have in common is that they compare each incoming frame against a stored history of frames, which allows them to improve motion analysis results as time passes.
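
For illustration, here's a minimal sketch of how each subtractor can be created with its history or initialization parameters; note that in OpenCV 3, GMG lives in the bgsegm module, which is only available if the opencv_contrib modules are installed, and the parameter values shown here are just examples:

import cv2

# MOG2 and KNN live in the main module; history controls how many
# past frames influence the background model
mog2 = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)
knn = cv2.createBackgroundSubtractorKNN(history=500, detectShadows=True)

# GMG requires the opencv_contrib bgsegm module; the default of 120
# initialization frames corresponds to roughly 5 seconds of video
gmg = cv2.bgsegm.createBackgroundSubtractorGMG(initializationFrames=120)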

Another fundamental (and frankly, quite amazing) feature of the BackgroundSubtractor classes is the ability to detect shadows. This is absolutely vital for an accurate reading of video frames; by detecting shadows, you can exclude shadow areas (by thresholding them) from the objects you detect, and concentrate on the real features. It also greatly reduces the unwanted "merging" of objects. An image comparison will give you a good idea of the concept I'm trying to illustrate. Here's a sample of background subtraction without shadow detection:

[Image: background subtraction without shadow detection]

Here's an example of shadow detection (with shadows thresholded):

[Image: background subtraction with shadow detection (shadows thresholded)]

Note that shadow detection isn't absolutely perfect, but it helps bring the object contours back to the object's original shape. Let's take a look at a reimplemented example of motion detection utilizing BackgroundSubtractorKNN:

import cv2
import numpy as np

bs = cv2.createBackgroundSubtractorKNN(detectShadows = True)
camera = cv2.VideoCapture("/path/to/movie.flv")

while True:
  ret, frame = camera.read()
  # stop when the video ends or a frame cannot be read
  if not ret:
    break
  fgmask = bs.apply(frame)
  th = cv2.threshold(fgmask.copy(), 244, 255, cv2.THRESH_BINARY)[1]
  dilated = cv2.dilate(th, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3,3)), iterations = 2)
  image, contours, hier = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
  for c in contours:
    if cv2.contourArea(c) > 1600:
      (x,y,w,h) = cv2.boundingRect(c)
      cv2.rectangle(frame, (x,y), (x+w, y+h), (255, 255, 0), 2)

  cv2.imshow("mog", fgmask)
  cv2.imshow("thresh", th)
  cv2.imshow("detection", frame)
  if cv2.waitKey(30) & 0xff == 27:
      break

camera.release()
cv2.destroyAllWindows()

As a result of the accuracy of the subtractor, and its ability to detect shadows, we obtain really precise motion detection, in which even objects that are next to each other don't get merged into one detection, as shown in the following screenshot:

[Image: motion detection with BackgroundSubtractorKNN; adjacent objects are detected separately]

That's a remarkable result for fewer than 30 lines of code!

The core of the entire program is the apply() method of the background subtractor; it computes a foreground mask, which can be used as a basis for the rest of the processing:

fgmask = bs.apply(frame)
th = cv2.threshold(fgmask.copy(), 244, 255, cv2.THRESH_BINARY)[1]
dilated = cv2.dilate(th, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3,3)), iterations = 2)
image, contours, hier = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    if cv2.contourArea(c) > 1600:
        (x,y,w,h) = cv2.boundingRect(c)
        cv2.rectangle(frame, (x,y), (x+w, y+h), (255, 255, 0), 2)

Once a foreground mask is obtained, we can apply a threshold: the foreground mask contains white (255) for foreground objects and gray (127 by default) for shadows; thus, in the thresholded image, all pixels that are not almost pure white (244-255) are set to 0 (black), while the rest are set to 255 (white), cutting the shadows out.
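
As a side note, the gray value used to mark shadows can be queried on the subtractor itself. Here's a minimal sketch, assuming OpenCV 3's Python bindings, which expose getShadowValue() on the KNN and MOG2 subtractors:

import cv2

bs = cv2.createBackgroundSubtractorKNN(detectShadows=True)
# shadow pixels are marked in the mask with this gray value
# (127 by default), which is why thresholding at 244 removes them
print(bs.getShadowValue())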

From there, we proceed with the same approach we adopted for the basic motion detection example: identifying objects, detecting contours, and drawing them on the original frame.

Meanshift and CAMShift

Background subtraction is a really effective technique, but not the only one available for tracking objects in a video. Meanshift is an algorithm that tracks objects by finding the point of maximum density of a discrete sample of a probability function (in our case, a region of interest in an image) and recalculating it at the next frame, which gives the algorithm an indication of the direction in which the object has moved.

This calculation is repeated until the centroid stops moving (or its movement falls below a threshold) across consecutive iterations of the calculation. This final state is called convergence. For reference, the algorithm was first described in the paper, The estimation of the gradient of a density function, with applications in pattern recognition, Fukunaga K. and Hostetler L., IEEE, 1975, which is available at http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1055330&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1055330 (note that this paper is not free for download).
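
In symbols (a standard formulation of the idea, not quoted from the paper), each iteration moves the window center $y_t$ to the weighted mean of the pixel coordinates $x_i$ inside the current window $W(y_t)$, where the weights $w(x_i)$ are the per-pixel probabilities:

$$y_{t+1} = \frac{\sum_{x_i \in W(y_t)} w(x_i)\, x_i}{\sum_{x_i \in W(y_t)} w(x_i)}$$

The iteration stops when $\lVert y_{t+1} - y_t \rVert$ falls below a threshold, which is exactly the convergence described previously.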

Here's a visual representation of this process:

[Image: a visual representation of the Meanshift process]

Aside from the theory, Meanshift is very useful when tracking a particular region of interest in a video, and this has a series of implications; for example, if you don't know a priori what the region you want to track is, you're going to have to manage this cleverly and develop programs that dynamically start tracking (and cease tracking) certain areas of the video, depending on arbitrary criteria. One example could be that you operate object detection with a trained SVM, and then start using Meanshift to track a detected object.

Let's not make our life complicated from the very beginning, though; let's first get familiar with Meanshift, and then use it in more complex scenarios.

We will start by simply marking a region of interest and keeping track of it, like so:

import numpy as np
import cv2

cap = cv2.VideoCapture(0)
ret,frame = cap.read()
r,h,c,w = 10, 200, 10, 200
track_window = (c,r,w,h)

roi = frame[r:r+h, c:c+w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv_roi, np.array((100., 30.,32.)), np.array((180.,120.,255.)))

roi_hist = cv2.calcHist([hsv_roi],[0],mask,[180],[0,180])
cv2.normalize(roi_hist,roi_hist,0,255,cv2.NORM_MINMAX)

term_crit = ( cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1 )

while True:
    ret, frame = cap.read()

    if ret == True:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        dst = cv2.calcBackProject([hsv],[0],roi_hist,[0,180],1)

        # apply meanshift to get the new location
        ret, track_window = cv2.meanShift(dst, track_window, term_crit)

        # Draw it on image
        x,y,w,h = track_window
        img2 = cv2.rectangle(frame, (x,y), (x+w,y+h), 255,2)
        cv2.imshow('img2',img2)

        k = cv2.waitKey(60) & 0xff
        if k == 27:
            break

    else:
        break

cv2.destroyAllWindows()
cap.release()

In the preceding code, I supplied the HSV values for tracking some shades of lilac, and here's the result:

[Image: Meanshift tracking a lilac-colored region]

If you ran the code on your machine, you'd notice how the Meanshift window actually looks for the specified color range; if it doesn't find it, you'll just see the window wobbling (it actually looks a bit impatient). If an object with the specified color range enters the window, the window will then start tracking it.

Let's examine the code so that we can fully understand how Meanshift performs this tracking operation.

Color histograms

Before we dissect the code of the preceding example, though, here is a not-so-brief digression on color histograms and two very important built-in OpenCV functions: calcHist and calcBackProject.

The calcHist function calculates the color histogram of an image, so the next logical step is to explain the concept of a color histogram. A color histogram is a representation of the color distribution of an image: on the x axis of the representation, we have the color values, and on the y axis, the number of pixels corresponding to each color value.

Let's look at a visual representation of this concept, hoping the adage, "a picture speaks a thousand words", will apply in this instance too:

[Image: a color histogram]

The picture shows a color histogram with one column per hue value from 0 to 180 (note that OpenCV uses hue values in the 0-180 range; other systems may use 0-360 or 0-255).
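
If you'd like to generate such a plot yourself, here's a minimal sketch that computes and plots the hue histogram of an image; it assumes matplotlib is installed, and 'image.jpg' is just a placeholder path:

import cv2
from matplotlib import pyplot as plt

img = cv2.imread('image.jpg')  # placeholder path
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# one bin per hue value, over OpenCV's 0-180 hue range
hist = cv2.calcHist([hsv], [0], None, [180], [0, 180])
plt.bar(range(180), hist.ravel())
plt.xlabel('Hue value')
plt.ylabel('Pixel count')
plt.show()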

Aside from Meanshift, color histograms are used for a number of different and useful image and video processing operations.

The calcHist function

The calcHist() function in OpenCV has the following Python signature:

calcHist(...)
    calcHist(images, channels, mask, histSize, ranges[, hist[, accumulate]]) -> hist

The descriptions of the parameters (adapted from the official OpenCV documentation) are as follows:

images: The source arrays. They should all have the same depth (CV_8U or CV_32F) and the same size. Each of them can have an arbitrary number of channels.

channels: The list of the dims channels used to compute the histogram.

mask: An optional mask. If the matrix is not empty, it must be an 8-bit array of the same size as images[i]. The nonzero mask elements mark the array elements that are counted in the histogram.

histSize: The array of histogram sizes in each dimension.

ranges: The array of the dims arrays of the histogram bin boundaries in each dimension.

hist: The output histogram, which is a dense or sparse dims-dimensional array.

accumulate: The accumulation flag. If it is set, the histogram is not cleared at the beginning when it is allocated. This feature enables you to compute a single histogram from several sets of arrays, or to update the histogram over time.

In our example, we calculate the histogram of the region of interest like so:

roi_hist = cv2.calcHist([hsv_roi],[0],mask,[180],[0,180])

This can be interpreted as follows: calculate the color histogram of the first channel (hue) of an array of images containing only the region of interest, in HSV space; count only the pixels where the mask values are not 0; use 180 histogram bins, with 0 as the lower boundary and 180 as the upper boundary.

This is rather convoluted to describe but, once you have familiarized yourself with the concept of a histogram, the pieces of the puzzle should click into place.
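
To see how the channels, histSize, and ranges parameters generalize to more dimensions, here's a sketch of a two-dimensional hue-saturation histogram, reusing the hsv_roi and mask variables from the earlier code:

# channel 0 (hue): 180 bins over the 0-180 range;
# channel 1 (saturation): 256 bins over the 0-256 range
hs_hist = cv2.calcHist([hsv_roi], [0, 1], mask, [180, 256], [0, 180, 0, 256])
print(hs_hist.shape)  # (180, 256)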

The calcBackProject function

The other function that plays a vital role in the Meanshift algorithm (but not only there) is calcBackProject, which computes the histogram back projection. A histogram back projection is so called because it takes a histogram and projects it back onto an image, with the result being, for each pixel, the probability that the pixel belongs to the distribution that generated the histogram in the first place. Therefore, calcBackProject gives a probability estimation of how similar each region of an image is to a model image (from which the original histogram was generated).

Again, if you thought calcHist was a bit convoluted, calcBackProject is probably even more complex!

In summary

The calcHist function extracts a color histogram from an image, giving a statistical representation of the colors in an image, and calcBackProject helps in calculating the probability of each pixel of an image belonging to the original image.
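
To make the relationship concrete, here's a minimal standalone sketch of the two functions working together; model.jpg and target.jpg are placeholder paths for a model image and the image we want to search:

import cv2

model = cv2.imread('model.jpg')    # placeholder: the model image
target = cv2.imread('target.jpg')  # placeholder: the image to search

model_hsv = cv2.cvtColor(model, cv2.COLOR_BGR2HSV)
target_hsv = cv2.cvtColor(target, cv2.COLOR_BGR2HSV)

# hue histogram of the model, normalized to the 0-255 range
hist = cv2.calcHist([model_hsv], [0], None, [180], [0, 180])
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

# each pixel of dst is the scaled probability that the corresponding
# target pixel belongs to the model's color distribution
dst = cv2.calcBackProject([target_hsv], [0], hist, [0, 180], 1)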

Back to the code

Let's get back to our example. First our usual imports, and then we mark the initial region of interest:

cap = cv2.VideoCapture(0)
ret,frame = cap.read()
r,h,c,w = 10, 200, 10, 200
track_window = (c,r,w,h)

Then, we extract and convert the ROI to HSV color space:

roi = frame[r:r+h, c:c+w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)

Now, we create a mask to include all pixels of the ROI with HSV values between the lower and upper bounds:

mask = cv2.inRange(hsv_roi, np.array((100., 30.,32.)), np.array((180.,120.,255.)))

Next, we calculate the histogram of the ROI:

roi_hist = cv2.calcHist([hsv_roi],[0],mask,[180],[0,180])
cv2.normalize(roi_hist,roi_hist,0,255,cv2.NORM_MINMAX)

After the histogram is calculated, its values are normalized to be included within the range 0-255.

Meanshift performs a number of iterations before reaching convergence; however, this convergence is not assured. So, OpenCV allows us to pass so-called termination criteria, which is a way to specify the behavior of Meanshift with regard to terminating the series of calculations:

term_crit = ( cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1 )

In this particular case, we're specifying a behavior that instructs Meanshift to stop calculating the centroid shift after ten iterations, or when the centroid has moved less than 1 pixel between iterations. Combining the two flags with | (EPS and CRITERIA_COUNT) means that Meanshift stops as soon as either criterion is met: the iteration count, or "epsilon", the minimum movement below which we consider the centroid converged.

Now that we have a histogram calculated, and termination criteria for Meanshift, we can start our usual infinite loop, grab the current frame from the camera, and start processing it. The first thing we do is switch to HSV color space:

if ret == True:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

Now that we have an HSV array, we can perform the long-awaited histogram back projection:

        dst = cv2.calcBackProject([hsv],[0],roi_hist,[0,180],1)

The result of calcBackProject is a matrix. If you print it to the console, it looks more or less like this:

[[  0   0   0 ...,   0   0   0]
 [  0   0   0 ...,   0   0   0]
 [  0   0   0 ...,   0   0   0]
 ..., 
 [  0   0  20 ...,   0   0   0]
 [ 78  20   0 ...,   0   0   0]
 [255 137  20 ...,   0   0   0]]

Each value represents the probability (scaled to the 0-255 range) that the corresponding pixel belongs to the distribution of the original histogram.

This matrix can then finally be passed into Meanshift, together with the track window and the termination criteria, as outlined by the Python signature of cv2.meanShift:

meanShift(...)
    meanShift(probImage, window, criteria) -> retval, window

So here it is:

ret, track_window = cv2.meanShift(dst, track_window, term_crit)

Finally, we calculate the new coordinates of the window, draw a rectangle to display it in the frame, and then show it:

x,y,w,h = track_window
img2 = cv2.rectangle(frame, (x,y), (x+w,y+h), 255,2)
cv2.imshow('img2',img2)

That's it. You should by now have a good idea of color histograms, back projections, and Meanshift. However, there remains one issue to be resolved with the preceding program: the size of the window does not change with the size of the object in the frames being tracked.

Gary Bradski, one of the authorities in computer vision and an author of the seminal book, Learning OpenCV, O'Reilly, published a paper in 1998 to improve the accuracy of Meanshift, and described a new algorithm called Continuously Adaptive Meanshift (CAMShift), which is very similar to Meanshift but also adapts the size of the track window when Meanshift reaches convergence.
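
The change required to the preceding example is small. Here's a sketch of the relevant lines of the loop body, using OpenCV's cv2.CamShift, which takes the same arguments as cv2.meanShift but returns a rotated rectangle that we can draw with cv2.boxPoints (numpy is assumed to be imported as np, as before):

# CamShift returns a rotated rectangle ((cx, cy), (w, h), angle)
# as well as the updated track window
ret, track_window = cv2.CamShift(dst, track_window, term_crit)

# convert the rotated rectangle to its four corner points
# and draw it on the frame
pts = cv2.boxPoints(ret)
pts = np.int32(pts)
img2 = cv2.polylines(frame, [pts], True, 255, 2)
cv2.imshow('img2', img2)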
