OpenCV provides a class called BackgroundSubtractor, which is a handy way to perform foreground and background segmentation.
This works similarly to the GrabCut algorithm we analyzed in the chapter on depth estimation and segmentation; however, BackgroundSubtractor is a fully fledged class with a plethora of methods that not only perform background subtraction, but also improve background detection over time through machine learning, and let you save the classifier to a file.
To familiarize ourselves with BackgroundSubtractor, let's look at a basic example:
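```python
import numpy as np
import cv2

# Replace with the path to your own video (or pass 0 to use a webcam)
cap = cv2.VideoCapture("/path/to/movie.flv")
mog = cv2.createBackgroundSubtractorMOG2()

while True:
    ret, frame = cap.read()
    if not ret:
        break  # end of the video, or the frame could not be read
    fgmask = mog.apply(frame)
    cv2.imshow('frame', fgmask)
    # Exit when Esc (key code 27) is pressed
    if cv2.waitKey(30) & 0xff == 27:
        break

cap.release()
cv2.destroyAllWindows()
```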
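Let's go through this in order. First of all, let's talk about the background subtractor object. There are three background subtractors available in OpenCV 3: K-Nearest Neighbors (KNN), Mixture of Gaussians (MOG2), and GMG (named after the initials of its authors, Godbehere, Matsukawa, and Goldberg), each named after the algorithm it uses to compute the background subtraction. KNN and MOG2 are part of the main OpenCV module, while GMG lives in the bgsegm module of opencv_contrib.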
You may remember that we already elaborated on the topic of foreground and background detection in Chapter 5, Depth Estimation and Segmentation, in particular when we talked about GrabCut and Watershed.
So why do we need the BackgroundSubtractor classes? The main reason is that the BackgroundSubtractor classes are built specifically with video analysis in mind, which means that they "learn" something about the environment with every frame. For example, with GMG, you can specify the number of frames used to initialize the video analysis, the default being 120 (roughly 4 to 5 seconds at 24-30 frames per second). What all the BackgroundSubtractor classes have in common is that they compare frames and store a history, which allows them to improve motion analysis results as time passes.
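To get a feel for what "learning" over time means, here is a deliberately simple NumPy sketch of a background model that is updated with every frame, using an exponential running average. It is not how KNN, MOG2, or GMG work internally (they keep richer per-pixel statistical models); running_average_subtractor, alpha, and diff_threshold are hypothetical names and values chosen for illustration.

```python
import numpy as np

def running_average_subtractor(frames, alpha=0.05, diff_threshold=30):
    """Yield a binary foreground mask (0 or 255) for each grayscale frame.

    alpha: hypothetical learning rate, i.e. how fast new frames are absorbed
    into the background model.
    diff_threshold: hypothetical minimum absolute difference (in gray levels)
    for a pixel to count as foreground.
    """
    background = None
    for frame in frames:
        frame = frame.astype(np.float32)
        if background is None:
            # The first frame is the initial background estimate.
            background = frame.copy()
        # Pixels far from the current background model are foreground.
        mask = np.where(np.abs(frame - background) > diff_threshold, 255, 0).astype(np.uint8)
        # Blend the new frame into the model, so it keeps learning over time.
        background = (1.0 - alpha) * background + alpha * frame
        yield mask

bg = np.full((40, 40), 50, dtype=np.uint8)
obj = bg.copy()
obj[10:20, 10:20] = 200  # a bright square appears and then stays put
masks = list(running_average_subtractor([bg] * 5 + [obj] * 41))
```

The square is reported as foreground when it first appears, but because the model keeps learning, an object that stops moving is gradually absorbed into the background; after about 40 frames at this learning rate, the square no longer shows up in the mask.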
Another fundamental (and, frankly, quite amazing) feature of the BackgroundSubtractor classes is the ability to compute shadows. This is vital for an accurate reading of video frames: by detecting shadows, you can exclude shadow areas (by thresholding them) from the objects you detect and concentrate on the real features. It also greatly reduces the unwanted "merging" of objects. An image comparison will give you a good idea of the concept I'm trying to illustrate. Here's a sample of background subtraction without shadow detection:
Here's an example of shadow detection (with shadows thresholded):
Note that shadow detection isn't perfect, but it helps bring the object contours back to the object's original shape. Let's take a look at a reimplemented example of motion detection utilizing BackgroundSubtractorKNN:
```python
import cv2
import numpy as np

bs = cv2.createBackgroundSubtractorKNN(detectShadows=True)
camera = cv2.VideoCapture("/path/to/movie.flv")

while True:
    ret, frame = camera.read()
    if not ret:
        break  # end of the video
    fgmask = bs.apply(frame)
    # Keep near-white foreground pixels; gray shadow pixels become 0
    th = cv2.threshold(fgmask.copy(), 244, 255, cv2.THRESH_BINARY)[1]
    # Grow the blobs slightly to close small gaps
    dilated = cv2.dilate(th, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)), iterations=2)
    # [-2] picks the contours in both OpenCV 3 (three return values) and OpenCV 4 (two)
    contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    for c in contours:
        if cv2.contourArea(c) > 1600:
            (x, y, w, h) = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 255, 0), 2)
    cv2.imshow("mog", fgmask)
    cv2.imshow("thresh", th)
    cv2.imshow("detection", frame)
    if cv2.waitKey(30) & 0xff == 27:
        break

camera.release()
cv2.destroyAllWindows()
```
Thanks to the accuracy of the subtractor and its ability to detect shadows, we obtain very precise motion detection, in which even objects that are next to each other don't get merged into one detection, as shown in the following screenshot:
That's a remarkable result for fewer than 30 lines of code!
The core of the entire program is the apply() method of the background subtractor; it computes a foreground mask, which can be used as a basis for the rest of the processing:
```python
fgmask = bs.apply(frame)
th = cv2.threshold(fgmask.copy(), 244, 255, cv2.THRESH_BINARY)[1]
dilated = cv2.dilate(th, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)), iterations=2)
# [-2] picks the contours in both OpenCV 3 and OpenCV 4
contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
for c in contours:
    if cv2.contourArea(c) > 1600:
        (x, y, w, h) = cv2.boundingRect(c)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 255, 0), 2)
```
Once a foreground mask is obtained, we can apply a threshold: the foreground mask has white values (255) for the foreground and gray values (127 by default) for shadows, so in the thresholded image, every pixel above 244 is set to 255 and all the others, shadows included, are set to 0.
From there, we proceed with the same approach we adopted for the basic motion detection example: dilating the thresholded image, detecting contours, and drawing the bounding boxes of the larger ones on the original frame.
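To see why a threshold of 244 separates the two, the following NumPy sketch builds a toy mask with the same encoding (255 for foreground, 127 for shadows) and then reproduces the rule of cv2.THRESH_BINARY, which sets a pixel to maxval only when it is strictly greater than the threshold. The shadow rule here (a changed pixel that got darker, but not by more than half, counts as a shadow) is a simplified, grayscale-only assumption loosely modeled on the shadow threshold of MOG2 and KNN; OpenCV's actual test relies on its learned background model. classify_pixels and drop_shadows are hypothetical helpers.

```python
import numpy as np

FOREGROUND = 255
SHADOW = 127  # default shadow value in the masks produced by MOG2 and KNN

def classify_pixels(frame, background, diff_threshold=30, tau=0.5):
    """Toy grayscale mask: 255 for changed pixels, 127 for shadows, 0 otherwise.

    Hypothetical rule: a changed pixel is a shadow if it became darker than
    the background, with a brightness ratio of at least tau.
    """
    frame = frame.astype(np.float32)
    background = background.astype(np.float32)
    changed = np.abs(frame - background) > diff_threshold
    ratio = frame / np.maximum(background, 1.0)  # < 1 means the pixel got darker
    shadow = changed & (ratio >= tau) & (ratio < 1.0)
    mask = np.zeros(frame.shape, dtype=np.uint8)
    mask[changed] = FOREGROUND
    mask[shadow] = SHADOW
    return mask

def drop_shadows(fgmask, thresh=244, maxval=255):
    # Same rule as cv2.THRESH_BINARY: values > thresh become maxval, the rest 0.
    return np.where(fgmask > thresh, maxval, 0).astype(np.uint8)

background = np.full((6, 6), 200, dtype=np.uint8)
frame = background.copy()
frame[0:2, :] = 130  # darker by about a third: treated as shadow
frame[2:4, :] = 20   # much darker: treated as a real object
mask = classify_pixels(frame, background)
th = drop_shadows(mask)
```

After thresholding, only the rows marked 255 survive, which is exactly why shadows stop being merged into the detected objects.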
Background subtraction is a really effective technique, but not the only one available to track objects in a video. Meanshift is an algorithm that tracks objects by finding the maximum density of a discrete sample of a probability function (in our case, a region of interest in an image) and recalculating it at the next frame, which gives the algorithm an indication of the direction in which the object has moved.
This calculation is repeated, each time moving the window so that its center coincides with the centroid, until the centroid no longer moves (or moves by a negligible amount) across consecutive iterations. This final matching is called convergence. For reference, the algorithm was first described in the paper The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition, Fukunaga K. and Hostetler L., IEEE Transactions on Information Theory, 1975, which is available at http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1055330&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1055330 (note that this paper is not free to download).
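The iteration can be sketched in a few lines of NumPy. This is a simplified illustration, not OpenCV's cv2.meanShift: mean_shift is a hypothetical helper, the window is an (x, y, w, h) tuple as in this section's examples, the weight image plays the role of the probability image, and max_iter and eps stand in for the termination criteria.

```python
import numpy as np

def mean_shift(weights, window, max_iter=10, eps=1.0):
    """Move an (x, y, w, h) window toward the densest area of a 2D weight image.

    Stops after max_iter iterations, or when the window moves by less than
    eps pixels. Assumes the window fits inside the image.
    """
    x, y, w, h = window
    rows, cols = weights.shape
    for _ in range(max_iter):
        roi = weights[y:y + h, x:x + w].astype(np.float64)
        total = roi.sum()
        if total == 0:
            break  # nothing to follow inside the window
        ys, xs = np.mgrid[0:roi.shape[0], 0:roi.shape[1]]
        # Centroid of the weights inside the window, in window coordinates
        cx = (xs * roi).sum() / total
        cy = (ys * roi).sum() / total
        # Shift the window so that its center lands on the centroid
        dx = int(round(cx - (w - 1) / 2.0))
        dy = int(round(cy - (h - 1) / 2.0))
        new_x = min(max(x + dx, 0), cols - w)
        new_y = min(max(y + dy, 0), rows - h)
        moved = np.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if moved < eps:
            break  # convergence
    return x, y, w, h

weights = np.zeros((100, 100))
weights[60:70, 70:80] = 1.0  # a dense blob, like a bright area of a back projection
print(mean_shift(weights, (45, 35, 30, 30)))  # → (60, 50, 30, 30)
```

Starting with only a corner of the blob inside the window, the sketch needs three iterations to center the window on the blob; a window that contains no weight at all simply stays where it is.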
Here's a visual representation of this process:
Aside from the theory, Meanshift is very useful for tracking a particular region of interest in a video, and this has a series of implications; for example, if you don't know a priori which region you want to track, you'll have to manage this cleverly and develop programs that dynamically start (and stop) tracking certain areas of the video, depending on arbitrary criteria. For instance, you could perform object detection with a trained SVM, and then start using Meanshift to track a detected object.
Let's not make our life complicated from the very beginning, though; let's first get familiar with Meanshift, and then use it in more complex scenarios.
We will start by simply marking a region of interest and keeping track of it, like so:
```python
import numpy as np
import cv2

cap = cv2.VideoCapture(0)
ret, frame = cap.read()

# Initial tracking window: row, height, column, width
r, h, c, w = 10, 200, 10, 200
track_window = (c, r, w, h)

# Hue histogram of the ROI, ignoring pixels outside the given HSV range
roi = frame[r:r+h, c:c+w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv_roi, np.array((100., 30., 32.)), np.array((180., 120., 255.)))
roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# Stop after 10 iterations or when the window moves by less than 1 pixel
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ret, frame = cap.read()
    if ret == True:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
        # apply meanshift to get the new location
        ret, track_window = cv2.meanShift(dst, track_window, term_crit)
        # Draw it on image
        x, y, w, h = track_window
        img2 = cv2.rectangle(frame, (x, y), (x+w, y+h), 255, 2)
        cv2.imshow('img2', img2)
        k = cv2.waitKey(60) & 0xff
        if k == 27:
            break
    else:
        break

cv2.destroyAllWindows()
cap.release()
```
In the preceding code, I supplied the HSV values for tracking some shades of lilac, and here's the result:
If you run the code on your machine, you'll notice how the Meanshift window actually looks for the specified color range; if it doesn't find it, you'll just see the window wobbling (it actually looks a bit impatient). If an object with the specified color range enters the window, the window will then start tracking it.
Let's examine the code so that we can fully understand how Meanshift performs this tracking operation.
Before walking through the code of the preceding example, though, here is a not-so-brief digression on color histograms and two very important built-in functions of OpenCV: calcHist and calcBackProject.
The calcHist function calculates color histograms of an image, so the next logical step is to explain the concept of color histograms. A color histogram is a representation of the color distribution of an image. On the x axis of the representation, we have color values, and on the y axis, we have the number of pixels corresponding to each color value.
Let's look at a visual representation of this concept, hoping the adage "a picture is worth a thousand words" will apply in this instance too:
The picture shows a representation of a color histogram with one column per hue value (note that, for 8-bit images, OpenCV stores hue in the range [0, 180), which is the 0-360 degree scale halved so that it fits in a byte; other systems may use 0-360 or 0-255).
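The idea is easy to reproduce with NumPy alone. This sketch (hue_histogram is a hypothetical helper, and it uses np.histogram rather than OpenCV) counts the pixels of a tiny hue image into 180 bins, one per OpenCV hue value:

```python
import numpy as np

def hue_histogram(hue, bins=180, hue_range=(0, 180)):
    """Count how many pixels fall into each hue bin (one bin per value by default)."""
    counts, _ = np.histogram(hue, bins=bins, range=hue_range)
    return counts

hue = np.array([[0, 0, 10],
                [10, 10, 179]], dtype=np.uint8)
hist = hue_histogram(hue)
print(hist[0], hist[10], hist[179])  # → 2 3 1
```

Each entry of the result is one column of the histogram: its height is the number of pixels with that hue.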
Aside from Meanshift, color histograms are used for a number of different and useful image and video processing operations.
The calcHist() function in OpenCV has the following Python signature:
```
calcHist(images, channels, mask, histSize, ranges[, hist[, accumulate]]) -> hist
```
The parameters are described as follows (as taken from the official OpenCV documentation):
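Parameter | Description
---|---
images | This parameter is the source arrays. They all should have the same depth, CV_8U, CV_16U, or CV_32F, and the same size. Each of them can have an arbitrary number of channels.
channels | This parameter is the list of the dims channels used to compute the histogram (in Python, dims is the length of this list). The channels of the first array are numbered from 0 to images[0].channels()-1, those of the second array from images[0].channels() to images[0].channels() + images[1].channels()-1, and so on.
mask | This parameter is the optional mask. If the matrix is not empty, it must be an 8-bit array of the same size as images[i]. The non-zero mask elements mark the array elements counted in the histogram.
histSize | This parameter is the array of histogram sizes in each dimension.
ranges | This parameter is the array of the dims arrays of the histogram bin boundaries in each dimension. For a uniform histogram (the default), the inclusive lower boundary and the exclusive upper boundary of each dimension are enough; in Python, they are passed as a flat list, such as [0, 180].
hist | This parameter is the output histogram, which is a dense or sparse dims-dimensional array.
accumulate | This parameter is the accumulation flag. If it is set, the histogram is not cleared in the beginning when it is allocated. This feature enables you to compute a single histogram from several sets of arrays, or to update the histogram in time.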
In our example, we calculate the histogram of the region of interest like so:
```python
roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
```
This can be interpreted as the calculation of a color histogram for an array of images containing only the region of interest in the HSV space. We use only channel 0 (hue), and in this region, we count only the pixels whose corresponding mask values are not equal to 0, with 180 histogram columns, 0 as the lower boundary, and 180 as the upper boundary.
This is rather convoluted to describe but, once you have familiarized yourself with the concept of a histogram, the pieces of the puzzle should click into place.
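A rough NumPy analogue may help put the pieces together. In the example, the image passed in is the three-channel HSV ROI and channel 0 selects hue; this sketch (masked_hue_histogram is a hypothetical helper) starts directly from a hue-only array. Changing hist_size shows what histSize controls: with 180 columns each hue value gets its own bin, while with 18 columns each bin spans 10 hue values.

```python
import numpy as np

def masked_hue_histogram(hue, mask, hist_size, hue_range=(0, 180)):
    """Rough NumPy analogue of cv2.calcHist([img], [0], mask, [hist_size], list(hue_range)).

    Only pixels whose mask value is non-zero are counted. calcHist returns a
    float32 column of shape (hist_size, 1); this sketch returns a flat array.
    """
    selected = hue[mask != 0]
    counts, _ = np.histogram(selected, bins=hist_size, range=hue_range)
    return counts.astype(np.float32)

hue = np.array([[5, 15, 100],
                [105, 170, 175]], dtype=np.uint8)
mask = np.array([[255, 255, 0],
                 [255, 255, 255]], dtype=np.uint8)  # the pixel with hue 100 is masked out

fine = masked_hue_histogram(hue, mask, 180)   # one column per hue value
coarse = masked_hue_histogram(hue, mask, 18)  # each column spans 10 hue values
```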
The other function that plays a vital role in the Meanshift algorithm (but not only there) is calcBackProject, which is short for histogram back projection (calculation). A histogram back projection is so called because it takes a histogram and projects it back onto an image: each pixel is replaced by the value of the histogram bin it falls into, which can be read as the likelihood that the pixel belongs to the image that generated the histogram in the first place. Therefore, calcBackProject gives, for each pixel, an estimate of how likely it is to belong to a model image (from which the original histogram was generated).
Again, if you thought calcHist was a bit convoluted, calcBackProject is probably even more complex!
Let's get back to our example. First, our usual imports, and then we mark the initial region of interest:
```python
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
r, h, c, w = 10, 200, 10, 200
track_window = (c, r, w, h)
```
Then, we extract and convert the ROI to HSV color space:
```python
roi = frame[r:r+h, c:c+w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
```
Now, we create a mask to include all pixels of the ROI with HSV values between the lower and upper bounds:
```python
mask = cv2.inRange(hsv_roi, np.array((100., 30., 32.)), np.array((180., 120., 255.)))
```
Next, we calculate the histogram of the ROI:
```python
roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
```
After the histogram is calculated, its values are normalized to the range 0-255.
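With cv2.NORM_MINMAX, the two numeric arguments passed to cv2.normalize (0 and 255 here) are the target range: the smallest value maps to 0, the largest to 255, and everything in between is scaled linearly. A NumPy sketch of that rescaling follows; normalize_minmax is a hypothetical name, and its handling of a constant input is this sketch's own choice, not necessarily OpenCV's.

```python
import numpy as np

def normalize_minmax(values, alpha=0.0, beta=255.0):
    """Linearly rescale so the smallest value maps to alpha and the largest to beta."""
    values = np.asarray(values, dtype=np.float32)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # Degenerate case (all values equal): this sketch returns alpha everywhere.
        return np.full_like(values, alpha)
    return alpha + (values - lo) * (beta - alpha) / (hi - lo)

counts = np.array([0, 5, 10], dtype=np.float32)  # e.g. raw histogram counts
print(normalize_minmax(counts))
```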
Meanshift performs a number of iterations before reaching convergence; however, this convergence is not assured. So, OpenCV allows us to pass so-called termination criteria, which is a way to specify the behavior of Meanshift with regard to terminating the series of calculations:
```python
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
```
In this particular case, we're specifying a behavior that instructs Meanshift to stop calculating the centroid shift after ten iterations or as soon as the centroid moves by less than 1 pixel. The combined flag (TERM_CRITERIA_EPS | TERM_CRITERIA_COUNT) indicates that we're going to use either of the two criteria (count or "epsilon", meaning the minimum movement), whichever is satisfied first.
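How the two OR-ed criteria combine can be sketched in plain Python. COUNT, EPS, and should_stop below are hypothetical local stand-ins that mimic the idea of bit flags combined with |; they are not OpenCV's own constants or code.

```python
COUNT = 1  # stand-in flag: stop after a maximum number of iterations
EPS = 2    # stand-in flag: stop once the shift drops below a minimum distance

def should_stop(criteria, iteration, shift):
    """Return True as soon as any enabled criterion is met.

    criteria is (flags, max_count, epsilon); iteration is the number of
    completed iterations, shift how far the window moved in the last one.
    """
    flags, max_count, epsilon = criteria
    if flags & COUNT and iteration >= max_count:
        return True
    if flags & EPS and shift < epsilon:
        return True
    return False

term_crit = (EPS | COUNT, 10, 1)
```

With both flags set, a small shift ends the loop early, and a window that keeps moving is still stopped after ten iterations; with only the count flag, small shifts are ignored.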
Now that we have a histogram calculated, and termination criteria for Meanshift, we can start our usual infinite loop, grab the current frame from the camera, and start processing it. The first thing we do is switch to HSV color space:
```python
if ret == True:
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
```
Now that we have an HSV array, we can perform the long-awaited histogram back projection:
```python
dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
```
The result of calcBackProject is a matrix. If you print it to the console, it looks more or less like this:
```
[[  0   0   0 ...,   0   0   0]
 [  0   0   0 ...,   0   0   0]
 [  0   0   0 ...,   0   0   0]
 ...,
 [  0   0  20 ...,   0   0   0]
 [ 78  20   0 ...,   0   0   0]
 [255 137  20 ...,   0   0   0]]
```
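Each element holds the back-projected value of the corresponding pixel: the higher the value, the more likely that pixel belongs to the tracked object.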
This matrix can then finally be passed into Meanshift, together with the track window and the termination criteria, as outlined by the Python signature of cv2.meanShift:
```
meanShift(probImage, window, criteria) -> retval, window
```
So here it is:
```python
ret, track_window = cv2.meanShift(dst, track_window, term_crit)
```
Finally, we unpack the new coordinates of the window, draw a rectangle to display it in the frame, and then show it:
```python
x, y, w, h = track_window
img2 = cv2.rectangle(frame, (x, y), (x+w, y+h), 255, 2)
cv2.imshow('img2', img2)
```
That's it. You should by now have a good idea of color histograms, back projections, and Meanshift. However, there remains one issue to be resolved with the preceding program: the size of the window does not change with the size of the object in the frames being tracked.
Gary Bradski, one of the authorities in computer vision and coauthor of the seminal book Learning OpenCV (O'Reilly), published a paper in 1998 to improve the accuracy of Meanshift, and described a new algorithm called Continuously Adaptive Meanshift (CAMShift), which is very similar to Meanshift but also adapts the size of the track window when Meanshift reaches convergence.