The Kalman filter is an algorithm developed mainly (but not only) by Rudolf Kalman in the late 1950s. It has found practical application in many fields, particularly navigation systems for all sorts of vehicles, from nuclear submarines to aircraft.
The Kalman filter operates recursively on streams of noisy input data (which in computer vision is normally a video feed) to produce a statistically optimal estimate of the underlying system state (in our case, the position of a tracked object in the video).
Let's take a quick example to conceptualize the Kalman filter and translate the preceding (purposely broad and generic) definition into plainer English. Think of a small red ball on a table, and imagine you have a camera pointing at the scene. You identify the ball as the subject to be tracked, and flick it with your fingers. The ball will start rolling on the table, following the laws of motion we're familiar with.
If the ball is rolling at a speed of 1 meter per second (1 m/s) in a particular direction, you don't need the Kalman filter to estimate where the ball will be in 1 second's time: it will be 1 meter away. The Kalman filter applies laws like these to predict an object's position in the current video frame based on observations gathered in the previous frames. Naturally, the Kalman filter cannot know about a pencil on the table deflecting the course of the ball, but once subsequent measurements reveal the deflection, it can adjust its estimates to account for this kind of unforeseeable event.
From the preceding description, we gather that the Kalman filter algorithm is divided into two phases:

- Predict: in the first phase, the Kalman filter uses the covariance calculated up to the current point in time to estimate the object's new position
- Update: in the second phase, it records the object's actual position and adjusts (corrects) the prediction for the next cycle
This adjustment is—in OpenCV terms—a correction, hence the API of the KalmanFilter
class in the Python bindings of OpenCV is as follows:
class KalmanFilter(__builtin__.object)
 |  Methods defined here:
 |
 |  __repr__(...)
 |      x.__repr__() <==> repr(x)
 |
 |  correct(...)
 |      correct(measurement) -> retval
 |
 |  predict(...)
 |      predict([, control]) -> retval
We can deduce that, in our programs, we will call predict()
to estimate the position of an object, and correct()
to instruct the Kalman filter to adjust its calculations.
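Before we look at OpenCV's implementation, it may help to see the two phases spelled out in a toy, one-dimensional Kalman filter written from scratch. The names below are my own and this is only an illustrative sketch, not OpenCV code; cv2.KalmanFilter wraps the same predict/correct cycle (in matrix form) behind its two methods:

```python
# A minimal one-dimensional Kalman filter, to make the predict/correct
# cycle concrete. Illustrative sketch only -- not the OpenCV implementation.

class Kalman1D:
    def __init__(self, q=1e-3, r=0.1):
        self.x = 0.0   # state estimate (position)
        self.p = 1.0   # estimate variance (our uncertainty)
        self.q = q     # process noise variance
        self.r = r     # measurement noise variance

    def predict(self):
        # Predict phase: project the state forward (static model here),
        # growing the uncertainty by the process noise.
        self.p += self.q
        return self.x

    def correct(self, measurement):
        # Correct phase: blend the prediction with the measurement,
        # weighted by the Kalman gain k.
        k = self.p / (self.p + self.r)
        self.x += k * (measurement - self.x)
        self.p *= (1 - k)
        return self.x

kf = Kalman1D()
estimates = []
for z in [1.2, 0.8, 1.1, 0.9, 1.05]:  # noisy readings of a true value of 1.0
    kf.predict()
    estimates.append(kf.correct(z))

print(estimates[-1])  # the estimate has converged near the true value of 1.0
```

Note how each noisy reading pulls the estimate only partway towards itself: the gain k shrinks as the filter becomes more confident, which is exactly the behavior that smooths a jittery track.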
Ultimately, we will aim to use the Kalman filter in combination with CAMShift to obtain the highest degree of accuracy and performance. However, before we go into such levels of complexity, let's analyze a simple example, specifically one that seems to be very common on the Web when it comes to the Kalman filter and OpenCV: mouse tracking.
In the following example, we will draw an empty frame and two lines: one corresponding to the actual movement of the mouse, and the other corresponding to the Kalman filter prediction. Here's the code:
import cv2
import numpy as np

frame = np.zeros((800, 800, 3), np.uint8)
last_measurement = current_measurement = np.zeros((2, 1), np.float32)
last_prediction = current_prediction = np.zeros((2, 1), np.float32)

def mousemove(event, x, y, flags, param):
    global frame, current_measurement, last_measurement, current_prediction, last_prediction
    last_prediction = current_prediction
    last_measurement = current_measurement
    current_measurement = np.array([[np.float32(x)], [np.float32(y)]])
    kalman.correct(current_measurement)
    current_prediction = kalman.predict()
    lmx, lmy = int(last_measurement[0]), int(last_measurement[1])
    cmx, cmy = int(current_measurement[0]), int(current_measurement[1])
    lpx, lpy = int(last_prediction[0]), int(last_prediction[1])
    cpx, cpy = int(current_prediction[0]), int(current_prediction[1])
    cv2.line(frame, (lmx, lmy), (cmx, cmy), (0, 100, 0))
    cv2.line(frame, (lpx, lpy), (cpx, cpy), (0, 0, 200))

cv2.namedWindow("kalman_tracker")
cv2.setMouseCallback("kalman_tracker", mousemove)

kalman = cv2.KalmanFilter(4, 2)
kalman.measurementMatrix = np.array([[1,0,0,0],[0,1,0,0]], np.float32)
kalman.transitionMatrix = np.array([[1,0,1,0],[0,1,0,1],[0,0,1,0],[0,0,0,1]], np.float32)
kalman.processNoiseCov = np.array([[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]], np.float32) * 0.03

while True:
    cv2.imshow("kalman_tracker", frame)
    if (cv2.waitKey(30) & 0xFF) == 27:
        break

cv2.destroyAllWindows()
As usual, let's analyze it step by step. After the packages import, we create an empty frame, of size 800 x 800, and then initialize the arrays that will take the coordinates of the measurements and predictions of the mouse movements:
frame = np.zeros((800, 800, 3), np.uint8)
last_measurement = current_measurement = np.zeros((2, 1), np.float32)
last_prediction = current_prediction = np.zeros((2, 1), np.float32)
Then, we declare the mousemove callback function, which handles the drawing of the tracking. The mechanism is quite simple: we store the last measurement and last prediction, correct the Kalman filter with the current measurement, calculate the Kalman prediction, and finally draw two lines, one from the last measurement to the current measurement and one from the last prediction to the current prediction:
def mousemove(event, x, y, flags, param):
    global frame, current_measurement, last_measurement, current_prediction, last_prediction
    last_prediction = current_prediction
    last_measurement = current_measurement
    current_measurement = np.array([[np.float32(x)], [np.float32(y)]])
    kalman.correct(current_measurement)
    current_prediction = kalman.predict()
    lmx, lmy = int(last_measurement[0]), int(last_measurement[1])
    cmx, cmy = int(current_measurement[0]), int(current_measurement[1])
    lpx, lpy = int(last_prediction[0]), int(last_prediction[1])
    cpx, cpy = int(current_prediction[0]), int(current_prediction[1])
    cv2.line(frame, (lmx, lmy), (cmx, cmy), (0, 100, 0))
    cv2.line(frame, (lpx, lpy), (cpx, cpy), (0, 0, 200))
The next step is to initialize the window and set the callback function. OpenCV handles mouse events through the setMouseCallback function; the specific event must be handled using the first parameter (event) of the callback function, which determines what kind of event has been triggered (click, move, and so on):
cv2.namedWindow("kalman_tracker")
cv2.setMouseCallback("kalman_tracker", mousemove)
Now we're ready to create the Kalman filter:
kalman = cv2.KalmanFilter(4, 2)
kalman.measurementMatrix = np.array([[1,0,0,0],[0,1,0,0]], np.float32)
kalman.transitionMatrix = np.array([[1,0,1,0],[0,1,0,1],[0,0,1,0],[0,0,0,1]], np.float32)
kalman.processNoiseCov = np.array([[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]], np.float32) * 0.03
The KalmanFilter class takes optional parameters in its constructor (from the OpenCV documentation):

- dynamParams: this parameter states the dimensionality of the state
- measureParams: this parameter states the dimensionality of the measurement
- controlParams: this parameter states the dimensionality of the control vector
- type: this parameter states the type of the created matrices, which should be CV_32F or CV_64F
I found the preceding parameters (both the constructor arguments and the filter's matrix properties) to work very well for this example.
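To see why we pass 4 and 2 to the constructor, note that the state tracked here is [x, y, vx, vy] (position plus velocity), while each measurement supplies only [x, y]. The transitionMatrix we just set encodes a constant-velocity model: each predict step computes state' = A * state, advancing the position by the velocity. The following pure-Python sketch (my own illustrative code, not OpenCV's) reproduces that single matrix-vector multiplication:

```python
# The constant-velocity transition matrix from the example above, applied
# to a state vector [x, y, vx, vy] by plain matrix-vector multiplication.
# This mirrors the arithmetic behind one predict step.

A = [[1, 0, 1, 0],   # x'  = x + vx
     [0, 1, 0, 1],   # y'  = y + vy
     [0, 0, 1, 0],   # vx' = vx
     [0, 0, 0, 1]]   # vy' = vy

def step(state):
    """One predict step under the constant-velocity model."""
    return [sum(a * s for a, s in zip(row, state)) for row in A]

state = [100.0, 200.0, 3.0, -2.0]  # at (100, 200), moving (+3, -2) per frame
state = step(state)
print(state)  # [103.0, 198.0, 3.0, -2.0]
```

The measurementMatrix [[1,0,0,0],[0,1,0,0]] then simply picks the (x, y) components out of this state, which is why it is 2 x 4.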
From this point on, the program is straightforward: every mouse movement triggers a Kalman prediction, and both the actual position of the mouse and the Kalman prediction are drawn in the frame, which is continuously displayed. If you move your mouse around, you'll notice that, if you make a sudden turn at high speed, the prediction line follows a wider trajectory, consistent with the momentum of the mouse movement at the time. Here's a sample result:
Up to this point, we have familiarized ourselves with the concepts of motion detection, object detection, and object tracking, so I imagine you are anxious to put this newfound knowledge to good use in a real-life scenario. Let's do just that by examining the video feed of a surveillance camera and tracking pedestrians in it.
First of all, we need a sample video; if you download the OpenCV source, you will find the perfect video file for this purpose in <opencv_dir>/samples/data/768x576.avi.
Now that we have the perfect asset to analyze, let's start building the application.
The application will adhere to the following logic:
If this were a real-world application, you would probably store pedestrian information to obtain information such as the average permanence of a pedestrian in the scene and most likely routes. However, this is all beyond the remit of this example application.
In a real-world application, you would make sure to identify new pedestrians entering the scene, but for now, we'll focus on tracking those objects that are in the scene at the start of the video, utilizing the CAMShift and Kalman filter algorithms.
You will find the code for this application in chapter8/surveillance_demo/
of the code repository.
Although most programmers are either familiar with object-oriented programming (OOP) or work with it on a constant basis, I have found that, as the years pass, I increasingly prefer functional programming (FP) solutions.
For those not familiar with the terminology, FP is a programming paradigm adopted by many languages that treats programs as the evaluation of mathematical functions, allows functions to return functions, and permits functions as arguments to other functions. The strength of FP resides not only in what it can do, but also in what it avoids: side effects and changing state. If the topic of functional programming has sparked an interest, make sure to check out languages such as Haskell, Clojure, or ML.
What is a side effect in programming terms? You can define a side effect as a function changing state that does not belong to it, such as a global variable, rather than depending only on its input and its return value. Python, along with many other languages, is susceptible to side effects because, much like JavaScript, it allows access to global variables (and sometimes this access to global variables is accidental!).
Another major issue with languages that are not purely functional is that a function's result can change over time, depending on the state of the variables involved. If a function takes an object as an argument and the computation relies on the internal state of that object, the function will return different results as the object's state changes. This typically happens in languages such as C and C++, in functions where one or more of the arguments are references to objects.
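A small self-contained sketch (hypothetical names, plain Python) may make both problems concrete: tally reaches out to a global variable (a side effect), and describe returns different results for the same argument because the object's internal state changed between calls:

```python
# Two non-functional behaviors in miniature (names are hypothetical).

counter = 0

def tally(x):
    global counter
    counter += x          # side effect: mutates state outside the function
    return counter

class Ball:
    def __init__(self):
        self.speed = 1.0

def describe(ball):
    # result depends on the object's internal state, not just its identity
    return "speed: %.1f" % ball.speed

print(tally(5), tally(5))   # 5 10 -- same input, different output

b = Ball()
first = describe(b)
b.speed = 2.0               # mutate the object between calls
second = describe(b)
print(first == second)      # False -- same argument, different result
```

A purely functional version would instead return a new counter value and take the speed as an explicit argument, so the same inputs always yield the same outputs.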
Why this digression? Because so far I have illustrated concepts mostly using functions, and I did not shy away from accessing global variables where this was the simplest and most robust approach. However, the next program we will examine contains OOP. So why do I choose to adopt OOP while advocating FP? Because OpenCV has quite an opinionated approach, which makes it hard to implement a program in a purely functional or purely object-oriented style.
For example, any drawing function, such as cv2.rectangle
and cv2.circle
, modifies the argument passed into it. This approach contravenes one of the cardinal rules of functional programming, which is to avoid side-effects and changing states.
Out of curiosity, you could—in Python—redeclare the API of these drawing functions in a way that is more FP-friendly. For example, you could rewrite cv2.rectangle
like this:
def drawRect(frame, topLeft, bottomRight, color, thickness, fill = cv2.LINE_AA):
    newframe = frame.copy()
    cv2.rectangle(newframe, topLeft, bottomRight, color, thickness, fill)
    return newframe
This approach—while computationally more expensive due to the copy()
operation—allows the explicit reassignment of a frame, like so:
ret, frame = camera.read()
frame = drawRect(frame, (0, 0), (10, 10), (0, 255, 0), 1)
To conclude this digression, I will reiterate a belief very often mentioned in all programming forums and resources: there is no such thing as the best language or paradigm, only the best tool for the job in hand.
So let's get back to our program and explore the implementation of a surveillance application, tracking moving objects in a video.
The main rationale behind the creation of a Pedestrian
class is the nature of the Kalman filter. The Kalman filter can predict the position of an object based on historical observations and correct the prediction based on the actual data, but it can only do that for one object.
As a consequence, we need one Kalman filter per object tracked.
So the Pedestrian
class will act as a holder for a Kalman filter, a color histogram (calculated on the first detection of the object and used as a reference for the subsequent frames), and information about the region of interest, which will be used by the CAMShift algorithm (the track_window
parameter).
Furthermore, we store the ID of each pedestrian for some fancy real-time info.
Let's take a look at the Pedestrian
class:
class Pedestrian():
    """Pedestrian class

    each pedestrian is composed of a ROI, an ID and a Kalman filter,
    so we create a Pedestrian class to hold the object state
    """
    def __init__(self, id, frame, track_window):
        """init the pedestrian object with track window coordinates"""
        # set up the roi
        self.id = int(id)
        x, y, w, h = track_window
        self.track_window = track_window
        self.roi = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
        roi_hist = cv2.calcHist([self.roi], [0], None, [16], [0, 180])
        self.roi_hist = cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

        # set up the kalman filter
        self.kalman = cv2.KalmanFilter(4, 2)
        self.kalman.measurementMatrix = np.array([[1,0,0,0],[0,1,0,0]], np.float32)
        self.kalman.transitionMatrix = np.array([[1,0,1,0],[0,1,0,1],[0,0,1,0],[0,0,0,1]], np.float32)
        self.kalman.processNoiseCov = np.array([[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]], np.float32) * 0.03
        self.measurement = np.zeros((2, 1), np.float32)
        self.prediction = np.zeros((2, 1), np.float32)
        self.term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
        self.center = None
        self.update(frame)

    def __del__(self):
        print("Pedestrian %d destroyed" % self.id)

    def update(self, frame):
        # print("updating %d " % self.id)
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back_project = cv2.calcBackProject([hsv], [0], self.roi_hist, [0, 180], 1)

        if args.get("algorithm") == "c":
            ret, self.track_window = cv2.CamShift(back_project, self.track_window, self.term_crit)
            pts = cv2.boxPoints(ret)
            pts = np.int0(pts)
            self.center = center(pts)
            cv2.polylines(frame, [pts], True, 255, 1)

        if not args.get("algorithm") or args.get("algorithm") == "m":
            ret, self.track_window = cv2.meanShift(back_project, self.track_window, self.term_crit)
            x, y, w, h = self.track_window
            self.center = center([[x, y], [x+w, y], [x, y+h], [x+w, y+h]])
            cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 255, 0), 1)

        self.kalman.correct(self.center)
        prediction = self.kalman.predict()
        cv2.circle(frame, (int(prediction[0]), int(prediction[1])), 4, (0, 255, 0), -1)
        # fake shadow
        cv2.putText(frame, "ID: %d -> %s" % (self.id, self.center),
                    (11, (self.id + 1) * 25 + 1),
                    font, 0.6, (0, 0, 0), 1, cv2.LINE_AA)
        # actual info
        cv2.putText(frame, "ID: %d -> %s" % (self.id, self.center),
                    (10, (self.id + 1) * 25),
                    font, 0.6, (0, 255, 0), 1, cv2.LINE_AA)
At the core of the program lies the background subtractor object, which lets us identify regions of interest corresponding to moving objects.
When the program starts, we take each of these regions and instantiate one Pedestrian object per region, passing the ID (a simple counter) and the frame and track window coordinates (so we can extract the Region of Interest (ROI) and, from this, the HSV histogram of the ROI).
The constructor function (__init__
in Python) is more or less an aggregation of all the previous concepts: given an ROI, we calculate its histogram, set up a Kalman filter, and associate it with a property (self.kalman
) of the object.
In the update
method, we pass the current frame and convert it to HSV so that we can calculate the back projection of the pedestrian's HSV histogram.
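As a refresher on what back projection actually computes, here is the idea reduced to a toy pure-Python sketch on a list of hue values (function names are my own; in the real program, cv2.calcHist and cv2.calcBackProject perform this on whole images):

```python
# Back projection in miniature: build a histogram of the hues inside the
# ROI, then score every pixel of a frame by how frequent its hue was in
# that ROI. Pixels whose color dominated the ROI get high scores -- this
# is the "probability image" that CAMShift and Meanshift climb.

def hue_histogram(pixels, bins=16, hue_range=180):
    hist = [0] * bins
    for h in pixels:
        hist[h * bins // hue_range] += 1
    return hist

def back_project(frame, hist, bins=16, hue_range=180):
    return [hist[h * bins // hue_range] for h in frame]

roi = [5, 6, 5, 7, 100]          # mostly red-ish hues, one outlier
frame = [5, 6, 90, 100, 170, 7]  # a frame mixing red-ish and other hues

scores = back_project(frame, hue_histogram(roi))
print(scores)  # [4, 4, 1, 1, 0, 4] -- red-ish pixels score highest
```

In the real code the histogram is also normalized to the 0-255 range, so the back projection becomes a grayscale likelihood image rather than raw counts.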
We then use either CAMShift or Meanshift (depending on the argument passed; Meanshift is the default if no arguments are passed) to track the movement of the pedestrian, and correct the Kalman filter for that pedestrian with the actual position.
We also draw both CAMShift/Meanshift (with a surrounding rectangle) and Kalman (with a dot), so you can observe Kalman and CAMShift/Meanshift go nearly hand in hand, except for sudden movements that cause Kalman to have to readjust.
Lastly, we print some pedestrian information on the top-left corner of the image.
Now that we have a Pedestrian
class holding all specific information for each object, let's take a look at the main function in the program.
First, we load a video (it could be a webcam), and then we initialize a background subtractor, setting 20 frames as the frames affecting the background model:
history = 20
bs = cv2.createBackgroundSubtractorKNN(detectShadows = True)
bs.setHistory(history)
We also create the main display window, and then set up a pedestrians dictionary and a firstFrame
flag, which we're going to use to allow a few frames for the background subtractor to build history, so it can better identify moving objects. To help with this, we also set up a frame counter:
cv2.namedWindow("surveillance")
pedestrians = {}
firstFrame = True
frames = 0
Now we start the loop. We read camera frames (or video frames) one by one:
while True:
    print(" -------------------- FRAME %d --------------------" % frames)
    grabbed, frame = camera.read()
    if grabbed is False:
        print("failed to grab frame.")
        break
We let BackgroundSubtractorKNN
build the history for the background model, so we don't actually process the first 20 frames; we only pass them into the subtractor:
    fgmask = bs.apply(frame)

    # this is just to let the background subtractor build a bit of history
    if frames < history:
        frames += 1
        continue
Then we process the frame with the approach explained earlier in the chapter, by applying a process of dilation and erosion on the foreground mask so as to obtain easily identifiable blobs and their bounding boxes. These are obviously moving objects in the frame:
    th = cv2.threshold(fgmask.copy(), 127, 255, cv2.THRESH_BINARY)[1]
    th = cv2.erode(th, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)), iterations=2)
    dilated = cv2.dilate(th, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (8, 3)), iterations=2)
    image, contours, hier = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
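What the erode/dilate pair achieves can be illustrated on a toy binary mask in pure Python (my own sketch, not OpenCV code; OpenCV performs the same morphology with configurable structuring elements and far more efficiently):

```python
# Erosion with a 3x3 window removes isolated specks; the following
# dilation restores the surviving blob roughly to its original size.
# This opening-style operation is what cleans the foreground mask
# before contour detection.

def morph(mask, keep):
    """Apply a 3x3 morphological step; `keep` decides the output pixel."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = keep(window)
    return out

def erode(mask):
    return morph(mask, lambda w: int(all(w)))   # pixel survives only if all neighbors are set

def dilate(mask):
    return morph(mask, lambda w: int(any(w)))   # pixel is set if any neighbor is set

mask = [[0, 0, 0, 0, 0, 1],   # lone noise speck at the top right
        [0, 1, 1, 1, 0, 0],
        [0, 1, 1, 1, 0, 0],   # solid 3x3 blob on the left
        [0, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0]]

opened = dilate(erode(mask))
print(sum(map(sum, opened)))  # 9: the 3x3 blob survives, the speck is gone
```

The real pipeline uses elliptical kernels of different shapes for the two steps, but the principle is the same: noise vanishes while genuine moving objects remain as easily identifiable blobs.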
Once the contours are identified, we instantiate one pedestrian per contour for the first frame only (note that I set a minimum area for the contour to further denoise our detection):
    counter = 0
    for c in contours:
        if cv2.contourArea(c) > 500:
            (x, y, w, h) = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 1)
            # only create pedestrians in the first frame, then just follow the ones you have
            if firstFrame is True:
                pedestrians[counter] = Pedestrian(counter, frame, (x, y, w, h))
            counter += 1
Then, for each pedestrian detected, we call the update method, passing the current frame, which is needed in its original color space because the pedestrian objects are responsible for drawing their own information (text, Meanshift/CAMShift rectangles, and Kalman filter tracking):
    for i, p in pedestrians.items():
        p.update(frame)
We set the firstFrame
flag to False
, so we don't instantiate any more pedestrians; we just keep track of the ones we have:
    firstFrame = False
    frames += 1
Finally, we show the result in the display window. The program can be exited by pressing the Esc key:
    cv2.imshow("surveillance", frame)
    if cv2.waitKey(110) & 0xff == 27:
        break

if __name__ == "__main__":
    main()
There you have it: CAMShift/Meanshift working in tandem with the Kalman filter to track moving objects. All being well, you should obtain a result similar to this:
In this screenshot, the blue outline is the CAMShift detection, the green rectangle is the bounding box of the detected moving object, and the green circle marks the center predicted by the Kalman filter.
This program constitutes a basis for your application domain's needs. There are many improvements that can be made building on the program above to suit an application's additional requirements. Consider the following examples:
Whatever your needs, hopefully this chapter will have provided you with the necessary knowledge to build applications that satisfy your requirements.