The Kalman filter

The Kalman filter is an algorithm developed mainly (but not solely) by Rudolf Kalman in the late 1950s, and it has found practical application in many fields, particularly navigation systems for all sorts of vehicles, from nuclear submarines to aircraft.

The Kalman filter operates recursively on a stream of noisy input data (which in computer vision is normally a video feed) to produce a statistically optimal estimate of the underlying system state (in our case, the position of a tracked object in the video).

Let's take a quick example to conceptualize the Kalman filter and translate the preceding (purposely broad and generic) definition into plainer English. Think of a small red ball on a table, and imagine you have a camera pointing at the scene. You identify the ball as the subject to be tracked, and flick it with your fingers. The ball will start rolling on the table, following the laws of motion we're familiar with.

If the ball is rolling at a speed of 1 meter per second (1 m/s) in a particular direction, you don't need the Kalman filter to estimate where the ball will be in 1 second's time: it will be 1 meter away. The Kalman filter applies laws like these to predict an object's position in the current video frame based on observations gathered in the previous frames. Naturally, the Kalman filter cannot anticipate a pencil on the table deflecting the course of the ball, but it can adjust for this kind of unforeseeable event once new observations come in.

Predict and update

From the preceding description, we gather that the Kalman filter algorithm is divided into two phases:

  • Predict: In the first phase, the Kalman filter uses the state estimate and covariance calculated up to the current point in time to predict the object's new position
  • Update: In the second phase, it records the object's measured position and adjusts the state estimate and covariance for the next cycle of calculations (a minimal sketch of both phases follows this list)
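
To make these two phases concrete, here is a minimal NumPy sketch of one predict/update cycle in the textbook form of the filter. The names x, P, F, H, Q, R, and z follow the usual convention (state, covariance, transition, measurement, process noise, measurement noise, and measurement vector); they are assumptions of this sketch, not OpenCV API names:

import numpy as np

def predict(x, P, F, Q):
    # project the state and its covariance forward in time
    x = F.dot(x)
    P = F.dot(P).dot(F.T) + Q
    return x, P

def update(x, P, z, H, R):
    # blend the prediction with the measurement z, weighted by the Kalman gain K
    S = H.dot(P).dot(H.T) + R
    K = P.dot(H.T).dot(np.linalg.inv(S))
    x = x + K.dot(z - H.dot(x))
    P = (np.eye(len(x)) - K.dot(H)).dot(P)
    return x, P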

This adjustment is—in OpenCV terms—a correction, hence the API of the KalmanFilter class in the Python bindings of OpenCV is as follows:

 class KalmanFilter(__builtin__.object)
 |  Methods defined here:
 |  
 |  __repr__(...)
 |      x.__repr__() <==> repr(x)
 |  
 |  correct(...)
 |      correct(measurement) -> retval
 |  
 |  predict(...)
 |      predict([, control]) -> retval

We can deduce that, in our programs, we will call predict() to estimate the position of an object, and correct() to instruct the Kalman filter to adjust its calculations.

An example

Ultimately, we will aim to use the Kalman filter in combination with CAMShift to obtain the highest degree of accuracy and performance. However, before we go into such levels of complexity, let's analyze a simple example, specifically one that seems to be very common on the Web when it comes to the Kalman filter and OpenCV: mouse tracking.

In the following example, we will draw an empty frame and two lines: one corresponding to the actual movement of the mouse, and the other corresponding to the Kalman filter prediction. Here's the code:

import cv2
import numpy as np

frame = np.zeros((800, 800, 3), np.uint8)
last_measurement = current_measurement = np.zeros((2,1), np.float32)
last_prediction = current_prediction = np.zeros((2,1), np.float32)

def mousemove(event, x, y, flags, param):
    global frame, current_measurement, last_measurement, current_prediction, last_prediction
    last_prediction = current_prediction
    last_measurement = current_measurement
    current_measurement = np.array([[np.float32(x)],[np.float32(y)]])
    kalman.correct(current_measurement)
    current_prediction = kalman.predict()
    lmx, lmy = int(last_measurement[0, 0]), int(last_measurement[1, 0])
    cmx, cmy = int(current_measurement[0, 0]), int(current_measurement[1, 0])
    lpx, lpy = int(last_prediction[0, 0]), int(last_prediction[1, 0])
    cpx, cpy = int(current_prediction[0, 0]), int(current_prediction[1, 0])
    cv2.line(frame, (lmx, lmy), (cmx, cmy), (0,100,0))  # measurement trail in green
    cv2.line(frame, (lpx, lpy), (cpx, cpy), (0,0,200))  # prediction trail in red


cv2.namedWindow("kalman_tracker")
cv2.setMouseCallback("kalman_tracker", mousemove)

kalman = cv2.KalmanFilter(4,2)
kalman.measurementMatrix = np.array([[1,0,0,0],[0,1,0,0]],np.float32)
kalman.transitionMatrix = np.array([[1,0,1,0],[0,1,0,1],[0,0,1,0],[0,0,0,1]],np.float32)
kalman.processNoiseCov = np.array([[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]],np.float32) * 0.03

while True:
    cv2.imshow("kalman_tracker", frame)
    if (cv2.waitKey(30) & 0xFF) == 27:
        break

cv2.destroyAllWindows()

As usual, let's analyze it step by step. After importing the packages, we create an empty 800 x 800 frame, and then initialize the arrays that will hold the coordinates of the measurements and predictions of the mouse movements:

frame = np.zeros((800, 800, 3), np.uint8)
last_measurement = current_measurement = np.zeros((2,1), np.float32)
last_prediction = current_prediction = np.zeros((2,1), np.float32)

Then, we declare the mousemove callback function, which is going to handle the drawing of the tracking. The mechanism is quite simple: we store the last measurement and last prediction, correct the Kalman filter with the current measurement, calculate the Kalman prediction, and finally draw two lines, from the last measurement to the current one and from the last prediction to the current one:

def mousemove(event, x, y, flags, param):
    global frame, current_measurement, last_measurement, current_prediction, last_prediction
    last_prediction = current_prediction
    last_measurement = current_measurement
    current_measurement = np.array([[np.float32(x)],[np.float32(y)]])
    kalman.correct(current_measurement)
    current_prediction = kalman.predict()
    lmx, lmy = int(last_measurement[0, 0]), int(last_measurement[1, 0])
    cmx, cmy = int(current_measurement[0, 0]), int(current_measurement[1, 0])
    lpx, lpy = int(last_prediction[0, 0]), int(last_prediction[1, 0])
    cpx, cpy = int(current_prediction[0, 0]), int(current_prediction[1, 0])
    cv2.line(frame, (lmx, lmy), (cmx, cmy), (0,100,0))  # measurement trail in green
    cv2.line(frame, (lpx, lpy), (cpx, cpy), (0,0,200))  # prediction trail in red

The next step is to initialize the window and set the callback function. OpenCV handles mouse events with the setMouseCallback function; the first parameter of the callback (event) determines what kind of event has been triggered (a click, a movement, and so on):

cv2.namedWindow("kalman_tracker")
cv2.setMouseCallback("kalman_tracker", mousemove)
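
Our callback doesn't inspect the event type, because we want to react to every mouse interaction in the same way; however, a sketch of event filtering, should you need it, might look like this (using the standard cv2 event constants; the function name is arbitrary):

def on_mouse(event, x, y, flags, param):
    if event == cv2.EVENT_LBUTTONDOWN:
        print("click at (%d, %d)" % (x, y))
    elif event == cv2.EVENT_MOUSEMOVE:
        print("moved to (%d, %d)" % (x, y))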

Now we're ready to create the Kalman filter:

kalman = cv2.KalmanFilter(4,2)
kalman.measurementMatrix = np.array([[1,0,0,0],[0,1,0,0]],np.float32)
kalman.transitionMatrix = np.array([[1,0,1,0],[0,1,0,1],[0,0,1,0],[0,0,0,1]],np.float32)
kalman.processNoiseCov = np.array([[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]],np.float32) * 0.03

The Kalman filter class takes optional parameters in its constructor (from the OpenCV documentation):

  • dynamParams: This parameter states the dimensionality of the state
  • measureParams: This parameter states the dimensionality of the measurement
  • controlParams: This parameter states the dimensionality of the control vector
  • type: This parameter states the type of the created matrices, which should be CV_32F or CV_64F

I found the preceding parameters (both for the constructor and the Kalman properties) to work very well.
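
To see why these matrices describe our problem, consider the state the filter tracks: with four dynamic parameters and two measured ones, the state vector holds the position and velocity, [x, y, vx, vy], of which only the position is measured. The following sketch (the variable names here are mine, not OpenCV's) shows how the transition matrix encodes a constant-velocity motion model with a time step of one frame:

import numpy as np

F = np.array([[1, 0, 1, 0],   # x  <- x + vx
              [0, 1, 0, 1],   # y  <- y + vy
              [0, 0, 1, 0],   # vx <- vx
              [0, 0, 0, 1]],  # vy <- vy
             np.float32)

state = np.array([10, 20, 2, -1], np.float32)  # at (10, 20), moving (2, -1) per frame
print(F.dot(state))  # [12. 19.  2. -1.] - the predicted next state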

From this point on, the program is straightforward: every mouse movement triggers a Kalman prediction, and both the actual position of the mouse and the Kalman prediction are drawn in the frame, which is continuously displayed. If you move your mouse around, you'll notice that, when you make a sudden turn at high speed, the prediction line follows a wider trajectory, consistent with the momentum of the mouse movement at the time. Here's a sample result:

(Figure: the mouse-tracking window; the green line traces the actual mouse movements, and the red line traces the Kalman predictions.)

A real-life example – tracking pedestrians

Up to this point, we have familiarized ourselves with the concepts of motion detection, object detection, and object tracking, so I imagine you are anxious to put this newfound knowledge to good use in a real-life scenario. Let's do just that by examining the video feed of a surveillance camera and tracking pedestrians in it.

First of all, we need a sample video; if you download the OpenCV source, you will find the perfect video file for this purpose in <opencv_dir>/samples/data/768x576.avi.

Now that we have the perfect asset to analyze, let's start building the application.

The application workflow

The application will adhere to the following logic:

  1. Examine the first frame.
  2. Examine the following frames and perform background subtraction to identify the pedestrians present in the scene at the start.
  3. Establish an ROI per pedestrian, and use the Kalman filter and CAMShift to track each pedestrian, assigning an ID to each one.
  4. Examine the next frames for new pedestrians entering the scene.

If this were a real-world application, you would probably store pedestrian information to obtain statistics such as the average time a pedestrian spends in the scene and the most likely routes. However, this is all beyond the remit of this example application.

In a real-world application, you would make sure to identify new pedestrians entering the scene, but for now, we'll focus on tracking those objects that are in the scene at the start of the video, utilizing the CAMShift and Kalman filter algorithms.

You will find the code for this application in chapter8/surveillance_demo/ of the code repository.

A brief digression – functional versus object-oriented programming

Although most programmers are either familiar with Object-oriented Programming (OOP) or work with it on a constant basis, I have found that, as the years pass, I increasingly prefer Functional Programming (FP) solutions.

For those not familiar with the terminology, FP is a programming paradigm adopted by many languages that treats programs as the evaluation of mathematical functions, allows functions to return functions, and permits functions as arguments to other functions. The strength of FP resides not only in what it can do, but also in what it avoids, or aims at avoiding: side effects and changing state. If the topic of functional programming has sparked your interest, make sure to check out languages such as Haskell, Clojure, or ML.
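
As a quick illustration of these traits, here is a small Python sketch (the function names are arbitrary) of a function that both accepts functions as arguments and returns a new function:

def compose(f, g):
    # return a new function that applies g first, then f
    return lambda x: f(g(x))

add_one = lambda x: x + 1
double = lambda x: x * 2
print(compose(double, add_one)(3))  # (3 + 1) * 2 = 8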

Note

What is a side effect in programming terms? You can define a side effect as any change a function makes to state that does not depend on the function's input. Python, along with many other languages, is susceptible to side effects because, much like JavaScript for example, it allows access to global variables (and sometimes this access to global variables can be accidental!).

Another major issue encountered with languages that are not purely functional is the fact that a function's result will change over time, depending on the state of the variables involved. If a function takes an object as an argument, for example, and the computation relies on the internal state of that object, the function will return different results according to changes in the object's state. This typically happens in languages such as C and C++, in functions where one or more of the arguments are references to objects.
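
A contrived Python sketch of the contrast (the names here are mine) would be:

counter = 0

def impure_next():
    global counter    # side effect: mutates state outside the function
    counter += 1
    return counter

def pure_next(n):     # pure: the result depends only on the input
    return n + 1

print(impure_next(), impure_next())  # 1 2 - same call, different results
print(pure_next(0), pure_next(0))    # 1 1 - always the same for the same input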

Why this digression? Because so far I have illustrated concepts using mostly functions; I did not shy away from accessing global variables where this was the simplest and most robust approach. However, the next program we will examine contains OOP. So why do I choose to adopt OOP while advocating FP? Because OpenCV has quite an opinionated design, which makes it hard to implement a program in a purely functional or purely object-oriented style.

For example, any drawing function, such as cv2.rectangle and cv2.circle, modifies the argument passed into it. This approach contravenes one of the cardinal rules of functional programming, which is to avoid side-effects and changing states.

Out of curiosity, you could—in Python—redeclare the API of these drawing functions in a way that is more FP-friendly. For example, you could rewrite cv2.rectangle like this:

def drawRect(frame, topLeft, bottomRight, color, thickness, lineType = cv2.LINE_AA):
    # copy the frame so that the caller's original is not mutated
    newframe = frame.copy()
    cv2.rectangle(newframe, topLeft, bottomRight, color, thickness, lineType)
    return newframe

This approach—while computationally more expensive due to the copy() operation—allows the explicit reassignment of a frame, like so:

ret, frame = camera.read()
frame = drawRect(frame, (0,0), (10,10), (0, 255,0), 1)

To conclude this digression, I will reiterate a belief very often mentioned in all programming forums and resources: there is no such thing as the best language or paradigm, only the best tool for the job in hand.

So let's get back to our program and explore the implementation of a surveillance application, tracking moving objects in a video.

The Pedestrian class

The main rationale behind the creation of a Pedestrian class is the nature of the Kalman filter. The Kalman filter can predict the position of an object based on historical observations and correct the prediction based on the actual data, but it can only do that for one object.

As a consequence, we need one Kalman filter per object tracked.

So the Pedestrian class will act as a holder for a Kalman filter, a color histogram (calculated on the first detection of the object and used as a reference for the subsequent frames), and information about the region of interest, which will be used by the CAMShift algorithm (the track_window parameter).

Furthermore, we store the ID of each pedestrian for some fancy real-time info.

Let's take a look at the Pedestrian class:

class Pedestrian():
  """Pedestrian class

  each pedestrian is composed of a ROI, an ID and a Kalman filter
  so we create a Pedestrian class to hold the object state
  """
  def __init__(self, id, frame, track_window):
    """init the pedestrian object with track window coordinates"""
    # set up the roi
    self.id = int(id)
    x,y,w,h = track_window
    self.track_window = track_window
    self.roi = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
    roi_hist = cv2.calcHist([self.roi], [0], None, [16], [0, 180])
    self.roi_hist = cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

    # set up the kalman
    self.kalman = cv2.KalmanFilter(4,2)
    self.kalman.measurementMatrix = np.array([[1,0,0,0],[0,1,0,0]],np.float32)
    self.kalman.transitionMatrix = np.array([[1,0,1,0],[0,1,0,1],[0,0,1,0],[0,0,0,1]],np.float32)
    self.kalman.processNoiseCov = np.array([[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]],np.float32) * 0.03
    self.measurement = np.zeros((2,1), np.float32)
    self.prediction = np.zeros((2,1), np.float32)
    self.term_crit = ( cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1 )
    self.center = None
    self.update(frame)
    
  def __del__(self):
    print "Pedestrian %d destroyed" % self.id

  def update(self, frame):
    # print "updating %d " % self.id
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_project = cv2.calcBackProject([hsv],[0], self.roi_hist,[0,180],1)
    
    if args.get("algorithm") == "c":
      ret, self.track_window = cv2.CamShift(back_project, self.track_window, self.term_crit)
      pts = cv2.boxPoints(ret)
      pts = np.int32(pts)
      self.center = center(pts)
      cv2.polylines(frame,[pts],True, 255,1)
      
    if not args.get("algorithm") or args.get("algorithm") == "m":
      ret, self.track_window = cv2.meanShift(back_project, self.track_window, self.term_crit)
      x,y,w,h = self.track_window
      self.center = center([[x,y],[x+w, y],[x,y+h],[x+w, y+h]])  
      cv2.rectangle(frame, (x,y), (x+w, y+h), (255, 255, 0), 1)

    self.kalman.correct(self.center)
    prediction = self.kalman.predict()
    cv2.circle(frame, (int(prediction[0, 0]), int(prediction[1, 0])), 4, (0, 255, 0), -1)
    # fake shadow
    cv2.putText(frame, "ID: %d -> %s" % (self.id, self.center), (11, (self.id + 1) * 25 + 1),
        font, 0.6,
        (0, 0, 0),
        1,
        cv2.LINE_AA)
    # actual info
    cv2.putText(frame, "ID: %d -> %s" % (self.id, self.center), (10, (self.id + 1) * 25),
        font, 0.6,
        (0, 255, 0),
        1,
        cv2.LINE_AA)

At the core of the program lies the background subtractor object, which lets us identify regions of interest corresponding to moving objects.

When the program starts, we take each of these regions and instantiate a Pedestrian class, passing in the ID (a simple counter), the frame, and the track window coordinates, so that we can extract the Region of Interest (ROI) and, from it, the HSV histogram of the ROI.

The constructor function (__init__ in Python) is more or less an aggregation of all the previous concepts: given an ROI, we calculate its histogram, set up a Kalman filter, and associate it to a property (self.kalman) of the object.

In the update method, we pass the current frame and convert it to HSV so that we can calculate the back projection of the pedestrian's HSV histogram.

We then use either CAMShift or Meanshift (depending on the argument passed; Meanshift is the default if no arguments are passed) to track the movement of the pedestrian, and correct the Kalman filter for that pedestrian with the actual position.

We also draw both CAMShift/Meanshift (with a surrounding rectangle) and Kalman (with a dot), so you can observe Kalman and CAMShift/Meanshift go nearly hand in hand, except for sudden movements that cause Kalman to have to readjust.

Lastly, we print some pedestrian information on the top-left corner of the image.
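
Note that the class references a few names that are defined elsewhere in the script and do not appear in this excerpt: the center() helper, the font constant, and the args dictionary. The following is a plausible minimal sketch of those definitions (the exact implementations in the code repository may differ):

import argparse
import cv2
import numpy as np

font = cv2.FONT_HERSHEY_SIMPLEX

def center(points):
    # average the four corner points to obtain the center of the box
    x = (points[0][0] + points[1][0] + points[2][0] + points[3][0]) / 4
    y = (points[0][1] + points[1][1] + points[2][1] + points[3][1]) / 4
    return np.array([np.float32(x), np.float32(y)], np.float32)

parser = argparse.ArgumentParser()
parser.add_argument("-a", "--algorithm",
    help = "m for Meanshift (default), c for CAMShift")
args = vars(parser.parse_args())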

The main program

Now that we have a Pedestrian class holding all specific information for each object, let's take a look at the main function in the program.

First, we load a video (it could also be a webcam feed), and then we initialize a background subtractor, setting 20 frames as the history affecting the background model:

# the video path here is an assumption; point it at
# <opencv_dir>/samples/data/768x576.avi, downloaded earlier
camera = cv2.VideoCapture("768x576.avi")

history = 20
bs = cv2.createBackgroundSubtractorKNN(detectShadows = True)
bs.setHistory(history)

We also create the main display window, and then set up a pedestrians dictionary and a firstFrame flag, which we're going to use to allow a few frames for the background subtractor to build history, so it can better identify moving objects. To help with this, we also set up a frame counter:

cv2.namedWindow("surveillance")
pedestrians = {}
firstFrame = True
frames = 0

Now we start the loop. We read camera frames (or video frames) one by one:

while True:
    print(" -------------------- FRAME %d --------------------" % frames)
    grabbed, frame = camera.read()
    if not grabbed:
        print("failed to grab frame.")
        break

We let BackgroundSubtractorKNN build the history for the background model, so we don't actually process the first 20 frames; we only pass them into the subtractor:

    fgmask = bs.apply(frame)
    # this is just to let the background subtractor build a bit of history
    if frames < history:
      frames += 1
      continue

Then we process the frame with the approach explained earlier in the chapter, applying thresholding, erosion, and dilation to the foreground mask so as to obtain easily identifiable blobs and their bounding boxes. These are obviously the moving objects in the frame:

    th = cv2.threshold(fgmask.copy(), 127, 255, cv2.THRESH_BINARY)[1]
    th = cv2.erode(th, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3,3)), iterations = 2)
    dilated = cv2.dilate(th, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (8,3)), iterations = 2)
    image, contours, hier = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

Once the contours are identified, we instantiate one pedestrian per contour for the first frame only (note that I set a minimum area for the contour to further denoise our detection):

    counter = 0
    for c in contours:
      if cv2.contourArea(c) > 500:
        (x,y,w,h) = cv2.boundingRect(c)
        cv2.rectangle(frame, (x,y), (x+w, y+h), (0, 255, 0), 1)
        # only create pedestrians in the first frame, then just follow the ones you have
        if firstFrame is True:
          pedestrians[counter] = Pedestrian(counter, frame, (x,y,w,h))
        counter += 1

Then, for each pedestrian detected, we call the update method, passing in the current frame, which is needed in its original color space, because the pedestrian objects are responsible for drawing their own information (text, the Meanshift/CAMShift rectangles, and the Kalman filter tracking):

    for i, p in pedestrians.items():
        p.update(frame)

We set the firstFrame flag to False, so we don't instantiate any more pedestrians; we just keep track of the ones we have:

    firstFrame = False
    frames += 1

Finally, we show the result in the display window. The program can be exited by pressing the Esc key:

    cv2.imshow("surveillance", frame)
    if cv2.waitKey(110) & 0xff == 27:
        break

if __name__ == "__main__":
  main()

There you have it: CAMShift/Meanshift working in tandem with the Kalman filter to track moving objects. All being well, you should obtain a result similar to this:

(Screenshot: the surveillance window, with each tracked pedestrian surrounded by a tracking rectangle and annotated with an ID.)

In this screenshot, the blue outline is the CAMShift detection, the green rectangles are the bounding boxes of the detected moving objects, and the small green circle at the center marks the Kalman filter prediction.

Where do we go from here?

This program constitutes a basis that you can adapt to your application domain's needs. There are many improvements that can be made on top of it to suit an application's additional requirements. Consider the following examples:

  • You could destroy a pedestrian object if the Kalman filter predicts its position to be outside the frame (see the sketch after this list)
  • You could check whether each detected moving object corresponds to an existing pedestrian instance, and if not, create an instance for it
  • You could train an SVM and run classification on each moving object to establish whether or not the moving object is of the nature you intend to track (for instance, a dog might enter the scene, but your application may require tracking humans only)
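
As an example of the first idea, here is a minimal sketch of dropping pedestrians whose predicted position falls outside the frame. Note that frame_w and frame_h (the video dimensions) are assumed variables, pedestrians is the dictionary from the main loop, and in the real program you would reuse the prediction already computed in update() rather than calling predict() a second time:

for i in list(pedestrians.keys()):
    prediction = pedestrians[i].kalman.predict()
    px, py = prediction[0, 0], prediction[1, 0]
    if not (0 <= px < frame_w and 0 <= py < frame_h):
        del pedestrians[i]    # triggers Pedestrian.__del__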

Whatever your needs, hopefully this chapter will have provided you with the necessary knowledge to build applications that satisfy your requirements.
