Detecting cars

There is virtually no limit to the types of objects you can detect in your images and videos. However, to achieve an acceptable level of accuracy, you need a sufficiently large dataset, containing training images that are identical in size.

Building such a dataset entirely by ourselves would be a time-consuming operation, though it is entirely possible.

Fortunately, we can make use of ready-made datasets; a number of them are freely downloadable from various sources.

Note

Note that training images and test images are available in separate files.

I'll be using the UIUC dataset in my example, but feel free to explore the Internet for other types of datasets.

Now, let's take a look at an example:

import cv2
import numpy as np
from os.path import join

datapath = "/home/d3athmast3r/dev/python/CarData/TrainImages/"
def path(cls,i):
  return "%s/%s%d.pgm"  % (datapath,cls,i+1)

pos, neg = "pos-", "neg-"

detect = cv2.xfeatures2d.SIFT_create()
extract = cv2.xfeatures2d.SIFT_create()

flann_params = dict(algorithm = 1, trees = 5)
flann = cv2.FlannBasedMatcher(flann_params, {})

bow_kmeans_trainer = cv2.BOWKMeansTrainer(40)
extract_bow = cv2.BOWImgDescriptorExtractor(extract, flann)

def extract_sift(fn):
  im = cv2.imread(fn,0)
  return extract.compute(im, detect.detect(im))[1]
  
for i in range(8):
  bow_kmeans_trainer.add(extract_sift(path(pos,i)))
  bow_kmeans_trainer.add(extract_sift(path(neg,i)))
  
voc = bow_kmeans_trainer.cluster()
extract_bow.setVocabulary( voc )

def bow_features(fn):
  im = cv2.imread(fn,0)
  return extract_bow.compute(im, detect.detect(im))

traindata, trainlabels = [],[]
for i in range(20):
  traindata.extend(bow_features(path(pos, i))); trainlabels.append(1)
  traindata.extend(bow_features(path(neg, i))); trainlabels.append(-1)

svm = cv2.ml.SVM_create()
svm.train(np.array(traindata), cv2.ml.ROW_SAMPLE, np.array(trainlabels))

def predict(fn):
  f = bow_features(fn)
  p = svm.predict(f)
  print fn, "\t", p[1][0][0]
  return p
  
car, notcar = "/home/d3athmast3r/dev/python/study/images/car.jpg", "/home/d3athmast3r/dev/python/study/images/bb.jpg"
car_img = cv2.imread(car)
notcar_img = cv2.imread(notcar)
car_predict = predict(car)
not_car_predict = predict(notcar)

font = cv2.FONT_HERSHEY_SIMPLEX

if (car_predict[1][0][0] == 1.0):
  cv2.putText(car_img,'Car Detected',(10,30), font, 1,(0,255,0),2,cv2.LINE_AA)

if (not_car_predict[1][0][0] == -1.0):
  cv2.putText(notcar_img,'Car Not Detected',(10,30), font, 1,(0,0, 255),2,cv2.LINE_AA)

cv2.imshow('BOW + SVM Success', car_img)
cv2.imshow('BOW + SVM Failure', notcar_img)
cv2.waitKey(0)
cv2.destroyAllWindows()

What did we just do?

This is quite a lot to assimilate, so let's go through what we've done:

  1. First of all, our usual imports are followed by the declaration of the base path of our training images. This will come in handy to avoid rewriting the base path every time we process an image in a particular folder on our computer.
  2. After this, we declare a function, path:
    def path(cls,i):
      return "%s/%s%d.pgm"  % (datapath,cls,i+1)
    
    pos, neg = "pos-", "neg-"
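    # For example, path(pos, 0) evaluates to
    # "/home/d3athmast3r/dev/python/CarData/TrainImages//pos-1.pgm"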

    Note

    More on the path function

    This function is a utility method: given the name of a class (in our case, we have two classes, pos and neg) and a numerical index, we return the full path to a particular training image. Our car dataset contains images named in the following way: pos-x.pgm and neg-x.pgm, where x is a number.

    You will immediately find this function useful when iterating through a range of numbers (say, 20), which will allow you to load all the images from pos-1.pgm to pos-20.pgm, and the same goes for the negative class.

  3. Next up, we'll create two SIFT instances: one to extract keypoints, the other to extract features:
    detect = cv2.xfeatures2d.SIFT_create()
    extract = cv2.xfeatures2d.SIFT_create()
  4. Whenever you see SIFT involved, you can be pretty sure some feature matching algorithm will be involved too. In our case, we'll create an instance for a FLANN matcher:
    flann_params = dict(algorithm = 1, trees = 5)
    flann = cv2.FlannBasedMatcher(flann_params, {})

    Note

    Note that currently, the enum values for FLANN are missing from the Python version of OpenCV 3, so the number 1, which is passed as the algorithm parameter, represents the FLANN_INDEX_KDTREE algorithm. I suspect the final version will be cv2.FLANN_INDEX_KDTREE, which is a little more helpful. Make sure to check the enum values for the correct flags.

  5. Next, we mention the BOW trainer:
    bow_kmeans_trainer = cv2.BOWKMeansTrainer(40)
  6. This BOW trainer utilizes 40 clusters. After this, we'll initialize the BOW extractor. This is the BOW class that will be fed a vocabulary of visual words and will try to detect them in the test image:
    extract_bow = cv2.BOWImgDescriptorExtractor(extract, flann)
  7. To extract the SIFT features from an image, we build a utility method, which takes the path to the image, reads it in grayscale, and returns the descriptor:
    def extract_sift(fn):
      im = cv2.imread(fn,0)
      return extract.compute(im, detect.detect(im))[1]

At this stage, we have everything we need to start training the BOW trainer.
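
As a quick sanity check, you can inspect what extract_sift returns: for SIFT, the descriptors form an N x 128 float32 NumPy array, with one row per detected keypoint (the row count below is just an example):

descriptors = extract_sift(path(pos, 0))
print descriptors.shape, descriptors.dtype  # e.g. (35, 128) float32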

  1. Let's read eight images per class (eight positives and eight negatives) from our dataset:
    for i in range(8):
      bow_kmeans_trainer.add(extract_sift(path(pos,i)))
      bow_kmeans_trainer.add(extract_sift(path(neg,i)))
  2. To create the vocabulary of visual words, we'll call the cluster() method on the trainer, which performs k-means clustering and returns the vocabulary. We'll assign this vocabulary to BOWImgDescriptorExtractor so that it can extract descriptors from test images:
    voc = bow_kmeans_trainer.cluster()
    extract_bow.setVocabulary(voc)
  3. In line with other utility functions declared in this script, we'll declare a function that takes the path to an image and returns the descriptor as computed by the BOW descriptor extractor:
    def bow_features(fn):
      im = cv2.imread(fn,0)
      return extract_bow.compute(im, detect.detect(im))
  4. Let's create two arrays to accommodate the train data and labels, and populate them with the descriptors generated by BOWImgDescriptorExtractor, associating labels to the positive and negative images we're feeding (1 stands for a positive match, -1 for a negative):
    traindata, trainlabels = [],[]
    for i in range(20):
      traindata.extend(bow_features(path(pos, i))); trainlabels.append(1)
      traindata.extend(bow_features(path(neg, i))); trainlabels.append(-1)
  5. Now, let's create an instance of an SVM:
    svm = cv2.ml.SVM_create()
  6. Then, train it by wrapping the train data and labels in NumPy arrays:
    svm.train(np.array(traindata), cv2.ml.ROW_SAMPLE, np.array(trainlabels))

We're all set with a trained SVM; all that is left to do is to feed the SVM a couple of sample images and see how it behaves.

  1. Let's first define another utility method to print the result of our predict method and return it:
    def predict(fn):
      f = bow_features(fn)
      p = svm.predict(f)
      print fn, "\t", p[1][0][0]
      return p
  2. Let's define two sample image paths and read the images into NumPy arrays:
    car, notcar = "/home/d3athmast3r/dev/python/study/images/car.jpg", "/home/d3athmast3r/dev/python/study/images/bb.jpg"
    car_img = cv2.imread(car)
    notcar_img = cv2.imread(notcar)
  3. We'll pass these images to the trained SVM, and get the result of the prediction:
    car_predict = predict(car)
    not_car_predict = predict(notcar)

    Naturally, we're hoping that the car image will be detected as a car (result of predict() should be 1.0), and that the other image will not (result should be -1.0), so we will only add text to the images if the result is the expected one.

  4. At last, we'll present the images on the screen, hoping to see the correct caption on each:
    font = cv2.FONT_HERSHEY_SIMPLEX
    
    if (car_predict[1][0][0] == 1.0):
      cv2.putText(car_img,'Car Detected',(10,30), font, 1,(0,255,0),2,cv2.LINE_AA)
    
    if (not_car_predict[1][0][0] == -1.0):
      cv2.putText(notcar_img,'Car Not Detected',(10,30), font, 1,(0,0, 255),2,cv2.LINE_AA)
    
    cv2.imshow('BOW + SVM Success', car_img)
    cv2.imshow('BOW + SVM Failure', notcar_img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

The preceding operation produces the following result:

[Image: the car image displayed with the 'Car Detected' caption]

It also results in this:

[Image: the non-car image displayed with the 'Car Not Detected' caption]

SVM and sliding windows

Having detected an object is an impressive achievement, but now we want to push this to the next level in these ways:

  • Detecting multiple objects of the same kind in an image
  • Determining the position of a detected object in an image

To accomplish this, we will use the sliding windows approach. If it's not already clear from the previous explanation of the concept of sliding windows, the rationale behind adopting this approach will become more apparent if we take a look at a diagram:

[Image: a rectangular window sliding across the image in steps, row by row, at each pyramid scale]

Observe the movement of the block:

  1. We take a region of the image, classify it, and then move a predefined step size to the right-hand side. When we reach the rightmost end of the image, we reset the x coordinate to 0, move down a step, and repeat the entire process.
  2. At each step, we perform a classification with the SVM that was trained with BOW.
  3. We keep track of all the blocks that have passed the SVM predict test.
  4. When we've finished classifying the entire image, we scale the image down and repeat the entire sliding windows process.

Continue rescaling and classifying until you get to a minimum size.

This gives you the chance to detect objects in several regions of the image and at different scales.
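
In code, this process maps onto a nested loop. Here's a minimal sketch, assuming the pyramid and sliding_window generators and the 100 x 40 window size that we'll build later in this chapter:

import cv2

img = cv2.imread("test.jpg")  # any test image
for layer in pyramid(img, scale=1.25):
  # how much this layer has been shrunk relative to the original image
  scale = float(img.shape[1]) / float(layer.shape[1])
  for (x, y, roi) in sliding_window(layer, 20, (100, 40)):
    pass  # classify roi here; if it scores well, store its box rescaled by scale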

At this stage, you will have collected important information about the content of the image; however, there's a problem: you will most likely end up with a number of overlapping blocks that each give a positive score. This means that a single object may be detected four or five times, and if you were to report the result of the detection as-is, your report would be quite inaccurate. This is where non-maximum suppression comes into play.

Example – car detection in a scene

We are now ready to apply all the concepts we learned so far to a real-life example, and create a car detector application that scans an image and draws rectangles around cars.

Let's summarize the process before diving into the code:

  1. Obtain a train dataset.
  2. Create a BOW trainer and create a visual vocabulary.
  3. Train an SVM with the vocabulary.
  4. Attempt detection using sliding windows on an image pyramid of a test image.
  5. Apply non-maximum suppression to overlapping boxes.
  6. Output the result.

Let's also take a look at the structure of the project, as it is a bit more complex than the classic standalone script approach we've adopted until now.

The project structure is as follows:

├── car_detector
│   ├── detector.py
│   ├── __init__.py
│   ├── non_maximum.py
│   ├── pyramid.py
│   └── sliding_window.py
└── car_sliding_windows.py

The main program is in car_sliding_windows.py, and all the utilities are contained in the car_detector folder. As we're using Python 2.7, we'll need an __init__.py file in the folder for it to be detected as a module.

The four files in the car_detector module are as follows:

  • detector.py: The SVM training model
  • non_maximum.py: The non-maximum suppression function
  • pyramid.py: The image pyramid
  • sliding_window.py: The sliding windows function

Let's examine them one by one, starting from the image pyramid:

import cv2

def resize(img, scaleFactor):
  return cv2.resize(img,
                    (int(img.shape[1] * (1 / scaleFactor)),
                     int(img.shape[0] * (1 / scaleFactor))),
                    interpolation=cv2.INTER_AREA)

def pyramid(image, scale=1.5, minSize=(200, 80)):
  yield image

  while True:
    image = resize(image, scale)
    if image.shape[0] < minSize[1] or image.shape[1] < minSize[0]:
      break

    yield image

This module contains two function definitions:

  • resize takes an image and resizes it by the specified factor
  • pyramid takes an image and yields successively resized versions of it until the minimum width and height constraints are reached

Note

You will notice that the image is not returned with the return keyword but with the yield keyword. This is because this function is a so-called generator. If you are not familiar with generators, take a look at https://wiki.python.org/moin/Generators.

This will allow us to obtain a resized image to process in our main program.
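
For instance, here is a minimal sketch of consuming the generator (the file name is just an example):

img = cv2.imread("test.jpg")
for layer in pyramid(img, scale=1.5, minSize=(200, 80)):
  print layer.shape  # each iteration yields a progressively smaller image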

Next up is the sliding windows function:

def sliding_window(image, stepSize, windowSize):
  for y in xrange(0, image.shape[0], stepSize):
    for x in xrange(0, image.shape[1], stepSize):
      yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])

Again, this is a generator. Although a bit deeply nested, the mechanism is very simple: given an image, it yields a window that moves by an arbitrary step size from the left margin towards the right until the entire width of the image is covered, then returns to the left margin one step down, and covers the width of the image repeatedly until the bottom-right corner is reached. You can visualize this as the same pattern used for writing on a piece of paper: start from the left margin and reach the right margin, then move onto the next line from the left margin.
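
To make the traversal concrete, here is a minimal sketch that prints the window coordinates over a small dummy image:

import numpy as np

img = np.zeros((40, 60), dtype=np.uint8)  # a dummy 60 x 40 image
for (x, y, window) in sliding_window(img, 20, (30, 20)):
  print x, y, window.shape  # (x, y) runs (0, 0), (20, 0), (40, 0), (0, 20), ...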

The last utility is non-maximum suppression, which looks like this (Malisiewicz/Rosebrock's code):

import numpy as np

def non_max_suppression_fast(boxes, overlapThresh):
  # if there are no boxes, return an empty list
  if len(boxes) == 0:
    return []

  # if the bounding boxes are integers, convert them to floats --
  # this is important since we'll be doing a bunch of divisions
  if boxes.dtype.kind == "i":
    boxes = boxes.astype("float")

  # initialize the list of picked indexes
  pick = []

  # grab the coordinates of the bounding boxes
  x1 = boxes[:,0]
  y1 = boxes[:,1]
  x2 = boxes[:,2]
  y2 = boxes[:,3]
  scores = boxes[:,4]

  # compute the area of the bounding boxes and sort the bounding
  # boxes by the score/probability of the bounding box
  area = (x2 - x1 + 1) * (y2 - y1 + 1)
  idxs = np.argsort(scores)

  # keep looping while some indexes still remain in the indexes
  # list
  while len(idxs) > 0:
    # grab the last index in the indexes list (the box with the
    # highest score) and add the index value to the list of
    # picked indexes
    last = len(idxs) - 1
    i = idxs[last]
    pick.append(i)

    # find the largest (x, y) coordinates for the start of
    # the bounding box and the smallest (x, y) coordinates
    # for the end of the bounding box
    xx1 = np.maximum(x1[i], x1[idxs[:last]])
    yy1 = np.maximum(y1[i], y1[idxs[:last]])
    xx2 = np.minimum(x2[i], x2[idxs[:last]])
    yy2 = np.minimum(y2[i], y2[idxs[:last]])

    # compute the width and height of the overlapping region
    w = np.maximum(0, xx2 - xx1 + 1)
    h = np.maximum(0, yy2 - yy1 + 1)

    # compute the ratio of overlap
    overlap = (w * h) / area[idxs[:last]]

    # delete all indexes from the index list that have an
    # overlap greater than the threshold
    idxs = np.delete(idxs, np.concatenate(([last],
      np.where(overlap > overlapThresh)[0])))

  # return only the bounding boxes that were picked using the
  # integer data type
  return boxes[pick].astype("int")

This function takes a list of rectangles, each carrying a score, and sorts them by that score. Starting from the box with the highest score, it discards every remaining box whose overlap with a picked box, calculated as the ratio between the intersection area and the candidate box's own area, exceeds the overlapThresh threshold, repeating until every box has been either picked or suppressed.
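
Here's a quick worked example (the coordinates and scores are made up): given two heavily overlapping boxes and one distant box, only the better-scoring box of the overlapping pair survives:

import numpy as np

boxes = np.array([
  [10, 10, 110, 60, 0.8],     # box A
  [14, 12, 114, 62, 0.6],     # box B, overlapping A almost entirely
  [200, 200, 300, 250, 0.7],  # box C, far from both
])
print non_max_suppression_fast(boxes, 0.25)  # keeps A and C; B is suppressed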

Examining detector.py

Now, let's examine the heart of this program, which is detector.py. It is a bit long and complex; however, everything should appear much clearer given our newfound familiarity with the concepts of BOW, SVM, and feature detection/extraction.

Here's the code:

import cv2
import numpy as np

datapath = "/path/to/CarData/TrainImages/"
SAMPLES = 400

def path(cls,i):
    return "%s/%s%d.pgm"  % (datapath,cls,i+1)

def get_flann_matcher():
  flann_params = dict(algorithm = 1, trees = 5)
  return cv2.FlannBasedMatcher(flann_params, {})

def get_bow_extractor(extract, flann):
  return cv2.BOWImgDescriptorExtractor(extract, flann)

def get_extract_detect():
  return cv2.xfeatures2d.SIFT_create(), cv2.xfeatures2d.SIFT_create()

def extract_sift(fn, extractor, detector):
  im = cv2.imread(fn,0)
  return extractor.compute(im, detector.detect(im))[1]
    
def bow_features(img, extractor_bow, detector):
  return extractor_bow.compute(img, detector.detect(img))

def car_detector():
  pos, neg = "pos-", "neg-"
  detect, extract = get_extract_detect()
  matcher = get_flann_matcher()
  print "building BOWKMeansTrainer..."
  bow_kmeans_trainer = cv2.BOWKMeansTrainer(1000)
  extract_bow = cv2.BOWImgDescriptorExtractor(extract, matcher)

  print "adding features to trainer"
  for i in range(SAMPLES):
    print i
    bow_kmeans_trainer.add(extract_sift(path(pos,i), extract, detect))
    bow_kmeans_trainer.add(extract_sift(path(neg,i), extract, detect))
    
  voc = bow_kmeans_trainer.cluster()
  extract_bow.setVocabulary( voc )

  traindata, trainlabels = [],[]
  print "adding to train data"
  for i in range(SAMPLES):
    print i
    traindata.extend(bow_features(cv2.imread(path(pos, i), 0), extract_bow, detect))
    trainlabels.append(1)
    traindata.extend(bow_features(cv2.imread(path(neg, i), 0), extract_bow, detect))
    trainlabels.append(-1)

  svm = cv2.ml.SVM_create()
  svm.setType(cv2.ml.SVM_C_SVC)
  svm.setGamma(0.5)
  svm.setC(30)
  svm.setKernel(cv2.ml.SVM_RBF)

  svm.train(np.array(traindata), cv2.ml.ROW_SAMPLE, np.array(trainlabels))
  return svm, extract_bow

Let's go through it. First, we'll import our usual modules, and then set a path for the training images.

Then, we'll define a number of utility functions:

def path(cls,i):
    return "%s/%s%d.pgm"  % (datapath,cls,i+1)

This function returns the path to an image given a base path and a class name. In our example, we're going to use the neg- and pos- class names, because this is what the training images are called (that is, neg-1.pgm). The last argument is an integer used to compose the final part of the image path.

Next, we'll define a utility function to obtain a FLANN matcher:

def get_flann_matcher():
  flann_params = dict(algorithm = 1, trees = 5)
  return cv2.FlannBasedMatcher(flann_params, {})

Again, note that the integer 1, passed as the algorithm argument, represents FLANN_INDEX_KDTREE.

The next two functions return a BOW descriptor extractor and the pair of SIFT instances we'll use for keypoint detection and feature extraction:

def get_bow_extractor(extract, flann):
  return cv2.BOWImgDescriptorExtractor(extract, flann)

def get_extract_detect():
  return cv2.xfeatures2d.SIFT_create(), cv2.xfeatures2d.SIFT_create()

The next utility is a function used to return features from an image:

def extract_sift(fn, extractor, detector):
  im = cv2.imread(fn,0)
  return extractor.compute(im, detector.detect(im))[1]

Note

The SIFT detector finds keypoints, while the SIFT extractor computes and returns their descriptors.

We'll also define a similar utility function to extract the BOW features:

def bow_features(img, extractor_bow, detector):
  return extractor_bow.compute(img, detector.detect(img))

In the main car_detector function, we'll first create the necessary objects used to perform feature detection and extraction:

  pos, neg = "pos-", "neg-"
  detect, extract = get_extract_detect()
  matcher = get_flann_matcher()
  bow_kmeans_trainer = cv2.BOWKMeansTrainer(1000)
  extract_bow = cv2.BOWImgDescriptorExtractor(extract, matcher)

Then, we'll add features taken from training images to the trainer:

  print "adding features to trainer"
  for i in range(SAMPLES):
    print i
    bow_kmeans_trainer.add(extract_sift(path(pos,i), extract, detect))
    bow_kmeans_trainer.add(extract_sift(path(neg,i), extract, detect))

At each iteration, we'll add the features of one positive image and one negative image to the trainer.

After this, we'll instruct the trainer to cluster the data into 1,000 groups (the value of k we passed to BOWKMeansTrainer).

The clustered data is now our vocabulary of visual words, and we can set the BOWImgDescriptorExtractor class' vocabulary in this way:

  voc = bow_kmeans_trainer.cluster()
  extract_bow.setVocabulary(voc)

Associating training data with classes

With a visual vocabulary ready, we can now associate training data with classes. In our case, we have two classes: -1 for negative results and 1 for positive ones.

Let's populate two arrays, traindata and trainlabels, containing extracted features and their corresponding labels. Iterating through the dataset, we can quickly set this up with the following code:

  traindata, trainlabels = [], []
  print "adding to train data"
  for i in range(SAMPLES):
    print i
    traindata.extend(bow_features(cv2.imread(path(pos, i), 0), extract_bow, detect))
    trainlabels.append(1)
    traindata.extend(bow_features(cv2.imread(path(neg, i), 0), extract_bow, detect))
    trainlabels.append(-1)

You will notice that at each cycle, we add one positive and one negative image's features, and then append 1 and -1 to keep the data and labels synchronized.

Should you wish to train more classes, you could do that by following this pattern:

  traindata, trainlabels = [], []
  print "adding to train data"
  for i in range(SAMPLES):
    print i
    traindata.extend(bow_features(cv2.imread(path(class1, i), 0), extract_bow, detect))
    trainlabels.append(1)
    traindata.extend(bow_features(cv2.imread(path(class2, i), 0), extract_bow, detect))
    trainlabels.append(2)
    traindata.extend(bow_features(cv2.imread(path(class3, i), 0), extract_bow, detect))
    trainlabels.append(3)

For example, you could train a detector to detect cars and people and perform detection on these in an image containing both cars and people.

Lastly, we'll train the SVM with the following code:

  svm = cv2.ml.SVM_create()
  svm.setType(cv2.ml.SVM_C_SVC)
  svm.setGamma(0.5)
  svm.setC(30)
  svm.setKernel(cv2.ml.SVM_RBF)

  svm.train(np.array(traindata), cv2.ml.ROW_SAMPLE, np.array(trainlabels))
  return svm, extract_bow

There are two parameters in particular that I'd like to focus your attention on:

  • C: With this parameter, you can conceptualize the strictness or severity of the classifier. The higher the value, the fewer misclassifications the SVM tolerates on the training data, but the trade-off is a greater risk of over-fitting. A lower value, on the other hand, allows a wider margin and tolerates more misclassified training samples, at the risk of more false positives and missed detections at prediction time.
  • Kernel: This parameter determines the nature of the classifier: SVM_LINEAR indicates a linear hyperplane, which, in practical terms, works very well for binary classification (the test sample either belongs to a class or it doesn't), while SVM_RBF (radial basis function) separates the data using Gaussian functions, which means that the data is split into several kernels defined by these functions. When training the SVM to classify more than two classes, you will typically want to use RBF (see the short sketch after this list).
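
For instance, a minimal sketch of configuring a linear kernel for a plain binary classifier instead (the C value is just a starting point):

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.setC(30)
svm.train(np.array(traindata), cv2.ml.ROW_SAMPLE, np.array(trainlabels))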

Finally, we'll pass the traindata and trainlabels arrays into the SVM train method, and return the SVM and BOW extractor object. This is because in our applications, we don't want to have to recreate the vocabulary every time, so we expose it for reuse.

Dude, where's my car?

We are ready to test our car detector!

Let's first create a simple program that loads an image, and then performs detection using the sliding windows and image pyramid techniques:

import cv2
import numpy as np
from car_detector.detector import car_detector, bow_features
from car_detector.pyramid import pyramid
from car_detector.non_maximum import non_max_suppression_fast as nms
from car_detector.sliding_window import sliding_window

def in_range(number, test, thresh=0.2):
  return abs(number - test) < thresh

test_img = "/path/to/cars.jpg"

svm, extractor = car_detector()
detect = cv2.xfeatures2d.SIFT_create()

w, h = 100, 40
img = cv2.imread(test_img)

rectangles = []
counter = 1
scaleFactor = 1.25
scale = 1
font = cv2.FONT_HERSHEY_PLAIN

for resized in pyramid(img, scaleFactor):  
  scale = float(img.shape[1]) / float(resized.shape[1])
  for (x, y, roi) in sliding_window(resized, 20, (w, h)):
    
    if roi.shape[1] != w or roi.shape[0] != h:
      continue

    try:
      bf = bow_features(roi, extractor, detect)
      _, result = svm.predict(bf)
      a, res = svm.predict(bf, flags=cv2.ml.STAT_MODEL_RAW_OUTPUT)
      print "Class: %d, Score: %f" % (result[0][0], res[0][0])
      score = res[0][0]
      if result[0][0] == 1:
        if score < -1.0:
          rx, ry, rx2, ry2 = int(x * scale), int(y * scale), int((x+w) * scale), int((y+h) * scale)
          rectangles.append([rx, ry, rx2, ry2, abs(score)])
    except:
      # some windows contain too few keypoints for feature extraction
      # to succeed; skip them and move on
      pass

    counter += 1

windows = np.array(rectangles)
boxes = nms(windows, 0.25)


for (x, y, x2, y2, score) in boxes:
  print x, y, x2, y2, score
  cv2.rectangle(img, (int(x),int(y)),(int(x2), int(y2)),(0, 255, 0), 1)
  cv2.putText(img, "%f" % score, (int(x),int(y)), font, 1, (0, 255, 0))

cv2.imshow("img", img)
cv2.waitKey(0)

The notable part of the program is the code within the pyramid/sliding window loop:

      bf = bow_features(roi, extractor, detect)
      _, result = svm.predict(bf)
      a, res = svm.predict(bf, flags=cv2.ml.STAT_MODEL_RAW_OUTPUT)
      print "Class: %d, Score: %f" % (result[0][0], res[0][0])
      score = res[0][0]
      if result[0][0] == 1:
        if score < -1.0:
          rx, ry, rx2, ry2 = int(x * scale), int(y * scale), int((x+w) * scale), int((y+h) * scale)
          rectangles.append([rx, ry, rx2, ry2, abs(score)])

Here, we extract the features of the region of interest (ROI) that corresponds to the current sliding window, and then we call predict on the extracted features. The predict method has an optional flags parameter; with cv2.ml.STAT_MODEL_RAW_OUTPUT, it returns the raw score of the prediction instead of the class label (contained at the [0][0] value of its result).
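
To spell out the two calls (the tuple layout here mirrors the code above):

_, result = svm.predict(bf)
# result[0][0] holds the predicted class label: 1.0 (car) or -1.0 (not a car)

a, res = svm.predict(bf, flags=cv2.ml.STAT_MODEL_RAW_OUTPUT)
# res[0][0] holds the raw score: the signed distance from the decision boundary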

Note

A word on the score of the prediction: the lower the value, the higher the confidence that the classified element really belongs to the class.

So, we'll set an arbitrary threshold of -1.0 for classified windows: all windows scoring less than -1.0 are taken as good results. As you experiment with your SVMs, you may tweak this threshold until you find a golden mean that yields the best results.

Finally, we add the computed coordinates of the sliding window to the array of rectangles (meaning, we multiply the window's current coordinates by the scale of the current layer in the image pyramid, so that the window is correctly placed in the final drawing on the original image).
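
For example (with made-up numbers): a window found at (40, 20) in a layer that is 1.5625 times smaller than the original maps back to (int(40 * 1.5625), int(20 * 1.5625)), that is, (62, 31) in the original image.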

There's one last operation we need to perform before drawing our final result: non-maximum suppression.

We turn the rectangles list into a NumPy array (to allow certain kinds of operations that are only possible with NumPy), and then apply NMS:

windows = np.array(rectangles)
boxes = nms(windows, 0.25)

Finally, we proceed with displaying all our results; for the sake of convenience, I've also printed the score obtained for all the remaining windows:

[Image: the test scene with green, score-labeled rectangles drawn around the detections]

This is a remarkably accurate result!

A final note on SVM: you don't need to train a detector every time you want to use it, which would be extremely impractical. You can use the following code:

svm.save('/path/to/serialized/svm.xml')

You can subsequently reload it with a load method and feed it test images or frames.
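
A minimal sketch, assuming OpenCV 3.1 or later, where the static loader is exposed to Python as cv2.ml.SVM_load:

svm = cv2.ml.SVM_load('/path/to/serialized/svm.xml')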
