Evaluating algorithms

There are many dimensions along which we can evaluate algorithms. This section explores some of them, starting with speed.

Assuming we want to have fast face detection—which algorithm would be better?

The only way to understand the performance of an algorithm is to measure it. Thankfully Go comes with benchmarking built in. That is what we are about to do.

To build benchmarks we must be very careful about what we're benchmarking. In this case, we want to benchmark the performance of the detection algorithm. This means comparing classifier.DetectMultiScale against pigoClass.RunCascade followed by pigoClass.ClusterDetections.

Also, we have to compare apples to apples—it would be unfair if we compare one algorithm with a 3840 x 2160 image and the other algorithm with a 640 x 480 image. There are simply more pixels in the former compared to the latter:

func BenchmarkGoCV(b *testing.B) {
	img := gocv.IMRead("test.png", gocv.IMReadUnchanged)
	if img.Cols() == 0 || img.Rows() == 0 {
		b.Fatalf("Unable to read image from file")
	}

	classifier := gocv.NewCascadeClassifier()
	if !classifier.Load(haarCascadeFile) {
		b.Fatalf("Error reading cascade file: %v", haarCascadeFile)
	}

	var rects []image.Rectangle
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		rects = classifier.DetectMultiScale(img)
	}
	_ = rects
}

There are a few things to note. First, the setup is done early in the function, and then b.ResetTimer() is called; this resets the timer so that the setup is not counted towards the benchmark. Second, the classifier detects faces on the same image over and over again, so that we get an accurate idea of how the algorithm performs. Last is the rather odd _ = rects line at the end, which is there to prevent the compiler from optimizing away the calls. Technically it is not needed, as DetectMultiScale is almost certainly complicated enough never to be optimized away, but the line is cheap insurance.
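A pattern sometimes seen in Go benchmarks, and slightly more robust than a trailing blank assignment, is to write the result to a package-level variable, which the compiler cannot prove to be dead. A minimal sketch (resultSink and expensive are illustrative names, not from the repository):

```go
package main

import "testing"

// resultSink is a package-level sink. Writes to it cannot be proven
// dead by the compiler, so the benchmarked call is never elided.
var resultSink int

// expensive stands in for the real work being benchmarked.
func expensive(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i * i
	}
	return sum
}

func BenchmarkExpensive(b *testing.B) {
	var r int
	for i := 0; i < b.N; i++ {
		r = expensive(1000)
	}
	// One write after the loop is enough to keep r, and thus the calls, alive.
	resultSink = r
}
```

The assignment inside the loop is cheap (a register move), so it does not distort the measurement the way a per-iteration write to a global might.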

A similar set up can be done for PIGO:

func BenchmarkPIGO(b *testing.B) {
	img := gocv.IMRead("test.png", gocv.IMReadUnchanged)
	if img.Cols() == 0 || img.Rows() == 0 {
		b.Fatalf("Unable to read image from file")
	}
	width := img.Cols()
	height := img.Rows()
	goImg, grayGoImg, pigoClass, cParams, imgParams := pigoSetup(width, height)

	var dets []pigo.Detection
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		grayGoImg = naughtyGrayscale(grayGoImg, goImg)
		imgParams.Pixels = grayGoImg
		dets = pigoClass.RunCascade(imgParams, cParams)
		dets = pigoClass.ClusterDetections(dets, 0.3)
	}
	_ = dets
}

This time the setup is more involved than in the GoCV benchmark. It may seem that the two functions are benchmarking different things: the GoCV benchmark takes a gocv.Mat while the PIGO benchmark takes a []uint8. But remember that we're interested in the performance of the algorithms on the same image.

The main reason the grayscale conversion is also included in the benchmark is that, although GoCV takes a color image, the Viola-Jones method operates on a grayscale image; internally, OpenCV converts the image to grayscale before detection. Because we're unable to time the detection step by itself, the only fair alternative is to count the conversion to grayscale as part of the detection process.

To run the benchmark, both functions are added into algorithms_test.go. Then go test -run=^$ -bench=. -benchmem is run. The result is as follows:

goos: darwin
goarch: amd64
pkg: chapter9
BenchmarkGoCV-4 20 66794328 ns/op 32 B/op 1 allocs/op
BenchmarkPIGO-4 30 47739076 ns/op 0 B/op 0 allocs/op
PASS
ok chapter9 3.093s

Here we can see that GoCV is about a third slower than PIGO. A key reason for this is the overhead of the cgo calls made to interface with OpenCV. However, it should also be noted that the PICO algorithm itself is faster than the original Viola-Jones algorithm. That PIGO can exceed the performance of the highly tuned and optimized Viola-Jones implementation in OpenCV is rather impressive.
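The "about a third" figure can be worked out directly from the ns/op columns above:

```go
package main

import "fmt"

func main() {
	gocvNs := 66794328.0 // ns/op from BenchmarkGoCV
	pigoNs := 47739076.0 // ns/op from BenchmarkPIGO

	// GoCV takes about 1.4x as long per detection...
	fmt.Printf("GoCV/PIGO: %.2fx\n", gocvNs/pigoNs) // prints GoCV/PIGO: 1.40x
	// ...equivalently, PIGO shaves roughly a third off GoCV's time.
	fmt.Printf("time saved: %.0f%%\n", 100*(gocvNs-pigoNs)/gocvNs) // prints time saved: 29%
}
```

Note that benchmark numbers like these vary from machine to machine and run to run; the ratio, not the absolute ns/op, is the comparison that travels.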

However, speed is not the only dimension that matters when considering face detection algorithms. The following table lists other considerations; tests for them are suggested but left as an exercise for the reader:

| Consideration | Test |
|:---:|:---:|
| Performance in detecting many faces | Benchmark with an image of a crowd |
| Correctness in detecting many faces | Test with an image of a crowd containing a known number of faces |
| No racial discrimination | Test with images of multi-ethnic people with different facial features |

The last one is of particular interest. For many years, ML algorithms have not served people of color well. I myself had some issues when using a Viola-Jones model (a different model from the one in the repository) in a facial feature detection project about five years ago, in which I was trying to detect eyes on a face.

So-called Asian eyes have two major features: an upward slant away from the nose towards the outside of the face, and epicanthic folds, which give the illusion of a single eyelid, that is, an eyelid without a crease. The model I was working with occasionally couldn't detect where my eyes were, because the filter looked for the crease of the eyelid, and the creases on my eyelids are not that obvious.

On that front, some algorithms and models may appear accidentally exclusionary. To be clear, I am NOT saying that the creators of such algorithms and models are racist. However there are some assumptions that were made in the design of the algorithms that did not include considerations of all the possible cases—nor could they ever. For example, any contrast-based detection of facial landmarks will fare poorly with people who have darker skin tones. On the flipside, contrast-based detection systems are usually very fast, because there is a minimal amount of calculation required. Here, there is a tradeoff to be made—do you need to detect everyone, or do you need to be fast?

This chapter aims to encourage readers to think more deeply about the use cases of machine learning algorithms and the tradeoffs required in using them; indeed, much of this book has been about such tradeoffs. Once the appropriate tradeoffs are understood, implementation is usually a piece of cake.
