
4. Analyzing Video Streams on the Raspberry Pi


In this chapter, you’ll learn how to analyze video streams using concepts taken from functional programming. Specifically, you’ll use the Filter interface and combine it with a Pipeline object and then apply them to the video stream.

We will start with an overview of filters. Then we’ll look at different basic, fun filters, and we’ll gradually move on to object detection using different vision techniques. Finally, we’ll talk about neural networks.

Overview of Applying Filters

In the Clojure language, you can apply a set of transformations directly to the Mat object of a video stream without using any extra boilerplate code. I seriously recommend taking a look at even the most basic of the origami examples, available in the README:
https://github.com/hellonico/origami/blob/master/README.md#support-for-opencv-412-is-in
Listing 4-1 shows how to load a picture, convert it to grayscale, and apply a Canny effect, all in one pipeline. This might even make you want to try the Clojure version of OpenCV.
(require
  '[opencv4.utils :as u]
  '[opencv4.core :refer :all])
(->
  (imread "doc/cat_in_bowl.jpeg")
  (cvt-color! COLOR_RGB2GRAY)
  (canny! 300.0 100.0 3 true)
  (bitwise-not!)
  (u/resize-by 0.5)
  (imwrite "doc/canny-cat.jpg"))
Listing 4-1

Read, Turn to Gray, Canny, Resize, and Save

This is a Java book, though, so let’s see how we can apply the same concepts in Java.

Here we introduce the concept of a pipeline of filters, where each filter performs one operation on the Mat object.

Here are a few examples of what filters can do:
  • Turning a Mat object to gray

  • Applying a Canny effect

  • Looking for edges

  • Pencil sketching

  • Instagram filters, like sepia or vintage

  • Doing background subtraction

  • Detecting cat or human faces using Haar object detection or color detection

  • Running a neural network and identifying objects

The following are the two Java types that we’ll introduce to implement these concepts:
  • The Filter interface, which consists of a single method, apply(Mat in), that returns a Mat object, just like a function in functional programming.

  • The Pipeline class, itself a Filter, that takes a list of filter classes or already instantiated filters. When apply is called, it applies the filters one by one.

Listing 4-2 shows the (simple) Filter interface.
import org.opencv.core.Mat;
public interface Filter {
    public Mat apply(Mat in);
}
Listing 4-2

The Filter Interface

Listing 4-3 shows an example of implementing the Pipeline class, where you simply combine the different filters by calling them one by one.
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import org.opencv.core.Mat;
public class Pipeline implements Filter {
    List<Filter> filters;
    // Build the pipeline from filter classes; each one is instantiated
    // through reflection, using its no-argument constructor.
    public Pipeline(Class... __filters) {
        List<Class<Filter>> _filters = (List) Arrays.asList(__filters);
        this.filters = _filters.stream().map(i -> {
            try {
                return (Filter) Class.forName(i.getName()).newInstance();
            } catch (Exception e) {
                return null;
            }
        }).collect(Collectors.toList());
    }
    // Build the pipeline from already instantiated filters.
    public Pipeline(Filter... __filters) {
        this.filters = (List) Arrays.asList(__filters);
    }
    @Override
    public Mat apply(Mat in) {
        // Apply each filter in order, starting from a copy of the input.
        Mat dst = in.clone();
        for (Filter f : filters) {
            dst = f.apply(dst);
        }
        return dst;
    }
}
Listing 4-3

Many Filters

Note the usage of the deprecated newInstance method directly on the class. This may not call the constructor you really want, but it works well enough for the examples in this book.
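If the deprecation bothers you, one possible replacement inside the same map lambda (assuming every filter class has a public no-argument constructor) is to go through getDeclaredConstructor instead:
try {
    // Non-deprecated reflection: look up the no-arg constructor explicitly.
    return (Filter) i.getDeclaredConstructor().newInstance();
} catch (Exception e) {
    return null;
}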

So far, Filter and Pipeline haven’t done a lot, of course, so let’s review some basic examples in the upcoming section.

Applying Basic Filters

In this section, we’ll look at several examples of basic filters.

Gray Filter

The most obvious use of a filter is to turn a Mat object from color to gray. In OpenCV, this is done using the function cvtColor from the Imgproc class.

Omitting the package definition, we combine the webcam code from a few pages ago with the standard cvtColor wrapper in a class that implements the recently introduced Filter interface (Listing 4-4).
import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;
import org.opencv.videoio.VideoCapture;
import origami.ImShow;
import origami.Origami;
public class WebcamWithFilters {
    public static void main(final String[] args) {
        Origami.init();
        final VideoCapture cap = new VideoCapture(0);
        final ImShow ims = new ImShow("Camera", 800, 600);
        final Mat buffer = new Mat();
        Filter gray = new Gray();
        while (cap.read(buffer)) {
            ims.showImage(gray.apply(buffer));
        }
        cap.release();
    }
}
class Gray implements Filter {
    public Mat apply(final Mat img) {
        final Mat mat1 = new Mat();
        Imgproc.cvtColor(img, mat1, Imgproc.COLOR_RGB2GRAY);
        return mat1;
    }
}
Listing 4-4

Gray Filter on the Stream

You can run this directly on the Pi, and if you stand in front of your webcam, you will get something like what I got in Figure 4-1.
Figure 4-1

Not only my hair, but the whole picture, is turning gray

Edge Preserving Filter

In the same fashion, we can implement a wrapper around the edgePreservingFilter function of OpenCV's Photo class. This filter is used in many applications to smooth pictures and remove unwanted lines. Listing 4-5 is really just a basic call to edgePreservingFilter.
import org.opencv.photo.Photo;
class EdgePreserving implements Filter {
    public int flags = Photo.RECURS_FILTER;
    // Alternative: Photo.NORMCONV_FILTER (normalized convolution)
    public float sigma_s = 60;   // spatial extent of the smoothing, 0 to 200
    public float sigma_r = 0.4f; // color dissimilarity threshold, 0 to 1
    public Mat apply(Mat in) {
        Mat dst = new Mat();
        Photo.edgePreservingFilter(in, dst, flags, sigma_s, sigma_r);
        return dst;
    }
}
Listing 4-5

EdgePreserving Class Re-implemented as a Filter

You can use this new filter by modifying the main method of WebcamWithFilters, as shown in Listing 4-6.
Filter filter = new EdgePreserving();
while (cap.read(buffer)) {
      ims.showImage(filter.apply(buffer));
}
Listing 4-6

Modified Main Function to Use Edge Preserving Filter

Now let’s move on to executing the new code. Again, if you’re facing your webcam, you should get something similar to what I look like in Figure 4-2.
Figure 4-2

Edge Preserving filter

Canny

Another useful filter in the OpenCV world applies a Canny effect, which is a fast and effective way to find contours and shapes in a Mat object. A quick implementation of Canny as a filter is shown in Listing 4-7.
class Canny implements Filter {
    public boolean inverted = true;
    public int threshold1 = 100;
    public int threshold2 = 200;
    @Override
    public Mat apply(Mat in) {
        Mat dst = new Mat();
        Imgproc.Canny(in, dst, threshold1, threshold2);
        if (inverted) {
            Core.bitwise_not(dst, dst, new Mat());
        }
        Imgproc.cvtColor(dst, dst, Imgproc.COLOR_GRAY2RGB);
        return dst;
    }
}
Listing 4-7

OpenCV’s Canny in a Filter

Figure 4-3 shows the result of applying the Canny filter to the main loop.
Figure 4-3

Canny filter on webcam stream

Debugging (Again)

Here we go again with some notes on debugging. If you add a breakpoint in the main capture loop of WebcamWithFilters, you'll get access to all the different fields of the filter. As shown in Figure 4-4, let's change the value of the inverted Boolean field to false.
Figure 4-4

Updating the filter parameters in real time

Then, let’s remove the breakpoint and restart the code execution normally. Figure 4-5 shows how changing the value of the filter straightaway changed the running code and the color of the Mat object displaying on the screen.
Figure 4-5

Noninverted Canny filter

When implementing your own filters, it’s a good idea to keep the most influential variables as fields of the class so you’re not flooded with values but still have access to the important ones.

Combining Filters

You probably realized in the previous sections that you will want to measure the performance of each filter.

Performance here means how many frames per second you can handle when reading from a video file or directly from the webcam device.

Here we’re going to do the following:
  • Display the frame rate directly on the image

  • Combine a Gray filter with the Framerate filter using the Pipeline class we defined earlier in this chapter

We already have the code for the Gray filter, so we'll move directly to the code for displaying the frames per second (FPS).

I always thought I could access the frame rate using OpenCV's set of properties on VideoCapture. Unfortunately, that property is pretty much always stuck at a hard-coded value and does not reflect what is actually showing on the screen.
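If you want to check this for yourself, you can query the property. The following small sketch (my own, using the standard Videoio constants) typically prints the driver's nominal rate, often 30.0, or even 0.0, no matter how fast your filters actually run:
import org.opencv.videoio.VideoCapture;
import org.opencv.videoio.Videoio;

VideoCapture cap = new VideoCapture(0);
// Reports the camera driver's nominal frame rate, not the achieved one.
double reported = cap.get(Videoio.CAP_PROP_FPS);
System.out.println("Reported FPS: " + reported);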

So, the implementation of FPS in Listing 4-8 is a small work-around: it does some simple arithmetic based on how many frames have been displayed since the beginning of the filter's life.

Finally, it uses putText to draw the text directly onto the frame. It works well enough for simple use cases.
class FPS implements Filter {
    long start = System.currentTimeMillis();
    int count = 0;
    Point org = new Point(50, 50);
    int fontFace = Imgproc.FONT_HERSHEY_PLAIN;
    double fontScale = 4.0;
    Scalar color = new Scalar(0, 0, 0);
    int thickness = 3;
    public Mat apply(Mat in) {
        count++;
        // Average FPS since the filter was created; the +1 avoids dividing
        // by zero during the first second.
        String text = "FPS: " + count / (1 + ((System.currentTimeMillis() - start) / 1000));
        Imgproc.putText(in, text, org, fontFace, fontScale, color, thickness);
        return in;
    }
}
Listing 4-8

FPS Filter

Now let’s get back to combining the FPS showing on the stream with the Gray filter.

You’ll be happy to know that the only line you have to update in the main() function of WebcamWithFilters is the one where the filter is instantiated, as shown here:
Filter filter = new Pipeline(Gray.class, FPS.class);
When you run the sample again from your Raspberry Pi, you’ll get something that looks like what’s shown in Figure 4-6. Applying the two filters together, I usually get around 15 frames per second on the Raspberry Pi.
Figure 4-6

Combined Gray filter with FPS

Applying Instagram-like Filters

Enough serious work for now. Let’s take a short break and have a bit of fun with some Instagram-like filters.

Color Map

Let’s start this fun section by using OpenCV’s colormap function f rom ImgProc. We move the parameter of the colormap to the constructor, as shown in Listing 4-9, so that we can update it via the debug screen.
class Color implements Filter {
    int colormap = 0;
    public Color(int colormap) {
        this.colormap = colormap;
    }
    public Color() {
        this.colormap = Imgproc.COLORMAP_INFERNO;
    }
    public Mat apply(Mat img) {
        Mat threshed = new Mat();
        Imgproc.applyColorMap(img, threshed, colormap);
        return threshed;
    }
}
Listing 4-9

Color Map

To instantiate the filter, we pass the color map we want to use directly to the constructor. Since we now need an instantiated filter instead of just its class, we use the second constructor of the Pipeline class, the one that takes already instantiated Filter objects.
Filter filter = new Pipeline(new Color(Imgproc.COLORMAP_INFERNO), new FPS());
When you execute this, you get something like Figure 4-7.
Figure 4-7

Inferno

Thresh

Thresh is another fun filter, made by applying the threshold function of Imgproc. It applies a fixed-level threshold to each array element of the Mat object.

The original purpose of thresholding was to segment elements of a picture, for example, to remove noise by discarding unwanted elements. It is not usually used for Instagram-like filtering, but it does look nice and can give you some creative ideas.

Listing 4-10 shows how the Thresh filter can be implemented.
class Thresh implements Filter{
    int sensitivity = 100;
    int maxVal = 255;
    public Thresh() {
    }
    public Thresh(int _sensitivity) {
        this.sensitivity = _sensitivity;
    }
    public Mat apply(Mat img) {
        Mat threshed = new Mat();
        Imgproc.threshold(img, threshed, sensitivity, maxVal, Imgproc.THRESH_BINARY);
        return threshed;
    }
}
Listing 4-10

Applying Threshold

Figure 4-8 shows the result.
Figure 4-8

Applying Thresh for a burning effect

Sepia

Let’s use the old (pun intended) Sepia effect again here, as implemented in Listing 4-11.
class Sepia implements Filter {
    public Mat apply(Mat source) {
        Mat kernel = new Mat(3, 3, CvType.CV_32F);
        kernel.put(0, 0,
                0.272, 0.534, 0.131,
                0.349, 0.686, 0.168,
                0.393, 0.769, 0.189);
        Mat destination = new Mat();
        Core.transform(source, destination, kernel);
        return destination;
    }
}
Listing 4-11

Sepia

When used with the video stream, the Sepia effect gives you output that looks like Figure 4-9.
Figure 4-9

Sepia effect

Cartoon

This naïve implementation of a Cartoon effect takes the important feature-defining lines of the base picture and, after applying a smoothing effect and a blur effect, applies a threshold on each pixel value. Then it combines the result of those operations, as shown in Listing 4-12.
class Cartoon implements Filter {
    public int d = 17;            // diameter of the bilateral filter neighborhood
    public int sigmaColor = d;    // how dissimilar colors can be and still mix
    public int sigmaSpace = 7;    // how far apart pixels can be and still influence each other
    public int ksize = 7;         // blur kernel size
    public double maxValue = 255; // value assigned above the adaptive threshold
    public int blockSize = 19;    // neighborhood size for the adaptive threshold
    public int C = 2;             // constant subtracted from the neighborhood mean
    public Mat apply(Mat inputFrame) {
        Mat gray = new Mat();
        Mat co = new Mat();
        Mat m = new Mat();
        Mat mOutputFrame = new Mat();
        Imgproc.cvtColor(inputFrame, gray, Imgproc.COLOR_BGR2GRAY);
        Imgproc.bilateralFilter(gray, co, d, sigmaColor, sigmaSpace);
        Mat blurred = new Mat();
        Imgproc.blur(co, blurred, new Size(ksize, ksize));
        Imgproc.adaptiveThreshold(blurred, blurred, maxValue, Imgproc.ADAPTIVE_THRESH_MEAN_C, Imgproc.THRESH_BINARY,
                blockSize, C);
        Imgproc.cvtColor(blurred, m, Imgproc.COLOR_GRAY2BGR);
        Core.bitwise_and(inputFrame, m, mOutputFrame);
        return mOutputFrame;
    }
}
Listing 4-12

Cartoon Filter

Applying the Cartoon filter on the video stream gives you an effect like in Figure 4-10.
Figure 4-10

Cartoon effect

Pencil Effect

I love the Pencil effect, obtained by calling the pencilSketch method from OpenCV's Photo class. Unfortunately, it is much too slow to apply in real time on the Raspberry Pi. It does give some pretty results with hardly any implementation effort, though. See for yourself in Listing 4-13.
import org.opencv.photo.Photo;
class PencilSketch implements Filter {
    float sigma_s = 60;
    float sigma_r = 0.07f;
    float shade_factor = 0.05f;
    boolean gray = false;
    @Override
    public Mat apply(Mat in) {
        Mat dst = new Mat();  // grayscale sketch
        Mat dst2 = new Mat(); // color sketch
        Photo.pencilSketch(in, dst, dst2, sigma_s, sigma_r, shade_factor);
        return gray ? dst : dst2;
    }
}
Listing 4-13

Pencil Effect

When this effect is applied, it gives you a result like the one shown in Figure 4-11.
Figure 4-11

Pencil sketching

Woo-hoo! That was quite a few effects, ready to use for a bit of leisure. There are a few others available in the origami repositories, and you can of course contribute your own, but for now, let's move on to serious object detection.

Performing Object Detection

Object detection is the concept of finding an object within a picture using different programming algorithms. It is a task that human beings have long performed effortlessly but that was quite hard for brainless computers. That has changed recently with advances in technology.

In this section, we'll review different computer vision techniques to identify objects within an image, without any prior information about its content. Specifically, we'll review the following:
  • Using a simple contours-drawing filter

  • Detecting objects by colors

  • Using Haar classifiers

  • Using template matching

  • Using a neural network like Yolo

The examples go in somewhat progressive order of difficulty, so it’s best to try them in the order of this list.

Removing the Background

Removing the background is a technique you can use to remove unnecessary artifacts from a scene. The objects you are trying to find are probably not static and are likely to be moving across the set of pictures or the video stream. To remove artifacts efficiently, the algorithm needs to be able to differentiate two Mat objects and use some kind of short-term memory to distinguish moving things (the foreground) from the static objects of the scene (the background).

In OpenCV, two easy-to-use BackgroundSubtractor classes are available for you to use. Their introduction and full explanation can be found on the following web site:
https://docs.opencv.org/master/d1/dc5/tutorial_background_subtraction.html

Basically, you give more and more frames to the background subtractor, which can then detect what is moving in the foreground and what is not.

Listing 4-14 is pretty easy to follow; just be careful not to mistake the apply function from the subtractor class for the one in our Filter interface.
class BackgroundSubtractor implements Filter {
    boolean useMOG2 = true;
    // Fully qualified, to avoid a name clash with this wrapper class
    org.opencv.video.BackgroundSubtractor backSub;
    double learningRate = 1.0;
    boolean showMask = true;
    public BackgroundSubtractor() {
        if (useMOG2) {
            backSub = Video.createBackgroundSubtractorMOG2();
        } else {
            backSub = Video.createBackgroundSubtractorKNN();
        }
    }
    @Override
    public Mat apply(Mat in) {
        Mat mask = new Mat();
        backSub.apply(in, mask);
        Mat result = new Mat();
        if (showMask) {
            Imgproc.cvtColor(mask, result, Imgproc.COLOR_GRAY2RGB);
            return result;
        } else {
            in.copyTo(result, mask);
            return result;
        }
    }
}
Listing 4-14

BackgroundSubtractor Class

The filter is loaded by calling the constructor, and you should get a result similar to Figure 4-12.
Figure 4-12

Removing the background

Use KNN Background Subtractor

Once you have this filter running, try switching to the KNN-based BackgroundSubtractor and see the difference in speed (looking at the frame rate) and the accuracy of the results.
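If you want a starting point, here is a minimal sketch (the KNNSubtractor name is mine) that hard-wires the KNN variant; drop it into the same pipeline and compare the frame rate and mask quality:
import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;
import org.opencv.video.Video;

class KNNSubtractor implements Filter {
    org.opencv.video.BackgroundSubtractor backSub = Video.createBackgroundSubtractorKNN();
    public Mat apply(Mat in) {
        Mat mask = new Mat();
        backSub.apply(in, mask); // update the model and retrieve the foreground mask
        Mat result = new Mat();
        Imgproc.cvtColor(mask, result, Imgproc.COLOR_GRAY2RGB);
        return result;
    }
}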

Detecting by Contours

The second most basic OpenCV feature is finding the contours in an image. The Contours filter uses the findContours function from the Imgproc class.

findContours usually brings better results when you first do the following:
  • Turn the input Mat object to gray

  • Apply a Canny filter

Those two steps are added in Listing 4-15; then we draw the contours on a black Mat object, created with the zeros function.
class Contours implements Filter {
    private int threshold = 100;
    public Mat apply(Mat srcImage) {
        Mat cannyOutput = new Mat();
        Mat srcGray = new Mat();
        Imgproc.cvtColor(srcImage, srcGray, Imgproc.COLOR_BGR2GRAY);
        Imgproc.Canny(srcGray, cannyOutput, threshold, threshold * 2);
        List<MatOfPoint> contours = new ArrayList<>();
        Mat hierarchy = new Mat();
        Imgproc.findContours(cannyOutput, contours, hierarchy, Imgproc.RETR_TREE, Imgproc.CHAIN_APPROX_SIMPLE);
        Mat drawing = Mat.zeros(cannyOutput.size(), CvType.CV_8UC3);
        for (int i = 0; i < contours.size(); i++) {
            Scalar color = new Scalar(256, 150, 0);
            Imgproc.drawContours(drawing, contours, i, color, 2, 8, hierarchy, 0, new Point());
        }
        return drawing;
    }
}
Listing 4-15

Detecting Contours

Use a Pipeline for Contours
The careful reader will notice that, since we already have the Pipeline class, this would actually be nicer written like this:
Pipeline(new Gray(), new Canny(), new Contours())

with a Contours filter that just extracts contours. Try it!

Applying the Contours filter on a video of Marcel gives an artistic look (Figure 4-13).
Figure 4-13

Using OpenCV’s detecting contours feature

Remove the First Canny Filter and Compare

Here’s an exercise for you: try removing the two steps of converting to gray and applying the Canny filter and then compare the results to the original.

Detecting by Color

A picture, or a Mat object, in OpenCV is usually in a red/green/blue (RGB) color space (actually blue/green/red, BGR, in OpenCV to be accurate). This is easy to understand: each pixel is assigned a value for each channel. To see the possible values for those channels, you can review the following site:
https://www.rapidtables.com/web/color/RGB_Color.html

The problem with this color space is that luminosity and contrast are mingled with the color information itself.

When looking for specific colors in Mat objects, we switch to a color space named HSV (for hue, saturation, value). In this color space, the color translates directly to the hue value.

The values for hue usually range from 0 to 360, like degrees around a cylinder. OpenCV uses a slightly different scheme, with the range divided by 2 (0 to 179) so that the value fits in a single byte. Table 4-1 lists the hue value ranges.
Table 4-1

Hue Values in OpenCV

Color    Hue Range
Red      0 to 30 and 150 to 180
Green    30 to 90
Blue     90 to 150
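To see these numbers for yourself, you can convert a single pixel and inspect its hue. Here is a small sketch (my own, not from the chapter's listings) that checks pure red:
import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.core.Scalar;
import org.opencv.imgproc.Imgproc;

// One pure-red pixel, in OpenCV's BGR channel order.
Mat bgr = new Mat(1, 1, CvType.CV_8UC3, new Scalar(0, 0, 255));
Mat hsv = new Mat();
Imgproc.cvtColor(bgr, hsv, Imgproc.COLOR_BGR2HSV);
// Prints [0.0, 255.0, 255.0]: hue 0 (red), full saturation and value.
System.out.println(java.util.Arrays.toString(hsv.get(0, 0)));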

Listing 4-16 converts the color space and checks for hue values in the range of the wanted color using inRange, with some added magic at the end to draw the shapes properly using findContours again.
class ColorDetector implements Filter {
    Scalar minColor, maxColor;
    public ColorDetector(Scalar minColor, Scalar maxColor) {
        this.minColor = minColor;
        this.maxColor = maxColor;
    }
    @Override
    public Mat apply(Mat input) {
        Mat array255 = new Mat(input.height(), input.width(), CvType.CV_8UC1);
        array255.setTo(new Scalar(255));
        Mat distance = new Mat(input.height(), input.width(), CvType.CV_8UC1);
        List<Mat> lhsv = new ArrayList<Mat>(3);
        Mat circles = new Mat();
        Mat hsv_image = new Mat();
        Mat thresholded = new Mat();
        Mat thresholded2 = new Mat();
        Imgproc.cvtColor(input, hsv_image, Imgproc.COLOR_BGR2HSV);
        Core.inRange(hsv_image, minColor, maxColor, thresholded);
        Imgproc.erode(thresholded, thresholded, Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(8, 8)));
        Imgproc.dilate(thresholded, thresholded, Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(8, 8)));
        Core.split(hsv_image, lhsv);
        Mat S = lhsv.get(1);
        Mat V = lhsv.get(2);
        Core.subtract(array255, S, S);
        Core.subtract(array255, V, V);
        S.convertTo(S, CvType.CV_32F);
        V.convertTo(V, CvType.CV_32F);
        Core.magnitude(S, V, distance);
        Core.inRange(distance, new Scalar(0.0), new Scalar(200.0), thresholded2);
        Core.bitwise_and(thresholded, thresholded2, thresholded);
        Imgproc.GaussianBlur(thresholded, thresholded, new Size(9, 9), 0, 0);
        List<MatOfPoint> contours = new ArrayList<MatOfPoint>();
        // HOUGH_GRADIENT is the OpenCV 4.x name for the older CV_HOUGH_GRADIENT;
        // note that the circles result is computed but not used below.
        Imgproc.HoughCircles(thresholded, circles, Imgproc.HOUGH_GRADIENT, 2, thresholded.height() / 8, 200, 100, 0, 0);
        Imgproc.findContours(thresholded, contours, thresholded2, Imgproc.RETR_LIST, Imgproc.CHAIN_APPROX_SIMPLE);
        Imgproc.drawContours(input, contours, -2, new Scalar(10, 0, 0), 4);
        return input;
    }
}
class RedDetector extends ColorDetector {
    public RedDetector() {
        super(new Scalar(0, 100, 100), new Scalar(10, 255, 255));
    }
}
Listing 4-16

Detecting Red

The result of applying this filter to a video of roses looks something like Figure 4-14.
Figure 4-14

Detecting red roses

Implement a Detect Blue Filter

Looking at the values for Hue in Table 4-1, you can see it would not be so difficult to implement a filter that searches for blue colors. This is left as an exercise for you.
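If you want to check your solution afterward, a minimal sketch based on Table 4-1's blue range could look like the following; the saturation and value minimums of 100 are my assumption and may need tuning for your lighting:
class BlueDetector extends ColorDetector {
    public BlueDetector() {
        // Hue 90 to 150 is blue in OpenCV's divided-by-2 scheme.
        super(new Scalar(90, 100, 100), new Scalar(150, 255, 255));
    }
}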

Detecting by Haar

As you saw in Chapter 1, you can use a Haar-based classifier to identify objects and/or people in a Mat object. The code is pretty much the same as you have seen, with an added emphasis on the number and size of shapes we are looking for.

Specifically, the following call shows how two Size parameters specify the minimum size and the maximum size of the objects we are looking for.
classifier.detectMultiScale(input, faces, 1.1, 2, -1, new Size(100, 100), new Size(500, 500));
Listing 4-17, with an extra example main function, shows how to pass different XML files as parameters to the Haar classifier detection.
public class DetectWithHaar {
    public static void main(String[] args) {
        Origami.init();
        VideoCapture cap = new VideoCapture(0);
        Mat buffer = new Mat();
        ImShow ims = new ImShow("Camera", 800, 600);
        Filter filter = new Pipeline(new Haar("haarcascades/haarcascade_frontalface_default.xml"), new FPS());
        while (cap.grab()) {
            cap.retrieve(buffer);
            ims.showImage(filter.apply(buffer));
        }
        cap.release();
    }
}
class Haar implements Filter {
    private CascadeClassifier classifier;
    Scalar white = new Scalar(255, 255, 255);
    public Haar(String path) {
        classifier = new CascadeClassifier(path);
    }
    public Mat apply(Mat input) {
        MatOfRect faces = new MatOfRect();
        classifier.detectMultiScale(input, faces, 1.1, 2, -1, new Size(100, 100), new Size(500, 500));
        for (Rect rect : faces.toArray()) {
            Imgproc.putText(input, "Face", new Point(rect.x, rect.y - 5), 3, 5, white);
            Imgproc.rectangle(input, new Point(rect.x, rect.y), new Point(rect.x + rect.width, rect.y + rect.height),
                    white, 5);
        }
        return input;
    }
}
Listing 4-17

Haar Classifier–Based Detection

If you have a cat at home or are using a cat video from the examples, when you apply this to a webcam stream, you should get a result that looks like Figure 4-15.
Figure 4-15

Finding cats

Using Other Haar Definitions

There are other XML files for Haar cascades in the samples. Feel free to use one to detect people, eyes, or smiling faces as an exercise.
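For example, assuming the standard OpenCV cascade files sit in the same haarcascades folder used earlier, switching detectors is a one-line change:
// haarcascade_eye.xml ships with the standard OpenCV data files.
Filter filter = new Pipeline(new Haar("haarcascades/haarcascade_eye.xml"), new FPS());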

Transparent Overlay on Detection

When drawing the rectangles in the previous examples, you might have wondered whether it is possible to draw something other than rectangles on the detected shapes.

Listing 4-18 shows how to do this by loading a mask that will be overlaid at the location of the detected shape. This is pretty much what you use in your smartphone applications all the time.

Note that there is a trick here with the transparency layer in the drawTransparency function. The mask for the overlay is loaded using IMREAD_UNCHANGED as the loading flag; you have to use this or the transparency layer is lost.

Once you have the transparency layer, you then use it as a mask while copying the overlay so as to copy the exact pixels of the Mat object you want.
class FunWithHaar implements Filter {
    CascadeClassifier classifier;
    Mat mask;
    Scalar white = new Scalar(255, 255, 255);
    public FunWithHaar(String path) {
        classifier = new CascadeClassifier(path);
        mask = Imgcodecs.imread("masquerade_mask.png", Imgcodecs.IMREAD_UNCHANGED);
    }
    // Split the 4-channel overlay, keep the alpha channel as a copy mask,
    // and merge the remaining BGR channels back together.
    void drawTransparency(Mat frame, Mat transp, int xPos, int yPos) {
        List<Mat> layers = new ArrayList<Mat>();
        Core.split(transp, layers);
        Mat mask = layers.remove(3); // the alpha (transparency) layer
        Core.merge(layers, transp);
        Mat submat = frame.submat(yPos, yPos + transp.rows(), xPos, xPos + transp.cols());
        transp.copyTo(submat, mask);
    }
    public Mat apply(Mat input) {
        MatOfRect faces = new MatOfRect();
        classifier.detectMultiScale(input, faces);
        Mat maskResized = new Mat();
        for (Rect rect : faces.toArray()) {
            Imgproc.resize(mask, maskResized, new Size(rect.width, rect.height));
            int adjusty = (int) (rect.y - rect.width * 0.2);
            try {
                drawTransparency(input, maskResized, rect.x, adjusty);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        return input;
    }
}
Listing 4-18

Adding an Overlay

Depending on the transparent Mat object used, you may need to adjust the location, but otherwise you should get something that looks like Figure 4-16. Finally, you can add some Venice Carnival feeling to your video streams!
Figure 4-16

Adding a mysterious mask as an overlay

Use a Batman Mask

I actually tried and could not find a proper Batman mask to use as an overlay to the video stream. Maybe you can help me by sending me the code with an appropriate Mat overlay to use!

Detecting by Template Matching

Template matching with OpenCV is just plain simple; so simple that maybe it should have come earlier in this list of detection methods. Template matching means looking for one Mat within another Mat. OpenCV has a superpower function named matchTemplate that does exactly this.

Listing 4-19 mostly revolves around using matchTemplate. Look for the usage of Core.minMaxLoc on the result received from matchTemplate. It is used to locate the index of the best score and will be used again when running neural networks.
class Template implements Filter {
    Mat template;
    public Template(String path) {
        this.template = Imgcodecs.imread(path);
    }
    @Override
    public Mat apply(Mat in) {
        Mat outputImage = new Mat();
        Imgproc.matchTemplate(in, template, outputImage, Imgproc.TM_CCOEFF);
        MinMaxLocResult mmr = Core.minMaxLoc(outputImage);
        Point matchLoc = mmr.maxLoc;
        Imgproc.rectangle(in, matchLoc, new Point(matchLoc.x + template.cols(), matchLoc.y + template.rows()),
                new Scalar(255, 255, 255), 3);
        return in;
    }
}
Listing 4-19

Pattern Matching

Now, let’s find a box containing a ReSpeaker like the one shown in Figure 4-17 because we are going to need this speaker in the next chapter, and I just can’t find it right now. Let’s use OpenCV to find it for us.
Figure 4-17

The template

Detecting through OpenCV’s template matching is surprisingly fast and accurate, as shown in Figure 4-18.
Figure 4-18

Finding the box for the speaker

It’s a bit hard to see the frame rate in Figure 4-18, but it is actually around 10 to 15 frames per second on the Raspberry Pi 4.

Detecting by Yolo

This is the final detection method presented in this chapter. Let's say we want to apply a trained neural network to identify objects in a stream. After some testing on this hardware with its little computing power, I got quite speedy results with Yolo/Darknet and the freely available Darknet networks trained on the Coco dataset.

The advantage of using neural networks on random inputs is that most of the trained networks are quite resilient and give good results, with 80 percent to 90 percent accuracy on close to real-time streams.

Training is the hardest part of using neural networks. In this book, we’ll restrict ourselves to running detection code on the Raspberry Pi, not training. You can find the steps for how to organize your pictures for re-training networks on the Darknet/Yolo web site.

The sequence of steps to achieve object detection with Darknet in OpenCV is as follows:
  1. Load a network from its configuration file and weights file.

  2. Find the output layers/nodes of that network, because this is where the results will be. The output layers are the layers that are not connected to further layers.

  3. Transform a Mat object into a blob for the network. A blob is an image, or a set of images, tuned to match the format expected by the network in terms of size, channel order, etc.

  4. Run the network, meaning feed it the blob and retrieve the values of the layers marked as output layers.

  5. For each line in the results, we get a confidence value for each of the possible recognizable features. In Coco, the network is trained to recognize 80 different objects, such as people, bicycles, cars, etc.

  6. Use MinMaxLocResult again to get the index of the most probable recognized object, and if the value at that index is more than 0, keep it.

  7. The first four values in each result line describe the box where the detected object was found, so extract those four values and keep the rectangle along with the index of its label.

  8. Before drawing all the boxes, use NMSBoxes to remove overlapping boxes. Most of the time, overlapping boxes are multiple versions of the same positive detection of the same object.

  9. Finally, draw the remaining rectangles and add the labels of the recognized objects.
Listing 4-20 shows the full code for the base YoloDetector implemented as a filter.
class YoloDetector implements Filter {
    final static Size sz = new Size(416, 416);
    List<String> outBlobNames;
    Net net;
    List<String> layers;
    List<String> labels;
    List<String> getOutputsNames(Net net) {
        // getUnconnectedOutLayers returns 1-based layer ids, hence the -1.
        List<String> layersNames = net.getLayerNames();
        return net.getUnconnectedOutLayers().toList().stream().map(i -> i - 1).map(layersNames::get)
                .collect(Collectors.toList());
    }
    public YoloDetector(String modelWeights, String modelConfiguration) {
        net = Dnn.readNetFromDarknet(modelConfiguration, modelWeights);
        layers = getOutputsNames(net);
        try {
            labels = Files.readAllLines(Paths.get(LABEL_FILE));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
    @Override
    public Mat apply(Mat in) {
        findShapes(in);
        return in;
    }
    final int IN_WIDTH = 416;
    final int IN_HEIGHT = 416;
    final double IN_SCALE_FACTOR = 0.00392157;
    final int MAX_RESULTS = 20;
    final boolean SWAP_RGB = true;
    final String LABEL_FILE = "yolov3/coco.names";
    void findShapes(Mat frame) {
        Mat blob = Dnn.blobFromImage(frame, IN_SCALE_FACTOR, new Size(IN_WIDTH, IN_HEIGHT), new Scalar(0, 0, 0),
                 SWAP_RGB);
        net.setInput(blob);
        List<Mat> outputs = new ArrayList<>();
        for (int i = 0; i < layers.size(); i++)
            outputs.add(new Mat());
        net.forward(outputs, layers);
        postprocess(frame, outputs);
    }
    private void postprocess(Mat frame, List<Mat> outs) {
        List<Rect> tmpLocations = new ArrayList<>();
        List<Integer> tmpClasses = new ArrayList<>();
        List<Float> tmpConfidences = new ArrayList<>();
        int w = frame.width();
        int h = frame.height();
        for (Mat out : outs) {
            // Each result row holds [center_x, center_y, width, height,
            // objectness, then one score per class], all normalized to 0-1.
            final float[] data = new float[(int) out.total()];
            out.get(0, 0, data);
            int k = 0;
            for (int j = 0; j < out.height(); j++) {
                Mat scores = out.row(j).colRange(5, out.width());
                Core.MinMaxLocResult result = Core.minMaxLoc(scores);
                if (result.maxVal > 0) {
                    // Scale the normalized box back to frame coordinates.
                    float center_x = data[k + 0] * w;
                    float center_y = data[k + 1] * h;
                    float width = data[k + 2] * w;
                    float height = data[k + 3] * h;
                    float left = center_x - width / 2;
                    float top = center_y - height / 2;
                    tmpClasses.add((int) result.maxLoc.x);
                    tmpConfidences.add((float) result.maxVal);
                    tmpLocations.add(new Rect((int) left, (int) top, (int) width, (int) height));
                }
                k += out.width();
            }
        }
        annotateFrame(frame, tmpLocations, tmpClasses, tmpConfidences);
    }
    private void annotateFrame(Mat frame, List<Rect> tmpLocations, List<Integer> tmpClasses,
            List<Float> tmpConfidences) {
        MatOfRect locMat = new MatOfRect();
        MatOfFloat confidenceMat = new MatOfFloat();
        MatOfInt indexMat = new MatOfInt();
        locMat.fromList(tmpLocations);
        confidenceMat.fromList(tmpConfidences);
        Dnn.NMSBoxes(locMat, confidenceMat, 0.1f, 0.1f, indexMat);
        for (int i = 0; i < indexMat.total() && i < MAX_RESULTS; ++i) {
            int idx = (int) indexMat.get(i, 0)[0];
            int labelId = tmpClasses.get(idx);
            Rect box = tmpLocations.get(idx);
            String label = labels.get(labelId);
            annotateOne(frame, box, label);
        }
    }
    private void annotateOne(Mat frame, Rect box, String label) {
        Imgproc.rectangle(frame, box, new Scalar(0, 0, 0), 2);
        Imgproc.putText(frame, label, new Point(box.x, box.y), Imgproc.FONT_HERSHEY_PLAIN, 4.0, new Scalar(0, 0, 0), 3);
    }
}
Listing 4-20

Neural Network–Based Detection

You can now run your own set of object detections and experiments using the different available networks. Listing 4-21 shows how to load each of the main Yolo-based networks.
class Yolov2 extends YoloDetector {
    public Yolov2() {
        super("yolov2/yolov2.weights", "yolov2/yolov2.cfg");
    }
}
class TinyYolov2 extends YoloDetector {
    public TinyYolov2() {
        super("yolov2-tiny/yolov2-tiny.weights", "yolov2-tiny/yolov2-tiny.cfg");
    }
}
class Yolov3 extends YoloDetector {
    public Yolov3() {
        super("yolov3/yolov3.weights", "yolov3/yolov3.cfg");
    }
}
class TinyYolov3 extends YoloDetector {
    public TinyYolov3() {
        super("yolov3-tiny/yolov3-tiny.weights", "yolov3-tiny/yolov3-tiny.cfg");
    }
}
Listing 4-21

Java Classes and Constructors for the Different Yolo Networks

Running on the Raspberry Pi, Yolo v3 can detect cars and people on a busy street of Lisbon (Figure 4-19) and more cats (Figure 4-20).
Figure 4-19

Yolo v3 detecting cars and people

Figure 4-20

Yolo v3 detecting cats

As you can see, the frame rate was actually very low for standard Yolo v3.

When trying the same experiment with Yolo v3 Tiny, you can get close to 5 to 6 frames per second, which is slightly below real time but still gives results with very good accuracy. See Figure 4-21.
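To reproduce this, assuming the yolov3-tiny weight and configuration files from Listing 4-21 are in place, you can swap the tiny network into the same main loop used for the Haar example:
Filter filter = new Pipeline(new TinyYolov3(), new FPS());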
Figure 4-21

TinyYoloV3 detecting cats

You know my method. It is founded upon the observation of trifles.

—Arthur Conan Doyle, "The Boscombe Valley Mystery" (1891)

You’re down to the last few lines of this chapter, and it has been a long ride where you saw most of the object detection concepts with OpenCV in Java on the Raspberry Pi.

Namely, you learned about the following:
  • How to set up the Raspberry Pi for real-time object detection programming

  • How to use filters and pipelines to perform image and real-time video processing

  • How to implement some basic filters for Mat, used directly on real-time video streams from external devices and file-based videos

  • How to add some fun with Instagram-like filters

  • How to implement numerous object detection techniques using filters and pipelines

  • How to run a neural network trained on the Coco dataset that can be used in real time on the Raspberry Pi

In the next chapter, you’ll be introduced to Rhasspy, a voice recognition system, to connect the concepts presented in this chapter and apply them to home, office, or cat house automation.
