
4. Real-Time Video


Up to now, this book has been focused on getting you up to speed with working on images and generating graphical art. You should now feel pretty confident with the methods introduced, and you have room for many ideas.

Great!

We could keep expanding on and explaining the other methods from OpenCV, but we are going to do something else in Chapter 4, as we switch to real-time video analysis, applying the knowledge learned during the previous chapters to the field of video streaming.

You may ask: What is real-time video analysis and why would I do that? OpenCV makes it a breeze to look into video streams and focus on the content of the video. For example, how many people are showing up on the video stream right now? Are there cats in this video? Is this a tennis game? And, the root of all questions, is it a sunny day today?

OpenCV has many of those algorithms implemented for you, and what’s even better, Origami adds a bit of sweet sugar, so you can get started straightaway and put blocks together in an easy way.

In this chapter, we will get started with a first recipe that will show you how little is required to be ready for video streaming.

Then, we move on to more substantial subjects like face recognition, background diffing, and finding oranges and, most importantly, body soap.

4.1 Getting Started with Video Streaming

Problem

You have the Origami setup for image processing; now, you would like to know the Origami setup for video processing.

Solution

Well, the bad news is that there is no extra project setup. So, we could almost close this recipe already.

The good news is that there are two functions that Origami gives you, but before using them we will cover how the underlying processing works.

First, we will create a videocapture object from the origami opencv3.video package and start/stop a stream with it.

Second, since we think this should definitely be easier to use, we will introduce the function that does everything for you: u/simple-cam-window.

Last, we will review u/cams-window, which makes it easy to combine multiple streams from different sources.

How it works

Do-It-Yourself Video Stream

You could skip this small section of the recipe, but it’s actually quite informative to know what is behind the scenes.

The simple idea of video stream manipulation starts with creating an opencv videocapture object that accesses available video devices.

That object can then return you a mat object, just like all the mat objects you have used so far. It is possible to then act on the mat object, and in the simplest case show the mat in a frame on the screen.

Origami uses something similar to u/imshow to display mats taken from video, but for this very first example let’s simply use u/imshow to display the mat.

Here, we do require another namespace, [opencv3.video :as v], but later on you will see that this step is not always necessary; you need that extra video namespace only when using opencv video functions directly.

Let’s see how it goes by going through the following code example.

First, we create the videocapture object, which can access all the webcams of your host system.

We then open the camera with ID 0. That is probably the default in your environment, but we will also see later how to play with multiple devices.

(def capture (v/new-videocapture))
(.open capture 0)

We need a window to display the frame recorded from the device, and sure enough, we’ll create a binding named window. This window will be set to have a black background.

(def window
 (u/show (new-mat 200 200 CV_8UC3 rgb/black)))

We then create a buffer to receive video data, as a regular OpenCV mat.

(def buffer (new-mat))

The core video loop will copy content to the buffer mat using the read function on the capture object, and then it will show the buffer in the window, using the function u/re-show.

(dotimes [_ 100]
 (.read capture buffer)
 (u/re-show window buffer))
At this stage, you should see frames showing up in a window on your screen, as in Figure 4-1.
Figure 4-1. My favorite body soap

Finally, when the loop has finished, the webcam is released using the release function on the capture object.

(.release capture)

This should also have the effect of turning off your computer's camera LED. One thing to note at the end of this small exercise: yes, this is a standard mat object that was used as a buffer in the display loop, and so, yes, you could already plug in some text or color conversion before displaying it.
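Since the buffer is just a mat, a minimal variation of the same loop could, for example, stamp the current date on every frame before displaying it. This is only a sketch; it assumes the same capture, window, and buffer bindings defined above.

(dotimes [_ 100]
 (.read capture buffer)
 ; draw the current date on the freshly read frame
 (put-text! buffer (str (java.util.Date.))
   (new-point 10 50) FONT_HERSHEY_PLAIN 1 rgb/white 1)
 (u/re-show window buffer))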

One-Function Webcam

Now that you understand how the underlying webcam handling is done, here is another slightly shorter way of getting you to the same result, using u/simple-cam-window.

In this small section, we want to quickly review how to take the stream and manipulate it using that function.

In its simplest form, simple-cam-window is used with the identity function as the parameter. As you remember, identity takes an element and returns it as is.

(u/simple-cam-window identity)

Providing you have a webcam connected, this will start the same streaming video with the content of the stream showing in a frame.

The function takes a single parameter, which is the function applied to the mat before it is shown inside the frame.

Sweet. We'll get back to it in a few seconds, but for now, note that to simply convert the recorded frames to a different colormap, you could pass an anonymous function using apply-color-map!.

(u/simple-cam-window #(apply-color-map! % COLORMAP_HOT))
With the immediate result showing in Figure 4-2.
Figure 4-2. Hot body soap

In the second version of u/simple-cam-window, you can specify settings for the frame and the video recording, all of this as a simple map passed as the first parameter to simple-cam-window.

For example:

(u/simple-cam-window
{:frame {:color "#ffcc88", :title "video", :width 350, :height 300}
  :video {:device 0, :width 100, :height 120}}
  identity)

In the map, the video key specifies the ID of the device to take the stream from, along with the size of the frames to record. Note that if the requested size is not one the device supports, the setting will be silently ignored.

In the same parameter map, the frame key can specify the parameters seen in the previous chapter: the background color, the title, and the size of the window.

Ok, great; all set with the basics. Let’s play a bit.

Transformation Function

The identity function takes an element and returns it as is. We saw how identity worked in the first cam usage, by returning the mat as it was recorded by the opencv framework.

Now, say you would like to write a function that
  • takes a mat

  • resizes the mat by a factor of 0.5

  • changes the color map to WINTER

  • adds the current date as a white overlay

Not so difficult with all the knowledge you have gathered so far. Let’s write a small origami pipeline in a function my-fn! to do the image transformation:

(defn my-fn! [mat]
  (-> mat
    (u/resize-by 0.5)
    (apply-color-map! COLORMAP_WINTER)
    (put-text! (str (java.util.Date.)) (new-point 10 50) FONT_HERSHEY_PLAIN 1 rgb/white 1)))

Note here that the pipeline returns the transformed mat. Now let's use this newly created pipeline on a still image.

(-> "resources/chapter03/ai5.jpg"
    imread
    my-fn!
    u/mat-view)
And let’s enjoy a simple winter feline output (Figure 4-3).
Figure 4-3. Cool feline

And then, if you are in Starbucks using your laptop webcam, you can apply the new function my-fn! straight to a video stream by passing it as an argument to simple-cam-window.

(u/simple-cam-window my-fn!)
Which would give you something like Figure 4-4.
Figure 4-4. Starbucks ice coffee refill

Two Frames, or More, from the Same Input Source

This is a convenient technique when you want to apply two or more functions to the same source. It is really only a matter of using the clone function to avoid memory conflicts with the source buffer.

Here, we create a function that takes the buffer as input, and then concatenates two images created from the same buffer. The first image on the left will be a black-and-white version of the stream, while the right one will be a flipped version of the buffer.

(u/simple-cam-window
  (fn [buffer]
    (vconcat! [
     (-> buffer
         clone
         (cvt-color! COLOR_RGB2GRAY)
         (cvt-color! COLOR_GRAY2RGB))
     (-> buffer clone (flip! -1)) ])))
Note that we use clone twice, once for each side of the concatenation (Figure 4-5).
Figure 4-5. Gray left, flipped right, but it is still body soap

You can push this method even further by cloning the input buffer as many times as you want; to highlight this, here is another example applying three different color maps to the same input buffer.

(u/simple-cam-window
    (fn [buffer]
    (hconcat! [
     (-> buffer clone (apply-color-map! COLORMAP_JET))
     (-> buffer clone (apply-color-map! COLORMAP_BONE))
     (-> buffer clone (apply-color-map! COLORMAP_PARULA))])))
And the result is shown in Figure 4-6.
Figure 4-6. Jet, bone, and parula, but this is still body soap

4.2 Combining Multiple Video Streams

Problem

You played around creating many outputs from the same buffer, but it would be nice to also be able to plug in multiple cameras and combine their buffers together.

Solution

Origami comes with a sibling function to u/simple-cam-window named u/cams-window, which is an enhanced version where you can combine multiple streams from the same or multiple sources.

How it works

u/cams-window is a function that takes a list of devices, each defined by a device ID and, usually, a transformation function.

The function also takes a video function to concatenate two or more device outputs, and finally a frame element to define the usual parameters of the window, like sizes and title.

(u/cams-window
   {:devices [
     {:device 0 :width 300 :height 200 :fn identity}
     {:device 1 :width 300 :height 200 :fn identity}]
    :video { :fn
             #(hconcat! [
                (-> %1 (resize! (new-size 300 200)))
                (-> %2 (resize! (new-size 300 200))) ])}  
    :frame
     {:width 650 :height 250 :title "OneOfTheSame"}})

Figure 4-7 shows two devices targeting the same body soap, but from different angles.

The left frame takes input from the device with ID 0, and the right frame input from the device with ID 1.
Figure 4-7. More body soap pictures

Note that even though sizes are specified for each device, a resize is actually still needed, because each device supports only specific combinations of height and width, and so mixing different devices can be a bit of a challenge.

Still, the resize! call in the combining video function does not feel out of place, and things work smoothly afterward.
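As a variation, and assuming, as the identity placeholders above suggest, that each device's :fn is applied to its own buffer before the combining :video function runs, you could give each camera its own transformation; this is only a sketch, with an arbitrary window title.

(u/cams-window
   {:devices [
     {:device 0 :width 300 :height 200
      ; gray the first device, then back to RGB so it can be concatenated
      :fn #(-> % (cvt-color! COLOR_RGB2GRAY) (cvt-color! COLOR_GRAY2RGB))}
     {:device 1 :width 300 :height 200
      ; flip the second device on both axes
      :fn #(flip! % -1)}]
    :video { :fn
             #(hconcat! [
                (-> %1 (resize! (new-size 300 200)))
                (-> %2 (resize! (new-size 300 200))) ])}
    :frame
     {:width 650 :height 250 :title "GrayAndFlipped"}})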

4.3 Warping Video

Problem

This recipe is about warping the buffer of the video stream using a transformation, but it is also about updating the transformation in real time.

Solution

The warping transformation itself will be done using opencv’s get-perspective-transform from the core namespace.

The real-time updating will be done using a Clojure atom; its atomic update semantics are well suited here, letting us change the value of the matrix required for the transformation while the display loop reads the content of that matrix, thus always getting the latest value.

How it works

To perform a perspective transform, we need a warp matrix. The warp matrix is contained in an atom and first initialized to nil.

(def mt   
  (atom nil))

The warp matrix used to do the transformation can be created from four points, with their locations before and after the transformation.

Instead of acting on a local binding, we will update the atom value using reset!.

(def points1
  [[100 10]
   [200 100]
   [28 200]
   [389 390]])
(def points2
  [[70 10]
   [200 140]
   [20 200]
   [389 390]])
(reset! mt
  (get-perspective-transform
   (u/matrix-to-matofpoint2f points1)
   (u/matrix-to-matofpoint2f points2)))

Remember, you can still dump the warp matrix, which is a regular 3×3 mat, by dereferencing the atom with @ or deref.

(dump @mt)

With the points defined in the preceding, this gives the following matrix of doubles.

[1.789337561985906 0.3234215275201738 -94.5799621372129]
[0.7803091692375479 1.293303360247406 -78.45137776386103]
[0.002543030309135725 -3.045754676722361E-4 1]

Now let’s create the function that will warp a mat using the matrix saved in the mt atom.

(defn warp! [ buffer ]
  (-> buffer
    (warp-perspective! @mt (size buffer ))))

Remember that this function can still be applied to standard images; for example, if you want to warp cats, you could write the following origami pipeline:

(-> "resources/chapter03/ai5.jpg"
    imread
    (u/resize-by 0.7)
    warp!
    u/imshow)
And the two cats from before would be warping as in Figure 4-8.
Figure 4-8. Warped cats

Now let's apply that function to a video stream, using warp! as a parameter to u/simple-cam-window.

(u/simple-cam-window warp!)

The body soap has been warped! (Figure 4-9)

Obviously, the book cannot do much to convey the difference between a still cat image and the body soap stream, so do plug in your own stream there.
Figure 4-9. Warped body soap
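Since warp! dereferences the mt atom on every frame, you can also change the transformation while the stream is running; a hedged REPL snippet, with arbitrary new target points, could look like this.

; run this from the REPL while (u/simple-cam-window warp!) is streaming;
; the stream picks up the new matrix on the next frame
(reset! mt
  (get-perspective-transform
   (u/matrix-to-matofpoint2f points1)
   (u/matrix-to-matofpoint2f [[50 10] [220 120] [20 200] [389 390]])))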

4.4 Using Face Recognition

Problem

The OpenCV face recognition features work perfectly fine on still pictures, but on a video stream you would also like to look for moving faces showing up in real time, count people, and so on.

Solution

The first step is to load a classifier: the opencv object that will be able to find matching elements in a mat.

The classifier is loaded from an xml definition using the origami function new-cascadeclassifier.

Then, a call to detectMultiScale with that classifier and a mat will return a list of matching rect objects.

Those rect objects can then be used to highlight the found faces with a rectangle, or for creating submats.

How it works

There is no extra Clojure namespace required to make this work, as the new-cascadeclassifier function is already in the core namespace.

If the xml file is on the file system, then you can load the classifier with

(def detector
  (new-cascadeclassifier
    "resources/lbpcascade_frontalface.xml"))

If the xml is stored as a resource in a jar file, then you could load it with

(def detector
  (new-cascadeclassifier
    (.getPath (clojure.java.io/resource "lbpcascade_frontalface.xml"))))

The rectangle objects found by the classifier will need to be drawn. The classifier's detectMultiScale function returns a list of rectangles, so let's write a function that simply loops over the list of rect objects and draws a blue border on each rect.

(defn draw-rects! [buffer rects]
 (doseq [rect (.toArray rects)]
  (rectangle
   buffer
   (new-point (.-x rect) (.-y rect))
   (new-point (+ (.-width rect) (.-x rect)) (+ (.-height rect) (.-y rect)))
   rgb/blue
   5))
 buffer)

Then let's define a second function, find-faces!, which calls the detectMultiScale method on the classifier and draws the rectangles using the draw-rects! function defined in the preceding.

(defn find-faces! [buffer]
  (let [rects (new-matofrect)]
    (.detectMultiScale detector buffer rects)
    (-> buffer
        (draw-rects! rects)
        (u/resize-by 0.7))))

We have all the blocks here again, and it’s now a simple matter of calling find-faces! through u/simple-cam-window.

(u/simple-cam-window find-faces!)
And if you find yourself in Starbucks one morning on a terrace, the image could be something like Figure 4-10.
Figure 4-10. Quite impressive morning coffee face

The draw-rects! function could really be anything since you have access to a buffer object.

For example, this second version of draw-rects! applies a different color map on the submat created by the rect of the found face.

(defn draw-rects! [buffer rects]
  (doseq [r (.toArray rects)]
    (-> buffer
     (submat r)
     (apply-color-map! COLORMAP_COOL)
     (copy-to (submat buffer r))))
  (put-text! buffer (str (count (.toArray rects)))
   (new-point 30 100) FONT_HERSHEY_PLAIN 5 rgb/magenta-2 2))
And reusing the created building blocks, this gives the cool face from Figure 4-11.
Figure 4-11. Cool morning coffee face

This last example of drawing faces takes the first found face and makes a big close-up on the right-hand side of the video stream.

(defn draw-rects! [buffer rects]
  (if (> (count (.toArray rects)) 0)
    (let [r (first (.toArray rects))
          s (-> buffer clone (submat r) (resize! (.size buffer)))]
          (hconcat! [buffer s]))
    buffer))
Obviously, Figure 4-12 will quickly get you convinced that this should really only be used for house BBQs, in order to show everyone who has been eating all the meat.
Figure 4-12. Overview and close-up on the same video window

4.5 Diffing with a Base Image

Problem

You would like to take a mat image, define it as a base, and discover changes made to that base image.

Solution

This is a very short recipe but is quite helpful on its own to understand the more complex recipe on movement that is coming after.

To create a diff of an image against a base, we first create two video callbacks: one will store the background picture in a Clojure atom, and the other will do a diff against that base atom.

A grayed version of the result will then be passed through a simple threshold function, to prepare the result for additional shape recognition and/or for further processing.

How it works

To compute a diff of an image with another one, you need two mats: one for the base, and one updated version with (we hope) new extra shapes in it.

We start by defining the Clojure atom and then starting a video stream whose callback keeps a reference to the background image in that atom.

As long as the cam-window is running, the latest buffer mat from the video stream will be stored in the atom.

(def base-image (atom nil))
(u/simple-cam-window
  (fn [buffer] (swap! base-image (fn [_] buffer))))

Once you are happy enough with the background, you can stop the cam-window and check the currently stored background picture with u/imshow and a deref-ed version of the atom.

(u/imshow @base-image)
This time, the image is a typical one of a busy home workplace (Figure 4-13).
Figure 4-13. Hearts and a speaker

Now, the next step is to define a new stream callback to use with simple-cam-window, which will diff with the mat stored in the Clojure atom.

The diff is done with the opencv function absdiff, which takes three mats, namely, the two inputs to diff and the output.

(defn diff-with-bg [buffer]
   (let [output (new-mat)]
     (absdiff buffer @base-image output)
     output))
(u/simple-cam-window diff-with-bg)

Obviously, before starting the second stream and introducing new shapes, you should stop the first recording stream.

This would give something like Figure 4-14, where the added body soap is clearly being recognized.
Figure 4-14. Body soap in the office!

Now, usually the next step is to clean up the shape showing on top of the background a bit, by turning the diff mat to gray and applying a threshold after a blur.

; diff in gray
(defn diff-in-gray [buffer]
 (-> buffer
  clone
  (cvt-color! COLOR_RGB2GRAY)
  (median-blur! 7)
  (threshold! 10 255 1)))

We have two processing functions for the same buffer, and in Clojure it is actually quite easy to combine them with comp, so let’s try this now.

Remember that comp composes functions from right to left, meaning the first function applied is the rightmost one.

(u/simple-cam-window (comp diff-in-gray diff-with-bg ))
See the composition result and the shape of the body soap showing in Figure 4-15.
Figure 4-15. Added shape worked for more processing

Here, you could combine all the steps, creating a simple mask from the preceding added-shape mat, and use the mask to highlight the diff-ed part only.

None of this is too surprising, except maybe the bitwise-not! call; everything is summarized in the highlight-new! function.

(defn highlight-new! [buffer]
 (let [output (u/mat-from buffer)
       w (-> buffer
             clone
             diff-with-bg
             (cvt-color! COLOR_RGB2GRAY)
             (median-blur! 7)
             (threshold! 10 255 1)
             (bitwise-not!))]
  (set-to output rgb/black)
  (copy-to buffer output w)
  output))
And the body soap output shows in Figure 4-16.
Figure 4-16. Back to body soap

The streams were taken during a heavy jet lag around 3 am, so the lighting conditions add a bit of noise at the bottom of the body soap, but you could try to remove that noise by updating the mask to not include the desk wood color. Your turn!
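As a possible starting point, and purely as a sketch, you could build a second mask of everything that looks like desk wood and remove it from the diff mask before copying. The hsv range below is a pure guess that needs tuning against your own desk, and bitwise-and is assumed to be the usual three-argument Core wrapper from the core namespace.

(defn highlight-new-quiet! [buffer]
 (let [output (u/mat-from buffer)
       desk   (new-mat)
       hsv    (-> buffer clone (cvt-color! COLOR_RGB2HSV))
       w      (-> buffer
                  clone
                  diff-with-bg
                  (cvt-color! COLOR_RGB2GRAY)
                  (median-blur! 7)
                  (threshold! 10 255 1)
                  (bitwise-not!))]
  ; everything matching the (hypothetical) wood color range
  (in-range hsv (new-scalar 10 40 60) (new-scalar 35 255 255) desk)
  ; keep only diff pixels that are not wood-colored
  (bitwise-not! desk)
  (bitwise-and w desk w)
  (set-to output rgb/black)
  (copy-to buffer output w)
  output))

(u/simple-cam-window highlight-new-quiet!)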

4.6 Finding Movement

Problem

You would like to identify and highlight movement and moving shapes in a video stream.

Solution

We start by doing an accumulation of the float values of the buffer, after cleaning it up. This is done with the function accumulate-weighted.

Then, we do a diff between the grayed version of the buffer and the computed average mat, and we retrieve a mask mat of the delta as quickly presented in the previous recipe.

Finally, we apply a threshold on the delta, clean up the result with a bit of dilation, and transform the mat back to color mode to be displayed onscreen.

This is actually easier than it sounds!

How it works

Here, we would like to show on a mat the delta created by the movements.

Finding Movement in Black and White

The first step is to take a buffer and create a cleaned (via a blur) gray version of it.

We are not interested in displaying this mat, just in performing arithmetic on it, so we will convert the mat to a 32-bit float mat, or in opencv language, CV_32F.

(defn gray-clean! [buffer]
  (-> buffer
      clone
      (cvt-color! COLOR_BGR2GRAY)
      (gaussian-blur! (new-size 3 3) 0)
      (convert-to! CV_32F)))

This function will be used to prepare a gray version of the mat. Let’s now work on computing the accumulated average and a diff between the average and the most recent buffer.

We’ll create another function, find-movement, which will highlight, in black and white, recent movement in the picture.

That function will get a Clojure atom, avg, as a parameter to keep track of the average value of the video’s incoming mat objects. The second parameter is the usual buffer mat passed to the callback. The function will display the frame-delta.

In the first if switch, we make sure the average mat, stored in the atom, is initialized with a proper value from the incoming stream.

Then a diff is computed using absdiff, onto which we apply a short threshold-dilate-cvt-color pipeline to show the movements directly.

(defn find-movement [ avg buffer]
  (let [gray (gray-clean! buffer) frame-delta (new-mat)]
   (if (nil? @avg)
    (reset! avg gray))
   ; compute the absolute diff on the weighted average
   (accumulate-weighted gray @avg 0.05 (new-mat))
   (absdiff gray @avg frame-delta)
   ; apply threshold and convert back to RGB for display
   (-> frame-delta
    (threshold! 35 255 THRESH_BINARY)
    (dilate! (new-mat))
    (cvt-color! COLOR_GRAY2RGB)
    (u/resize-by 0.8))))

We finally define a function wrapping the find-movement function, with an inlined Clojure atom. That atom will contain the average of the mat objects.

(def find-movements!
  (partial find-movement (atom nil)))

Time to put those functions in action with u/simple-cam-window.

(u/simple-cam-window find-movements!)
This is shown in Figure 4-17.
Figure 4-17. Movement is detected!

We would like to show movements here, but because the amount of black ink required to print this is going to scare the publisher, let’s add a bitwise operation to do a black-on-white instead and see how the live progression goes.

Let’s update the find-movement function with a bitwise-not! call on the frame-delta mat. Before that, we need to convert the matrix back to something we can work on, using opencv’s convert-to! function, with a type target CV_8UC3, which is usually what we work with.

(defn find-movement [ avg buffer]
  (let [gray (gray-clean! buffer) frame-delta (new-mat)]
   ...
   (-> frame-delta
    (threshold! 35 255 THRESH_BINARY)
    (dilate! (new-mat))
    (convert-to! CV_8UC3)
    (bitwise-not!)
    (cvt-color! COLOR_GRAY2RGB)
    (u/resize-by 0.8))))
Good; let's call simple-cam-window again. Wow. Figure 4-18 now looks a bit scary.
Figure 4-18. Scary black-on-white movement

And if you stop getting agitated in front of your computer, the movement highlights stabilize and slowly fade to a fully white mat, as shown in the progression of Figure 4-19.
Figure 4-19. Stabilizing movement
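How quickly the mat settles to white is governed by the 0.05 weight passed to accumulate-weighted: the larger the weight, the faster recent frames dominate the running average and the faster movement fades out. A one-line variation of the same call, with an arbitrary example value:

; same call as inside find-movement, only with a more aggressive weight
(accumulate-weighted gray @avg 0.3 (new-mat))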

Find and Draw Contours

At this stage, it would be easy to find and draw contours to highlight movement on the original colored buffer.

Let’s find contours of the nice movement mat that you managed to create.

A few more lines are added to the find-movement function, notably the finding contours on the delta mat and the drawing on the color mat.

You have seen all of this find-contours dance in the previous chapter, so let’s get down to the updated code.

(defn find-movement [ avg buffer]
  (let [ gray (gray-clean! buffer)
         frame-delta (new-mat)
         contours (new-arraylist)]
    (if (nil? @avg)
      (reset! avg gray))
    (accumulate-weighted gray @avg 0.05 (new-mat))
    (absdiff gray @avg frame-delta)
    (-> frame-delta
     (threshold! 35 255 THRESH_BINARY)
     (dilate! (new-mat))
     (convert-to! CV_8UC3)
     (find-contours contours (new-mat) RETR_EXTERNAL CHAIN_APPROX_SIMPLE))
    (-> frame-delta   
     (bitwise-not!)
     (cvt-color! COLOR_GRAY2RGB)
     (u/resize-by 0.8))
    (-> buffer
     ; (u/draw-contours-with-rect! contours )
     (u/draw-contours-with-line! contours)
     (u/resize-by 0.8))
    (hconcat! [frame-delta buffer]) ))
Calling this new version of the find-movement function gives something like Figure 4-20, but you can probably be way more creative from there.
Figure 4-20. Highlights moving parts in blue
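One way to be a bit more selective, sketched below, is to draw only the contours whose area passes a minimum size, reusing contour-area and draw-contours, which also show up in the orange recipe later in this chapter; the 150-pixel cutoff is arbitrary. You would call it in place of u/draw-contours-with-line! inside find-movement.

; sketch of a drawing helper that ignores small contours
(defn draw-big-contours! [buffer contours min-area]
  (dotimes [ci (.size contours)]
    (if (> (contour-area (.get contours ci)) min-area)
      (draw-contours buffer contours ci rgb/blue 2)))
  buffer)

; inside find-movement, replacing u/draw-contours-with-line!
; (draw-big-contours! buffer contours 150)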

4.7 Separating the Foreground from the Background Using Grabcut

Problem

Grabcut is another opencv method that can be used to separate the foreground from the background of an image. But can it be used in real time, on a video stream?

Solution

There is indeed a grab-cut function that easily separates the foreground from the background. The function needs just a bit of understanding of the different masks required to get it going, so we will focus first on understanding how things work on a still image.

We will then move on to the live stream solution. This will quickly lead to a speed problem, because grab-cut takes more time than is available for real-time processing.

So, we will use a small trick: turning down the resolution of the work area to bring the time used by grab-cut to a minimum, and then using the full resolution when performing the rest of the processing, resulting in a real-time grabcut.

How it works

On a Still Image

Here we want to call grabcut and separate a depth layer from the other ones.

The idea with grabcut is to prepare either a rectangle or a mask for the input picture and pass it to the grabcut function.

The result stored in that single output mask will contain a set of 1.0, 2.0, or 3.0 scalar values depending on what grabcut thinks is part of each of the different layers.

Then we use opencv compare on this mask and another fixed 1×1 mat of the scalar value of the layer we would like to retrieve. We obtain a mask only for the layer of interest.

Finally, we do a copy of the original image, on an output mat, using the mask created in step 2.

Ready? Let’s go for a cat example.

First, we load one of those cat pictures that we love so much and turn it to a proper working-size mat object.

(def source "resources/chapter03/ai6.jpg")
(def img (-> source imread (u/resize-by 0.5)))
The loaded cat picture is shown in Figure 4-21.
Figure 4-21. A feline kiss for you

Then, we define a mask mat, which will receive the output of the grabcut call, namely, per-pixel information about the layers.

We also define a rectangle for the region of interest (ROI) of where we want the grabcut to be done, here almost the full picture, mostly just removing the borders.

(def mask (new-mat))
(def rect
  (new-rect
    (new-point 10 10)
    (new-size (- (.width img) 30) (- (.height img) 30 ))))

Now that we have all the required inputs for grabcut, let's call it with the mask, the ROI, and the grabcut init param, here GC_INIT_WITH_RECT. The other available method is GC_INIT_WITH_MASK, which as you have probably guessed is initialized with a mask instead of a rect.

(grab-cut img mask rect (new-mat) (new-mat) 11 GC_INIT_WITH_RECT)

Grabcut has been called. To get an idea of the retrieved content of the output, let's quickly see the matrix content on a small submat of the mask.

(dump (submat mask (new-rect (new-point 10 10) (new-size 5 5))))

If you try it yourself, you would see values like

[2 2 2 2 2]
[2 2 2 2 2]
[2 2 2 2 2]
[2 2 2 2 2]
[2 2 2 2 2]

Another submat dump elsewhere in the mat gives a different result:

(dump (submat mask (new-rect (new-point 150 150) (new-size 5 5))))

In turn, this gives

[3 3 3 3 3]
[3 3 3 3 3]
[3 3 3 3 3]
[3 3 3 3 3]
[3 3 3 3 3]

We can guess from this different matrix that the layer is different.

The idea here is to retrieve a mask made of all the same values, so now let’s create a mask from all the pixels that are contained in layer 3, meaning that they are made of 3.0 values.

We'll call this the fg-mask, for foreground mask.

(def fg-mask (clone mask))
(def source1 (new-mat 1 1 CV_8U (new-scalar 3.0)))
(compare mask source1 fg-mask CMP_EQ)
(u/mat-view fg-mask)
The cat foreground mask is shown in Figure 4-22.
Figure 4-22. Foreground mask

We can then use copy-to to copy the original input image, through the fg-mask, onto a new black mat of the same size as the input.

(def fg_foreground (-> img (u/mat-from)  (set-to rgb/black)))
(copy-to img fg_foreground fg-mask)
(u/mat-view fg_foreground)
And we get the mat of Figure 4-23.
Figure 4-23. Only the foreground of the feline kiss

Notice how we get a bit of an approximation where the two kittens cross each other, but overall the result is pretty effective.

Before moving on, let's quickly retrieve the complementary mask, the background mask, by focusing on the layer with scalar values of 2.0.

First, we create a mask again to receive the output, this time bg-mask.

(def bg-mask (clone mask))
(def source2 (new-mat 1 1 CV_8U (new-scalar 2.0)))
(compare mask source2 bg-mask CMP_EQ)
(u/mat-view bg-mask)
The result for the background mask is shown in Figure 4-24.
Figure 4-24. Background mask

Then, simply do a copy similar to the one that was done for the foreground.

(def bg_foreground (-> img (u/mat-from)  (set-to (new-scalar 0 0 0))))
(copy-to img bg_foreground bg-mask)
(u/mat-view bg_foreground)
And the result is shown in Figure 4-25.
Figure 4-25. Mat of the background layer

Now that you have seen how to separate the different layers on a still image, let’s move on to video streaming.

On a Video Stream

As you may have noticed, the grabcut step in the preceding example was very slow, mostly due to a lot of heavy computations done to achieve a clean separation of the different layers. But how bad is it?

Let’s give it a quick try with a first dumb version of a real-time grabcut.

We'll call this function in-front-slow, and basically just combine the steps we have just seen in the still-image example into a single function.

(defn in-front-slow [buffer]  
  (let [
        img (clone buffer)
        rect (new-rect
          (new-point 5 5)
          (new-size (- (.width buffer) 5) (- (.height buffer) 5 )))
        mask (new-mat)
        pfg-mask (new-mat)
        source1 (new-mat 1 1 CV_8U (new-scalar 3.0))
        pfg_foreground (-> buffer (u/mat-from)  (set-to rgb/black))]
    (grab-cut img mask rect (new-mat) (new-mat) 7 GC_INIT_WITH_RECT)
    (compare mask source1 pfg-mask CMP_EQ)
    (copy-to buffer pfg_foreground pfg-mask)
    pfg_foreground))

And then, let's use this function as a callback to our now-familiar u/simple-cam-window.

(u/simple-cam-window in-front-slow)
This slowly gives the output seen in Figure 4-26.
Figure 4-26. Slow, slow, slow

As you will quickly realize, this is not very usable as is on a video stream.

The trick here is actually to turn down the resolution of the input buffer, do the grabcut on that lower-resolution mat, and get the grabcut mask. Then, do the copy using the full-sized picture and the mask retrieved from grabcut at the lower resolution.

This time, we'll create an in-front function, which will be a slightly updated version of the preceding, but now including a pyr-down–pyr-up dance around the grabcut call (Figure 4-27).

To make this easier, we'll set the number of iterations of the dance as a parameter of the callback.

(defn in-front [resolution-factor buffer]  
  (let [
        img (clone buffer)
        rect (new-rect
          (new-point 5 5)
          (new-size (- (.width buffer) 5) (- (.height buffer) 5 )))
        mask (new-mat)
        pfg-mask (new-mat)
        source1 (new-mat 1 1 CV_8U (new-scalar 3.0))
        pfg_foreground (-> buffer (u/mat-from)  (set-to (new-scalar 0 0 0)))]
    (dotimes [_ resolution-factor] (pyr-down! img))
    (grab-cut img mask rect (new-mat) (new-mat) 7 GC_INIT_WITH_RECT)
    (dotimes [_ resolution-factor] (pyr-up! mask))
    (compare mask source1 pfg-mask CMP_EQ)
    (copy-to buffer pfg_foreground pfg-mask)
    pfg_foreground))

Then, call simple-cam-window with this new callback.

(u/simple-cam-window (partial in-front 2))

It’s hard to get the feeling of speed by just reading, so do go ahead and try this locally.

Usually, a factor of 2 for the resolution-down dance is enough, but it depends on both your video hardware and the speed of the underlying processor.
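If you want numbers rather than a feeling, a quick and dirty check is to time both versions on a single still frame at the REPL; the cat picture from earlier in this chapter is used here only as a convenient test mat.

(def test-img (-> "resources/chapter03/ai5.jpg" imread (u/resize-by 0.5)))
(time (in-front-slow test-img))  ; full-resolution grabcut
(time (in-front 2 test-img))     ; two pyr-down steps before grabcut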
Figure 4-27. As fast as you want, baby

4.8 Finding an Orange in Real Time

Problem

You would like to detect and track an orange in a video stream. It could also be a lemon, but the author ran out of lemons so we will use an orange.

Solution

Here we will use techniques you have seen before, like hough-circles or find-contours, and apply them to real-time streaming. We'll draw the shape of the moving orange on the real-time stream.

For either of the solutions, you probably remember that the buffer needs some minor preprocessing to detect the orange. Here, to keep things simple, we’ll do a simple in-range processing in the hsv color space.

How it works

Using Hough-Circles

First, we’ll focus on finding the proper hsv range by taking a one-shot picture of the orange.

So, let's put the orange on the table (Figure 4-28).
Figure 4-28. Orange on the table, Annecy, France

We first switch to hsv color space, then apply the in-range function, and finally dilate the found orange shape a bit so it’s easier for the coming hough-circle call.

In origami, this gives

(def hsv (-> img clone (cvt-color! COLOR_RGB2HSV)))
(def thresh-image (new-mat))
(in-range hsv (new-scalar 70 100 100) (new-scalar 103 255 255) thresh-image)
(dotimes [_ 1]
 (dilate! thresh-image (new-mat)))

Now, you'll remember how to do hough-circles from Chapter 3, so no need to spend too much time on that here. The important thing in this part is to have the proper radius range for the orange; here we take a 10–50 pixel radius to identify it.

(def circles (new-mat))
(def minRadius 10)
(def maxRadius 50)
(hough-circles thresh-image circles CV_HOUGH_GRADIENT 1 minRadius 120 15 minRadius maxRadius)
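For reference, here is the same call again with each argument annotated; this is just a reminder of the standard OpenCV HoughCircles parameters, not a change in behavior.

(hough-circles thresh-image circles CV_HOUGH_GRADIENT
  1          ; dp, inverse ratio of the accumulator resolution
  minRadius  ; minimum distance between the centers of detected circles
  120        ; higher threshold of the internal Canny edge detector
  15         ; accumulator threshold; lower values find more (possibly false) circles
  minRadius  ; smallest accepted radius, in pixels
  maxRadius) ; largest accepted radius, in pixels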

At this stage, you should have only one matching circle for the orange. It is quite important to work on this step until exactly one circle is found.

As a check, printing the circle mat should give you a 1×1 mat, like the following:

#object[org.opencv.core.Mat 0x3547aa31 Mat [ 1*1*CV_32FC3, isCont=true, isSubmat=false, nativeObj=0x7ff097ca7460, dataAddr=0x7ff097c4b980 ]]

Once you have the mat nailed, let’s draw a pink circle on the original image (Figure 4-29).

(def output (clone img))
(dotimes [i (.cols circles)]
  (let [ circle (.get circles 0 i) x (nth circle 0) y (nth circle 1) r (nth circle 2)  p (new-point x y)]
  (opencv3.core/circle output p (int r) color/magenta- 3)))
Figure 4-29. Orange and magenta

Everything is there, so let’s wrap up our discoveries as a single function working on the buffer from the video stream; we’ll call that function my-orange!, which is a recap of the previous steps.

(defn my-orange! [img]
  (u/resize-by img 0.5)
  (let [ hsv (-> img clone (cvt-color! COLOR_RGB2HSV))
         thresh-image (new-mat)
         circles (new-mat)
         minRadius 10
         maxRadius 50
         output (clone img)]
(in-range hsv (new-scalar 70 100 100) (new-scalar 103 255 255) thresh-image)
(dotimes [_ 1]
 (dilate! thresh-image (new-mat)))
(hough-circles thresh-image circles CV_HOUGH_GRADIENT 1 minRadius 120 15 minRadius maxRadius)
(dotimes [i (.cols circles)]
  (let [ circle (.get circles 0 i) x (nth circle 0) y (nth circle 1) r (nth circle 2)  p (new-point x y)]
  (opencv3.core/circle output p (int r) color/magenta- 3)))
output))

Now it’s a simple matter of again passing that callback function to the simple-cam-window.

(u/simple-cam-window my-orange!)
Figures 4-30 and 4-31 show how the orange is found properly, even in low-light conditions. Winter in the French Alps after a storm did indeed make the evening light, and everything under it, a bit orange.
Figure 4-30. Orange on a printer

Figure 4-31. Mei and oranges

Using Find-Contours

Instead of looking for a perfect circle, you may be looking for a slightly distorted shape, and this is when using find-contours actually gives better results than hough-circles.

Here we combine the same hsv range found a few minutes ago to select the orange and apply the find-contours technique from Chapter 3.

The find-my-orange! callback brings back the familiar find-contours and draw-contours function calls. Note that we draw the contour of found shapes only if those are bigger than the smallest expected size of the orange.

(defn find-my-orange! [img]
 (let [hsv (-> img clone (cvt-color! COLOR_RGB2HSV))
       thresh-image (new-mat)
       contours (new-arraylist)
       output (clone img)]
  (in-range hsv (new-scalar 70 100 100) (new-scalar 103 255 255) thresh-image)
  (find-contours
   thresh-image
   contours
   (new-mat)  ; hierarchy
   RETR_LIST
   CHAIN_APPROX_SIMPLE)
  (dotimes [ci (.size contours)]
   (if (> (contour-area (.get contours ci)) 100)
    (draw-contours output contours ci color/pink-1 FILLED)))
  output))
Giving this callback to simple-cam-window shows Mei playing around with a pink-colored orange in Figure 4-32.
Figure 4-32. Mei and the pink orange, playing at a theater nearby

4.9 Finding an Image Within the Video Stream

Problem

You would like to find the exact replica of an image within a stream.

Solution

OpenCV comes with feature detection functions that you can use. Unfortunately, those features are mostly Java oriented.

This recipe will show how to bridge Java and Origami, and how using Clojure helps a bit by reducing boilerplate code.

Here we will use three main OpenCV objects:
  • FeatureDetector,

  • DescriptorExtractor,

  • DescriptorMatcher.

Feature extraction works by finding keypoints of both the input picture and the to-be-found image using a feature detector. Then, you compute a descriptor from each of the two sets of points using a descriptor extractor.

Once you have the descriptors, those can be passed as input to a descriptor matcher, which gives a matching result as a set of matches, with each match being given a score via a distance property.

We can then eventually filter points that are the most relevant and draw them on the stream.

The code listings are a bit longer than usual, but let’s get this last recipe working on your machine too!

How it works

For this example, we will be looking around for my favorite body soap, eucalyptus scent, both in still images and in real time.

Figure 4-33 shows the concerned body soap.
Figure 4-33. Petit Marseillais

Still Image

The first test is to be able to find the body soap in a simple still picture, like the one in Figure 4-34.
Figure 4-34. Carmen, where in the world is my body soap?

To get started, we need a few more Java object imports, namely, the detector and the extractor, which we will initialize straightaway before doing any processing.

(ns wandering-moss
  (:require
    [opencv3.core :refer :all]
    [opencv3.utils :as u])
  (:import
    [org.opencv.features2d Features2d DescriptorExtractor DescriptorMatcher FeatureDetector]))
(def detector (FeatureDetector/create FeatureDetector/AKAZE))
(def extractor  (DescriptorExtractor/create DescriptorExtractor/AKAZE))

Basic setup is done; we then load the body soap background through a short origami pipeline and ask the detector to detect points on it.

(def original
  (-> "resources/chapter04/bodysoap_bg.png" imread (u/resize-by 0.3)))
(def mat1 (clone original))
(def points1 (new-matofkeypoint))
(.detect detector mat1 points1)

The coming step is not required whatsoever, but drawing the found keypoints gives an idea of where the detector thinks the important points are in the mat.

(def show-keypoints1 (new-mat))
(Features2d/drawKeypoints mat1 points1 show-keypoints1 (new-scalar 255 0 0) 0)
(u/mat-view show-keypoints1)
This gives a bunch of blue circles, as shown in Figure 4-35.
Figure 4-35. Keypoints of the body soap background

Of course, it may be useful to clean up and remove imperfections before retrieving keypoints, but let's check how the matching works on the raw mat.

Note how the intensity of the points is already pretty strong on the body soap itself.
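If you did want to clean things up first, a minimal sketch could be to blur the mat slightly before handing it to the detector; the kernel size below is only a starting point to experiment with.

; hedged sketch: smooth the input before keypoint detection
(def mat1-clean
  (-> original
      clone
      (gaussian-blur! (new-size 3 3) 0)))
(def points1-clean (new-matofkeypoint))
(.detect detector mat1-clean points1-clean)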

We now repeat the same steps for a body soap–only mat.

(def mat2
  (-> "resources/chapter04/bodysoap.png" imread (u/resize-by 0.3)))
(def points2 (new-matofkeypoint))
(.detect detector mat2 points2)

Here again, drawing the points is not required, but it helps give a better idea of what is going on.

(def show-keypoints2 (new-mat))
(Features2d/drawKeypoints mat2 points2 show-keypoints2 (new-scalar 255 0 0) 0)
(u/mat-view show-keypoints2)
The detector result is in Figure 4-36, and again, the keypoints look to be focused on the label of the body soap.
Figure 4-36. Detector result on the body soap

The next step is to extract two feature sets that will then be used with the matcher.

This is simply a matter of calling compute on the extractor with the sets of found points from the previous step.

(def desc1 (new-mat))
(.compute extractor mat1 points1 desc1)
(def desc2 (new-mat))
(.compute extractor mat2 points2 desc2)

Now, on to the matching step. We create a matcher through DescriptorMatcher and give it a way to find matches.

In IT, brute force is always the recommended way to find a solution. Just try every single solution and see if any match.

(def matcher
  (DescriptorMatcher/create DescriptorMatcher/BRUTEFORCE_HAMMINGLUT))
(def matches (new-matofdmatch))
(.match matcher desc1 desc2 matches)

As was said in the solution summary, each match is rated on how good it is through its distance value.

If printed, each match looks something like the following:

#object[org.opencv.core.DMatch 0x38dedaa8 "DMatch [queryIdx=0, trainIdx=82, imgIdx=0, distance=136.0]"]

The distance value, the score of the match itself, usually shows up as a value between 0 and 300.

So now, let's create a quick Clojure function to filter the good matches. This is simply done by filtering on their distance property; we will keep matches whose distance is below 50. You may reduce or increase that value as needed, depending on the quality of the recording.

(defn best-n-dmatches2[dmatches]
  (new-matofdmatch
    (into-array org.opencv.core.DMatch
      (filter #(< (.-distance %) 50) (.toArray dmatches)))))

The draw-matches method is a coding nightmare, but it can be seen as mostly a wrapper around the nightmarish drawMatches Java method from OpenCV.

We mostly pass the parameters the way they are expected using Java interop and some cleanup on each parameter. We also create the output mat bigger, so that we can fit in the background picture and the body soap on the same mat.

(defn draw-matches [_mat1 _points1 _mat2 _points2 _matches]
  (let[ output (new-mat
          (* 2 (.rows _mat1))
          (* 2 (.cols _mat1))
          (.type _mat1))
        _sorted-matches (best-n-dmatches2 _matches)]
    (Features2d/drawMatches
              _mat1
              _points1
              _mat2
              _points2
              _sorted-matches
              output
              (new-scalar 255 0 0)
              (new-scalar 0 0 255)
              (new-matofbyte)
              Features2d/NOT_DRAW_SINGLE_POINTS)
      output))

And now, with all this, we can draw the matches found by the matcher, using the preceding function.

We pass it the first and second mats, as well as their respective found key points and the set of matches.

(u/mat-view
  (draw-matches mat1 points1 mat2 points2 matches))
This, surprisingly after all the obscure coding, works very well, as shown in Figure 4-37.
Figure 4-37. Drawing matches

Video Stream

Compared to what you have just been through, the video stream version is going to feel like a breath of fresh air.

We will create a where-is-my-body-soap! function that will reuse the matcher defined in the preceding and run the detector, extractor, and matcher within the stream callback on the buffer mat.

The previously defined draw-matches function is also reused to draw the matches on the real-time stream.

(defn where-is-my-body-soap! [buffer]
  (let[ mat1 (clone buffer)
        points1 (new-matofkeypoint)
        desc1 (new-mat)
        matches (new-matofdmatch)]
   (.detect detector mat1 points1)
   (.compute extractor mat1 points1 desc1)
   (.match matcher desc1 desc2 matches)
   (draw-matches mat1 points1 mat2 points2 matches)))
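Wiring it up is the same one-liner as in every other recipe of this chapter.

(u/simple-cam-window where-is-my-body-soap!)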

But … Ah! It seems Mei found the body soap just before this recipe's feature detection could be run!

Figure 4-38 shows both on the video stream.
Figure 4-38. Thanks for finding the body soap, Mei!

This brings this recipe, chapter, and book to a humble end. We do hope this gave you plenty of ideas for things to try out by playing with the Origami framework and bringing light to your creation.

For now, “Hello Goodnight”:

Searching the sky

I swear I see shadows falling

Could be an illusion

A sound of hidden warning

Fame will forever leave me wanting

Wanting

Well it’s alright

I’ve been alright

Hello Hello Goodnight

Mamas Gun “Hello Goodnight”
