Person detection with a TensorFlow model

We can't detect an object without first training a model. Training a model requires significant computational resources to calculate the network's weights and biases; this is usually done using backpropagation, although it depends on the techniques being used. The problem many embedded engineers face when adopting machine learning is that, once they have trained their model, they need to convert it into something that can run within a resource-constrained environment.

Working within an embedded environment often limits the number of neural layers that can be included in a model. Popular tools such as Caffe and TensorFlow also generate their models in floating point. As you know, floating-point calculations are notoriously slow and cumbersome within a microcontroller environment. For this reason, once a model is trained, it needs to be quantized and optimized in order to move to fixed-point mathematics and reduce the model size. This is often done using scripts provided by Arm that convert a model for use with TensorFlow Lite for Microcontrollers (TFLu) and CMSIS-NN. Thankfully, we don't have to develop those scripts ourselves.
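To give you a feel for what this kind of conversion involves, the following sketch shows post-training integer quantization with the standard TensorFlow Lite converter, run on a development machine rather than on the target. The model directory ./person_model and the representative_images() calibration generator are hypothetical placeholders used purely for illustration; they are not part of the OpenMV tooling:

import numpy as np
import tensorflow as tf

def representative_images():
    # Yield a handful of calibration samples shaped like the model input
    # (here assumed to be a 96x96 grayscale image, batch of 1).
    # In practice, replace the random data with real images.
    for _ in range(100):
        yield [np.random.rand(1, 96, 96, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model('./person_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]       # enable quantization
converter.representative_dataset = representative_images   # calibration data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8                   # fully integer I/O
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open('person_detection.tflite', 'wb') as f:
    f.write(tflite_model)

The result is a small, fixed-point .tflite file that an interpreter on a microcontroller can execute without any floating-point arithmetic.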

A very useful blog that you can read through on the process can be found at https://community.arm.com/innovation/b/blog/posts/low-power-deep-learning-on-openmv-cam.

What's great about this blog is that it even explains how you can convert a Caffe model specifically for use on the OpenMV module! It covers all the steps necessary to train and deploy a model using Caffe.

You may be wondering, though, what about TensorFlow? TensorFlow is too resource-heavy to be used on a microcontroller directly; instead, TensorFlow Lite (TF Lite) can be used. TF Lite is an open source deep learning framework for on-device inference. TF Lite for MCUs is an experimental port of TensorFlow Lite that is designed to run inference on microcontrollers with only a few kilobytes of memory! For readers who are interested, you can find the port at https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/micro.
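To make the idea of on-device inference more concrete, here is a minimal sketch of running a converted model with the TF Lite Python interpreter on a desktop. The file name person_detection.tflite and the dummy input frame are assumptions for illustration only; on a microcontroller, the equivalent calls are made through the TF Lite for MCUs C++ API instead:

import numpy as np
import tensorflow as tf

# Load the quantized model produced by the converter (path is a placeholder).
interpreter = tf.lite.Interpreter(model_path='person_detection.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed one dummy frame shaped like the model input; a real application
# would pass a camera image converted to the expected dtype.
frame = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], frame)
interpreter.invoke()

scores = interpreter.get_tensor(output_details[0]['index'])
print(scores)  # per-class scores, for example [unsure, person, no_person]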

The process to deploy a TF Lite model onto an embedded target is straightforward. You can see the general process in the following diagram:

TF Lite for MCUs has also been integrated into MicroPython through the OpenMV project! You can check out the details at http://docs.openmv.io/library/omv.tf.html. For the most part, this integration is completely seamless: a developer doesn't have to do any of the integration themselves, but it is still useful to understand what is happening behind the scenes.

Just like before, we are going to leverage an existing model that was trained by OpenMV using TensorFlow. In this example, we are going to look at how we can detect that a person is present in the image. So, the object that we are trying to detect is a person (or at least something that resembles a person). Perform the following steps to prepare the system:

  1. Connect your OpenMV camera to the computer.
  2. Launch OpenMV IDE.
  3. Load the person detection example by clicking the following: File | Examples | 25-Machine-Learning | tf_person_detection_search_just_center.py.

 

The development window will now be filled with the example MicroPython script. Take a few minutes to read through the script. You can also find it listed as follows (script source: OpenMV tf_person_detection_search_just_center.py):

# TensorFlow Lite Person Detection Example
#
# Google's Person Detection Model detects if a person is in view.
#
# In this example we slide the detector window over the image and get a
# list of activations. Note that using a CNN with a sliding window is
# extremely compute expensive, so for an exhaustive search do not expect
# the CNN to be real-time.

import sensor, image, time, os, tf

sensor.reset()                          # Reset and initialize the sensor.
sensor.set_pixformat(sensor.GRAYSCALE)  # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA)       # Set frame size to QVGA (320x240)
sensor.set_windowing((240, 240))        # Set 240x240 window.
sensor.skip_frames(time=2000)           # Let the camera adjust.

# Load the built-in person detection network (the network is in your
# OpenMV Cam's firmware).
net = tf.load('person_detection')
labels = ['unsure', 'person', 'no_person']

clock = time.clock()
while(True):
    clock.tick()

    img = sensor.snapshot()

    # net.classify() will run the network on an roi in the image (or on the
    # whole image if the roi is not specified). A classification score output
    # vector will be generated for each location. At each scale the detection
    # window is moved around in the ROI using x_overlap (0-1) and y_overlap
    # (0-1) as a guide. If you set the overlap to 0.5 then each detection
    # window will overlap the previous one by 50%. Note the computational
    # work load goes WAY up the more overlap. Finally, for multi-scale
    # matching, after sliding the network around in the x/y dimensions the
    # detection window will shrink by scale_mul (0-1) down to min_scale
    # (0-1). For example, if scale_mul is 0.5 the detection window will
    # shrink by 50%. Note that at a lower scale there's even more area to
    # search if x_overlap and y_overlap are small...
    # Setting x_overlap=-1 forces the window to stay centered in the ROI in
    # the x direction always. If y_overlap is not -1 the method will search
    # in all vertical positions.
    # Setting y_overlap=-1 forces the window to stay centered in the ROI in
    # the y direction always. If x_overlap is not -1 the method will search
    # in all horizontal positions.
    # The default settings just do one detection... change them to search
    # the image...
    for obj in net.classify(img, min_scale=0.5, scale_mul=0.5,
                            x_overlap=-1, y_overlap=-1):
        print("********** Detections at [x=%d,y=%d,w=%d,h=%d]" % obj.rect())
        for i in range(len(obj.output())):
            print("%s = %f" % (labels[i], obj.output()[i]))
        img.draw_rectangle(obj.rect())
        img.draw_string(obj.x() + 3, obj.y() - 1,
                        labels[obj.output().index(max(obj.output()))],
                        mono_space=False)

    print(clock.fps(), "fps")

Now that you have an idea of how the script works, let's run it! Perform the following steps:

  1. Click the connect button in the lower-left corner of the OpenMV IDE.
  2. Click Run.
  3. Make sure that your serial terminal is open. If it is not displayed, click Serial Terminal in the lower-left corner.
  4. Now, present a person to the camera and note the confidence level shown in the terminal that a person is present.

When I ran the example, I decided to present to it not my face, but instead my Dr. Leonard McCoy Star Trek action figure (the one played by Karl Urban). You can see that I presented the action figure to the OpenMV camera in the person box that is generated in the center of the view in the following screenshot:

When I presented the action figure, the image was pushed through the person detection inference that is running in the example MicroPython script. The serial terminal output can be seen in the following screenshot:

As you can see, the application tells us the framerate, which in this case is typically between 1 and 2 frames per second. It also calculates whether it thinks there is a person in the image; in this case, you can see it is ~95% sure that there is a person there. It also reports the score for there being no person, along with how unsure it is about its answer.
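If you want to act on the result rather than just print it, the classification scores can be thresholded directly in the script. The following sketch is a stripped-down variation of the example above (it is not part of the original OpenMV example) that lights the OpenMV Cam's LED when the person score exceeds an assumed threshold of 0.9:

import sensor, time, tf
from pyb import LED

sensor.reset()
sensor.set_pixformat(sensor.GRAYSCALE)
sensor.set_framesize(sensor.QVGA)
sensor.set_windowing((240, 240))
sensor.skip_frames(time=2000)

net = tf.load('person_detection')   # built-in model, as in the example script
labels = ['unsure', 'person', 'no_person']
led = LED(1)                        # red LED on the OpenMV Cam

while True:
    img = sensor.snapshot()
    for obj in net.classify(img, min_scale=0.5, scale_mul=0.5,
                            x_overlap=-1, y_overlap=-1):
        person_score = obj.output()[labels.index('person')]
        if person_score > 0.9:      # assumed confidence threshold
            led.on()
        else:
            led.off()

The same pattern could just as easily toggle a GPIO pin or send a message over a serial link instead of driving the LED.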

Using machine learning to detect objects can be that simple! If you can, find an existing model that you can leverage for your application. If a model doesn't exist, then you will need to train one yourself.
