Data to drive the desired effect – action shots

Now is a good time to introduce the photo effect we want to create in this chapter. The effect, as I know it, is called an action shot. It's essentially a still photograph that shows someone (or something) in motion, which is probably best illustrated with an image like the one shown here:

As previously mentioned, the model we use in this chapter performs binary (or single-class) classification. This simplification, using a binary classifier rather than a multi-class classifier, is driven by our intended use: segmenting people from the background. As with any software project, you should strive for simplicity where you can.

To extract people, we need a model that has learned to recognize people and their associated pixels. For this, we need a dataset consisting of images of people and corresponding masks in which each person's pixels are labeled, and lots of them. Unlike datasets for classification, datasets for object segmentation are neither as common nor as vast. This is understandable given the additional effort required to label such a dataset. Some common datasets for object segmentation, including the ones that were considered for this chapter, are:

  • PASCAL VOC: A dataset with 9,993 labeled images across 20 classes. You can find the dataset at http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html.
  • Labeled Faces in the Wild (LFW) from University of Massachusetts Amherst: A dataset comprising 2,927 faces. Each has the hair, skin, and background labeled (three classes). You can find the dataset at http://vis-www.cs.umass.edu/lfw/part_labels/.
  • Common Objects in Context (COCO) dataset: A popular dataset for all things related to computer vision, including segmentation. Its segmented datasets comprise approximately 200,000 labeled images across 80 classes. It's the dataset that was used for this chapter and the one we will briefly explore in this section. You can find the dataset at http://cocodataset.org/#home.
  • Not considered for this project but good to be aware of is the Cambridge-driving Labeled Video Database (CamVid) from Cambridge University. As is clear from the name, the dataset is made up of frames from a video feed from a car camera—ideal for anyone interested in training their own self-driving car. You can find the dataset at http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/.
Listing these datasets here may be superfluous, but semantic segmentation is such an exciting opportunity, with so much potential, that I hope it encourages you to explore and experiment with new applications of it.

Luckily for us, COCO's 13+ GB dataset contains many labeled images of people and a convenient API that makes finding relevant images easy. For this chapter, COCO's API was used to find all images containing people. These were then filtered further, keeping only those that contained one or two people whose combined area covered between 20% and 70% of the image, discarding those images where the person was too small or too large. For each of these images, the contours of each person were fetched and used to create a binary mask, which then became the label for training. The following figure illustrates this process for a single image, and a sketch of the filtering code follows it:

Source: The COCO dataset (http://cocodataset.org)
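
To make this preparation step concrete, here is a minimal sketch using COCO's Python API (pycocotools). The annotation file path and the exact thresholds are assumptions based on the description above; this is not the precise script used for this chapter:

```python
import numpy as np
from pycocotools.coco import COCO

# Assumed path to the COCO training annotations; adjust to wherever
# you have downloaded the dataset.
coco = COCO('annotations/instances_train2017.json')

# Find all images that contain at least one person.
person_cat_id = coco.getCatIds(catNms=['person'])[0]
img_ids = coco.getImgIds(catIds=[person_cat_id])

selected = []
for img_id in img_ids:
    img_info = coco.loadImgs(img_id)[0]
    img_area = img_info['height'] * img_info['width']

    ann_ids = coco.getAnnIds(imgIds=img_id, catIds=[person_cat_id],
                             iscrowd=False)
    anns = coco.loadAnns(ann_ids)

    # Keep only images with one or two people whose combined area
    # covers between 20% and 70% of the image.
    person_area = sum(ann['area'] for ann in anns)
    if 1 <= len(anns) <= 2 and 0.2 <= person_area / img_area <= 0.7:
        # Build the binary mask (our training label) by merging the
        # per-person masks derived from their contours.
        mask = np.zeros((img_info['height'], img_info['width']),
                        dtype=np.uint8)
        for ann in anns:
            mask = np.maximum(mask, coco.annToMask(ann))
        selected.append((img_info['file_name'], mask))
```

Each entry in selected pairs an image filename with its binary mask, ready to be written out as an image/label pair for training.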

After training on 8,689 images over 40 epochs, an Intersection over Union (IoU) coefficient (a metric closely related to the Dice coefficient) of 0.8192 was achieved on the validation data (approximately 300 images).

Hopefully, IoU sounds familiar, as it was what we used back in Chapter 5, Locating Objects in the World. As a reminder, IoU is an evaluation metric that measures how well two regions overlap: bounding boxes in that chapter, segmentation masks here. A perfect overlap returns 1.0, which is why the loss is negated for training; minimizing the negated IoU maximizes the overlap.
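
The following sketch shows how IoU can be computed for two binary masks and how negating it yields a quantity suitable for minimization; the function and variable names here are illustrative, not taken from the chapter's code:

```python
import numpy as np

def iou(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    """Intersection over Union for two binary masks."""
    intersection = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    return intersection / union if union > 0 else 1.0

# A perfect overlap returns 1.0, so we negate IoU when using it as a
# loss: minimizing -iou is equivalent to maximizing the overlap.
loss = -iou(np.array([[1, 1], [0, 0]]), np.array([[1, 0], [0, 0]]))
print(loss)  # -0.5: the masks share 1 pixel out of the 2 in their union
```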

The following image shows what this looks like, starting with random examples from the validation set, followed by examples that were manually searched for, such as those portraying actions:

Source: The COCO dataset (http://cocodataset.org)

And here are some examples of action images where the model was able to sufficiently segment the person from the image:

Finally, here are some, out of many, examples of action images where the model was less successful:

We have covered the model and training data and examined the outputs of the model. It's now time to turn our attention to the application in this chapter, which we will begin working on in the next section. 
