Project setup

Our first stop is making sense of the world, or at least being programmatically aware of what the user is looking at. Computer vision, and image recognition in particular, has advanced by leaps and bounds since 2012, when computer scientists Geoffrey Hinton, Alex Krizhevsky, and Ilya Sutskever entered the ILSVRC 2012 computer vision competition using ideas from http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf, a paper they had recently published. They were the only entrants using a Convolutional Neural Network (CNN), and they won by a wide margin; the rest is pretty much history. Models these days can compete with humans at recognizing objects in images.

A CNN is a type of neural network that is well suited to images because it preserves the spatial relationship between pixels that are close to one another.
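To see what preserving the relationship between neighboring pixels means in practice, here is a minimal sketch in plain Python of the sliding-window operation at the heart of a convolutional layer. It is an illustration only, not part of the project code; the function name `convolve2d`, the tiny 4x4 image, and the hand-picked kernel are all assumptions made for this example. A real CNN learns its kernel values during training.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1).

    Each output value is a weighted sum of a small neighborhood of pixels,
    so pixels that are close together are always processed together.
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1

    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Hypothetical 4x4 grayscale image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked 2x2 kernel that responds where a pixel's right-hand
# neighbor is brighter than the pixel itself (a vertical edge).
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

The resulting feature map is non-zero only in the middle column, where the window straddles the dark-to-bright boundary. The kernel detects the edge precisely because it looks at adjacent pixels together rather than at each pixel in isolation.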

Fortunately for us, many companies, including Microsoft, have made Application Programming Interfaces (APIs) available that offer similar computer vision capabilities. We will be using Microsoft's Cognitive Services Computer Vision API for this example. The API gives our application the ability to recognize objects in an image or, in our case, in front of the user. We will use this to track the items the user comes across. Before jumping into code, let's see how this API works and what capabilities it provides.

The easiest way to do this is to visit the service webpage at https://azure.microsoft.com/en-gb/services/cognitive-services/computer-vision/ and explore some of the examples presented there. The following is one such example, showing an image and its associated metadata:

Screenshot of the Computer Vision API that Microsoft offers as one of its Cognitive Services. Source: https://azure.microsoft.com/en-gb/services/cognitive-services/computer-vision

In the preceding image, you can see the service returning a set of Tags and Captions. We will attach this metadata to each frame (captured image), and we will use it when searching for an item for the user.
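As a rough sketch of how tags and captions could be attached to a frame and searched later, consider the following Python snippet. Everything in it is an assumption made for illustration: the sample JSON is invented, and its field names (`tags`, `name`, `confidence`, `description`, `captions`, `text`) are a guess at the response layout that should be checked against the API documentation. The helpers `summarize_frame` and `frames_with_tag` are hypothetical names, not part of any SDK.

```python
import json

# Invented sample response, for illustration only (not real service output).
SAMPLE_RESPONSE = """
{
  "tags": [
    {"name": "indoor", "confidence": 0.98},
    {"name": "table", "confidence": 0.91},
    {"name": "cup", "confidence": 0.42}
  ],
  "description": {
    "captions": [
      {"text": "a cup sitting on a table", "confidence": 0.87}
    ]
  }
}
"""


def summarize_frame(response_text, min_confidence=0.5):
    """Reduce a response to the tags and captions we keep for one frame.

    Tags below `min_confidence` are dropped so that weak guesses do not
    pollute later searches. The threshold value is an arbitrary choice.
    """
    data = json.loads(response_text)
    tags = [
        tag["name"]
        for tag in data.get("tags", [])
        if tag.get("confidence", 0.0) >= min_confidence
    ]
    captions = [
        caption["text"]
        for caption in data.get("description", {}).get("captions", [])
    ]
    return {"tags": tags, "captions": captions}


def frames_with_tag(frames, tag):
    """Return the indices of all stored frames that carry `tag`."""
    return [index for index, frame in enumerate(frames) if tag in frame["tags"]]


# Store the metadata for each captured frame, then search it.
frames = [summarize_frame(SAMPLE_RESPONSE)]
print(frames[0])
print(frames_with_tag(frames, "table"))
```

Filtering by confidence when the frame is stored keeps the lookup simple: a later search for an item only has to test whether its tag is in the frame's list.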
