Introducing MobileNet

At this point, we need to take a small detour into the world of convolutional neural networks (CNNs). MobileNet is a CNN model, so a basic understanding of what a CNN is helps us see how it relates to the problem we are solving here. Don't worry, we aren't going to dig into the mathematics behind CNNs, but knowing a little about what they do will help us appreciate what they bring to the table.

CNN classifiers work by taking in an input image (potentially from a video stream), processing the image, and classifying it under predefined categories. In order to understand how they work, we need to take a step back and think about the problem from a computer's point of view. Suppose we have a picture of a horse. To a computer, that picture is just a series of pixels, so if we showed it a picture of a slightly different horse, the computer could not tell that the two match just by comparing pixels.
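To make the pixel comparison problem concrete, here is a minimal sketch. The `GrayImage` type, the `sharedForeground` function, and the two tiny 5x5 images are hypothetical and exist only for illustration. The second image contains the same vertical bar as the first, moved one pixel to the right, yet a position-by-position comparison finds no bar pixels in common.

```typescript
// A grayscale image as rows of pixel intensities (0 = black, 255 = white).
type GrayImage = number[][];

// Count the positions where both images have a white (255) pixel.
function sharedForeground(a: GrayImage, b: GrayImage): number {
  let shared = 0;
  a.forEach((row, y) => {
    row.forEach((value, x) => {
      if (value === 255 && b[y]?.[x] === 255) {
        shared++;
      }
    });
  });
  return shared;
}

// Hypothetical 5x5 image: a white vertical bar in column 1.
const bar: GrayImage = Array.from({ length: 5 }, () => [0, 255, 0, 0, 0]);

// The same bar shifted one pixel to the right (now in column 2).
const shiftedBar: GrayImage = bar.map(row => [0, ...row.slice(0, -1)]);

console.log(sharedForeground(bar, bar));        // 5
console.log(sharedForeground(bar, shiftedBar)); // 0
```

To us, both images obviously show the same shape, but a naive pixel comparison treats them as having nothing in common.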

A CNN breaks an image down into pieces (say, into 3x3 grids of pixels) and compares those pieces. Simplistically speaking, what it is looking for is the number of matches that it can make with these pieces. The greater the number of matches, the greater the confidence that we have a match. This is a very simplified description; a real CNN involves a number of steps and filters. It should, however, serve to provide an understanding of why we would want to use a CNN such as MobileNet in TensorFlow.
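The idea of comparing 3x3 pieces can be sketched in a few lines. This is a deliberately simplified illustration, not how MobileNet or TensorFlow is implemented: a real CNN learns its filters during training and scores each piece numerically rather than testing for exact equality. The `Grid` type, the function names, and the sample images are all assumptions made for this example.

```typescript
// A grid of pixel intensities (0 = black, 255 = white).
type Grid = number[][];

// Extract the size x size piece whose top-left corner is at (top, left).
function patchAt(image: Grid, top: number, left: number, size: number): Grid {
  return image.slice(top, top + size).map(row => row.slice(left, left + size));
}

// True when every pixel of the pattern equals the corresponding patch pixel.
function patchesEqual(pattern: Grid, patch: Grid): boolean {
  return pattern.every((row, y) =>
    row.every((value, x) => value === patch[y]?.[x])
  );
}

// Slide the pattern over every position in the image and count exact matches.
function countPatternMatches(image: Grid, pattern: Grid): number {
  const size = pattern.length;
  const width = image[0]?.length ?? 0;
  let matches = 0;
  for (let top = 0; top + size <= image.length; top++) {
    for (let left = 0; left + size <= width; left++) {
      if (patchesEqual(pattern, patchAt(image, top, left, size))) {
        matches++;
      }
    }
  }
  return matches;
}

// Hypothetical 3x3 pattern: a vertical bar with black on either side.
const verticalBarPattern: Grid = [
  [0, 255, 0],
  [0, 255, 0],
  [0, 255, 0],
];

// Two hypothetical 5x5 images with the bar in column 1 and in column 2.
const leftBar: Grid = Array.from({ length: 5 }, () => [0, 255, 0, 0, 0]);
const rightBar: Grid = Array.from({ length: 5 }, () => [0, 0, 255, 0, 0]);

console.log(countPatternMatches(leftBar, verticalBarPattern));  // 3
console.log(countPatternMatches(rightBar, verticalBarPattern)); // 3
```

Because the pattern is compared at every position, both images produce the same number of matches even though the bar sits in a different column in each one.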

MobileNet is a specialist CNN that, among other things, provides us with image classification that has been trained against images in the ImageNet database (http://www.image-net.org/). When we load the model, we are loading a pre-trained model that has been created for us. We use a pre-trained network because it has already been trained on a large dataset on the server. We would not want to run image classification training in the browser because the training data would first have to be transferred from the server to the browser. No matter how powerful your client PC is, copying over the training datasets would be too much.

We have mentioned MobileNetV1 and MobileNetV2 without going into what they are and what datasets they were trained on. The MobileNet models were developed by Google and trained on the ImageNet dataset, which contains 1.4 million images broken down into 1,000 classes. They are called MobileNet models because they were designed with mobile devices in mind, so they can run on low-power and/or low-storage devices.

We can use a pre-trained model as it is, or we can customize it through transfer learning.
