Inference on the edge

For those unfamiliar with the term, edge computing simply refers to computation performed at the end, or edge, of a network rather than sending data to a central server for processing. Examples of edge devices include cars, robots, Internet of Things (IoT) devices, and, of course, smartphones.

The motivation for performing computation at the edge, where the data resides, is that sending data across the network is expensive and time-consuming; the incurred latency and cost restrict what experiences we can deliver to the user. Removing these barriers opens up new applications that would otherwise not be possible. Another benefit of performing inference at the edge is data privacy; removing the need to transmit personal data across the network reduces the opportunities a malicious actor has to obtain it.

Luckily, technology advances at an astonishing rate, and improvements in hardware and software have now made it feasible to perform inference at the edge.

As this book's focus is on applied ML on iOS, detailed model architectures and training have been intentionally omitted; training currently requires significant computational power that is still out of reach of most of today's edge devices. This is likely to change in the near future as edge devices become increasingly powerful, with the most likely next advancement being the tuning and personalization of models using the personal data that resides on the device.

Some common use cases for ML on the device include:

  • Speech recognition: It is currently common to perform wake-word (or hot-word) detection locally rather than continuously streaming audio across the network. For example, Hey Siri detection is most likely performed locally on the device; once the phrase is detected, the utterance is streamed to a server for further processing.
  • Image recognition: It can be useful for the device to understand what it is seeing in order to assist the user in taking a photo, such as applying appropriate filters, adding captions to photos to make them easier to find, and grouping similar images together. These enhancements may not be significant enough to justify opening a connection to a remote server, but because they can be performed locally, we can use them without worrying about cost, latency, or privacy issues (see the short sketch after this list).
  • Object localization: Sometimes, it is useful to know not only what is present in the view, but also where it is within the view. An example of this can be seen in augmented reality (AR) apps, where information is overlaid onto the scene. Keeping these experiences responsive is critical to their success, and therefore inference must be performed with extremely low latency.
  • Optical character recognition: One of the first commercial applications of neural networks is still just as useful as it was when it was used to read handwritten ZIP codes in American post offices in 1989. Being able to read text allows for applications such as digitizing a physical document or performing computations on its contents; examples include language translation or solving a Sudoku puzzle.
  • Translation: Translating from one language to another quickly and accurately, even without a network connection, is an important use case that complements many of the vision-based scenarios we have discussed so far, such as AR and optical character recognition.
  • Gesture recognition: Gesture recognition provides a rich interaction mode, allowing quick shortcuts and intuitive interactions that enhance the user experience.
  • Text prediction: Being able to predict the next word the user is going to type, or even to predict the user's response, has turned something fairly cumbersome and painful to use (the smartphone soft keyboard) into something just as quick as, or even quicker than, its physical counterpart (the conventional keyboard). Performing this prediction on the device both protects the user's privacy and keeps the experience responsive, neither of which would be feasible if every request had to be routed to a remote server.
  • Text classification: This covers everything from sentiment analysis to topic discovery and facilitates many useful applications, such as recommending relevant content to the user or eliminating duplicates.
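
To make the idea of local inference concrete, here is a minimal sketch of on-device image classification using Apple's Vision and Core ML frameworks (which we introduce properly in the next section). The FlowerClassifier model name is a hypothetical placeholder standing in for any Core ML image classification model bundled with the app; the pattern is the same regardless of the specific model.

    import UIKit
    import Vision
    import CoreML

    // Classify an image entirely on the device; no data leaves the phone.
    // FlowerClassifier is a hypothetical, project-bundled Core ML model.
    func classify(_ image: UIImage) {
        guard let cgImage = image.cgImage,
              let model = try? VNCoreMLModel(for: FlowerClassifier().model) else { return }

        let request = VNCoreMLRequest(model: model) { request, _ in
            // Vision returns classification observations ranked by confidence.
            guard let top = (request.results as? [VNClassificationObservation])?.first else { return }
            print("Predicted \(top.identifier) with confidence \(top.confidence)")
        }

        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        try? handler.perform([request]) // Inference runs locally and synchronously.
    }

Because everything runs locally, this call behaves identically with or without a network connection, which is exactly the property that the use cases above rely on.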

These examples of use cases and applications hopefully show why we may want to perform inference on the edge; it means you can offer a higher level of interactivity than would be possible if inference were performed off the device. It allows you to deliver an experience even if the device has poor or no network connectivity. And finally, it's scalable: an increase in demand doesn't translate directly into load on your servers.

So far, we have introduced inference and the importance of being able to perform it on the edge. In the next section, we will introduce the framework that facilitates this on iOS devices: Core ML.
