Chapter 14. Designing Your Own TinyML Applications

So far, we’ve explored existing reference applications for important areas like audio, image, and gesture recognition. If your problem is similar to one of the examples, you should be able to adapt the training and deployment process—but what if it isn’t obvious how to modify one of our examples to fit? In this and the following chapters, we cover the process of building an embedded machine learning solution for a problem for which you don’t have an easy starting point. Your experience with the examples will serve as a good foundation for creating your own systems, but you also need to learn more about designing, training, and deploying new models. Because the constraints of our platforms are so tight, we also spend a lot of time discussing how you can make the right optimizations to fit within your storage and computational budgets without missing your accuracy targets. You’ll undoubtedly spend a lot of your time trying to understand why things aren’t working, so we cover a variety of debugging techniques. Finally, we explore how you can build in safeguards for your users’ privacy and security.

The Design Process

Training models can take days or weeks, and bringing up a new embedded hardware platform can also be very time-consuming—so one of the biggest risks to any embedded machine learning project is running out of time before you have something working. The most effective way to reduce this risk is by answering as many of the outstanding questions as early in the process as possible, through planning, research, and experimentation. Each change to your training data or architecture can easily involve a week of coding and retraining, and deployment hardware changes have a ripple effect throughout your software stack, involving a lot of rewriting of previously working code. Anything you can do at the outset to reduce the number of changes required later in the development process can save you the time you would have spent making those changes. This chapter focuses on some of the techniques we recommend for answering important questions before you start coding the final application.

Do You Need a Microcontroller, or Would a Larger Device Work?

The first question you really need to answer is whether you need the advantages of an embedded system or can relax your requirements for battery life, cost, and size, at least for an initial prototype. Programming on a system with a complete modern OS like Linux is a lot easier (and faster) than developing in the embedded world. You can get complete desktop-level systems like a Raspberry Pi for under $25, along with a lot of peripherals like cameras and other sensors. If you need to run compute-heavy neural networks, NVIDIA’s Jetson series of boards start at $99 and bring a strong software stack in a small form factor. The biggest downsides to these devices are that they will burn several watts, giving them battery-powered lifetimes on the order of hours or days at most, depending on the physical size of the energy storage. As long as latency isn’t a hard constraint, you can even fire up as many powerful cloud servers as you need to handle the neural network workload, leaving the client device to handle the interface and network communications.

We’re strong believers in the power of being able to deploy anywhere, but if you’re trying to determine whether an idea will work at all, we highly recommend trying to prototype using a device that is easy and quick to experiment with. Developing embedded systems is a massive pain in the behind, so the more you can tease out the real requirements of your application before you dive in, the more chance you have of being successful.

Picking a practical example, imagine that you want to build a device to help monitor the health of sheep. The final product will need to be able to run for weeks or months in an environment without good connectivity, so it must be an embedded system. When you’re getting underway, however, you don’t want to use such a tricky-to-program device, because you won’t yet know crucial details like what models you want to run, which sensors are required, or what actions you need to take based on the data you gather, and you won’t yet have any training data. To bootstrap your work, you’ll probably want to find a friendly farmer with a small flock of sheep that graze somewhere accessible. You could put together a Raspberry Pi platform that you yourself remove from each monitored sheep every night to recharge, and set up an outdoor WiFi network that covers the range of the grazing field so that the devices can easily communicate over it. Obviously you can’t expect real customers to go to this sort of trouble, but you’ll be able to answer a lot of questions about what you need to build with this setup, and experimenting with new models, sensors, and form factors will be much faster than in an embedded version.

Microcontrollers are useful because they scale up in a way no other hardware can. They are cheap, small, and able to run on almost no energy, but these advantages only kick in when you actually need to scale. If you can, put off dealing with scaling until you absolutely must so that you can be confident that you’re scaling the right thing.

Understanding What’s Possible

It’s difficult to know what problems deep learning is able to solve. One rule of thumb we’ve found very useful is that neural network models are great at the kind of tasks that people can solve “in the blink of an eye.” We intuitively seem able to recognize objects, sounds, words, and friends in a comparative instant, and these are the same kinds of tasks that neural networks can perform. Similarly, DeepMind’s Go-playing algorithm relies on a convolutional neural network that’s able to look at a board and return an estimate of how strong a position each player is in. The longer-term planning parts of that system are then built up using those foundational components.

This is a useful distinction because it draws a line between different kinds of “intelligence.” Neural networks are not automatically capable of planning or higher-level tasks like theorem proving. They’re much better at taking in large amounts of noisy and confusing data, and spotting patterns robustly. For example, a neural network might not be a good solution for guiding a sheepdog in how to herd a flock through a gate, but it could well be the best approach for taking in a variety of sensor data like body temperature, pulse, and accelerometer readings to predict whether a sheep is feeling unwell. The sorts of judgments that we’re able to perform almost unconsciously are more likely to be covered by deep learning than problems that require explicit thinking. This doesn’t mean that those more abstract problems can’t be helped by neural networks, just that they’re usually only a component of a larger system that uses their “instinctual” predictions as inputs.

Follow in Someone Else’s Footsteps

In the research world, “reviewing the literature” is the rather grand name for reading research papers and other publications related to a problem you’re interested in. Even if you’re not a researcher this can be a useful process when dealing with deep learning because there are a lot of useful accounts of attempts to apply neural network models to all sorts of challenges, and you’ll save a lot of time if you can get some hints on how to get started from the work of others. Understanding research papers can be challenging, but the most useful things to glean are what kinds of models people use for problems similar to yours and whether there are any existing datasets you can use, given that gathering data is one of the most difficult parts of the machine learning process.

For example, if you were interested in predictive maintenance on mechanical bearings, you might search for “deep learning predictive maintenance bearings” on arxiv.org, which is the most popular online host for machine learning research papers. The top result as of this writing is a survey paper from 2019, “Machine Learning and Deep Learning Algorithms for Bearing Fault Diagnostics: A Comprehensive Review” by Shen Zhang et al. From this, you’ll learn that there’s a standard public dataset of labeled bearing sensor data called the Case Western Reserve University bearing dataset. Having an existing dataset is extremely helpful because it will assist you in experimenting with approaches even before you have gathered readings from your own setup. There’s also a good overview of the different kinds of model architectures that have been used on the problem, along with discussions of their benefits, costs, and the overall results they achieve.

Find Some Similar Models to Train

After you have some ideas about model architectures and training data to use, it’s worth spending some time in a training environment experimenting to see what results you can achieve with no resource constraints. This book focuses on TensorFlow, so we’d recommend that you find an example TensorFlow tutorial or script (depending on your level of experience), get it running as is, and then begin to adapt it to your problem. If you can, look at the training examples in this book for inspiration because they also include all of the steps needed to deploy to an embedded platform.

A good way to think about what models might work is looking at the characteristics of your sensor data and trying to match them to something similar in the tutorials. For example, if you have single-channel vibration data from a wheel bearing, that’s going to be a comparatively high-frequency time series, which has a lot in common with audio data from a microphone. As a starting point, you could try converting all of your bearing data into .wav format and then feed it into the speech training process instead of the standard Speech Commands dataset, with the appropriate labels. You’d probably then want to customize the process a lot more, but hopefully you’d at least get a model that was somewhat predictive and be able to use that as a baseline for further experiments. A similar process could apply to adapting the gesture tutorial to any accelerometer-based classification problem, or retraining the person detector for different machine vision applications. If there isn’t an obvious example to start with in this book, searching for tutorials that show how to build the model architecture you’re interested in using Keras is a good way to get started.
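If you do try the .wav conversion route mentioned above, here’s a minimal sketch of what it might look like, assuming each bearing recording is a one-dimensional NumPy array of floats and that the speech training scripts expect one folder of .wav files per label. The sample rate, folder layout, and label names are illustrative assumptions, not part of any standard dataset:

import os
import numpy as np
from scipy.io import wavfile

SAMPLE_RATE = 16000  # chosen to match what the speech pipeline expects

def save_as_wav(samples, label, name, out_dir="bearing_wavs"):
    """Scale a float recording to 16-bit PCM and write it under its label folder."""
    label_dir = os.path.join(out_dir, label)
    os.makedirs(label_dir, exist_ok=True)
    # Normalize to the int16 range so the waveform survives the conversion.
    peak = np.max(np.abs(samples)) or 1.0
    pcm = np.int16(samples / peak * 32767)
    wavfile.write(os.path.join(label_dir, name + ".wav"), SAMPLE_RATE, pcm)

# Example usage with a made-up one-second recording labeled "faulty".
save_as_wav(np.random.randn(SAMPLE_RATE), "faulty", "bearing_001")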

Look at the Data

Most of the focus of machine learning research is on designing new architectures; there’s not much coverage of training datasets. This is because in the academic world you’re usually given a pregenerated training dataset that is fixed, and you’re competing on how well your model can score on it compared to others. Outside of research we usually don’t have an existing dataset for our problem, and what we care about is the experience we deliver to the end user, not the score on a fixed dataset, so our priorities become very different.

One of the authors has written a blog post that covers this in more detail, but the summary is that you should expect to spend much more time gathering, exploring, labeling, and improving your data than you do on your model architecture. The return on the time you invest will be much higher.

There are some common techniques that we’ve found to be very useful when working with data. One that sounds extremely obvious but that we still often forget is: look at your data! If you have images, download them into folders arranged by label on your local machine and browse through them. If you’re working with audio files, do the same and listen to a selection of them. You’ll quickly discover all sorts of oddities and mistakes that you didn’t expect, from Jaguar cars labeled as jaguar cats to recordings in which the audio is too faint or has been cropped and cuts off part of a word. Even if you just have numerical data, looking through the numbers in a comma-separated values (CSV) text file can be extremely helpful. In the past we’ve spotted problems like many of the values reaching the saturation limits of sensors and maxing out, or even wrapping around, or the sensitivity being too low so that most of the data is crammed into too small a numerical range. You can get much more advanced in your data analysis, and you’ll find tools like TensorBoard extremely helpful for clustering and other visualizations of what’s happening in your dataset.
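Here’s a rough sketch of the sort of quick numerical check we mean: scanning a CSV of readings for values pinned at the sensor’s limits or crammed into a narrow slice of the available range. The file name, column layout, and saturation limits are all hypothetical and would need to match your own sensors:

import pandas as pd

SATURATION_MIN, SATURATION_MAX = -32768, 32767  # example limits for a 16-bit sensor

readings = pd.read_csv("sensor_log.csv")
for column in readings.columns:
    values = readings[column]
    # Fraction of samples pinned at the sensor's limits (a sign of clipping).
    pinned = ((values <= SATURATION_MIN) | (values >= SATURATION_MAX)).mean()
    # How much of the available numerical range the data actually uses.
    used = (values.max() - values.min()) / (SATURATION_MAX - SATURATION_MIN)
    print(f"{column}: {pinned:.1%} saturated, {used:.1%} of range used")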

Another problem to watch out for is an unbalanced training set. If you are classifying into categories, the frequency at which different classes occur in your training inputs will affect the eventual prediction probabilities. One trap that’s easy to fall into is thinking that the results from your network represent true probabilities—for example, a 0.5 score for “yes” meaning that the network is predicting there’s a 50% chance the spoken word was “yes.” In fact the relationship is more complex: the output scores are strongly influenced by the ratio of each class in the training data, so you also need to know the prior probability of each class in the application’s real input distribution before you can recover a true probability. As another example, imagine training a bird image classifier on 10 different species. If you then deployed that in the Antarctic, you’d be very suspicious of a result that indicated you’d seen a parrot; if you were looking at video from the Amazon, a penguin would be equally surprising. It can be challenging to bake this kind of domain knowledge into the training process because you typically want roughly equal numbers of samples for each class so the network “pays attention” equally to each. Instead, there’s typically a calibration process that occurs after the model inference has been run, to weight the results based on prior knowledge. In the Antarctic example, you might have a very high threshold before you report a parrot, but a much lower one for penguins.
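To make that calibration idea concrete, here is one rough sketch of a post-inference adjustment: divide out the class frequencies the model saw during training, multiply by the frequencies you expect in the field, and renormalize. Every number below is made up purely for illustration:

import numpy as np

def calibrate(scores, train_priors, field_priors):
    """Reweight raw classifier scores by the priors you expect at deployment time."""
    adjusted = scores * (field_priors / train_priors)
    return adjusted / adjusted.sum()

scores = np.array([0.6, 0.4])            # raw outputs for [parrot, penguin]
train_priors = np.array([0.5, 0.5])      # the balanced training set
field_priors = np.array([0.001, 0.999])  # what you'd expect in the Antarctic
print(calibrate(scores, train_priors, field_priors))  # parrot becomes very unlikely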

Wizard of Oz-ing

One of our favorite machine learning design techniques doesn’t involve much technology at all. The most difficult problem in engineering is determining what the requirements are, and it’s very easy to spend a lot of time and resources on a solution that doesn’t actually work well in practice, especially because developing a machine learning model takes a long time. To flush out the requirements, we highly recommend the Wizard of Oz approach. In this scenario, you create a mock-up of the system you eventually want to build, but instead of having software do the decision making, you have a person as “the man behind the curtain.” This lets you test your assumptions and pin down the specifications before you bake them into your design and commit to a time-consuming development cycle.

How does this work in practice? Imagine that you’re designing a sensor that will detect when people are present in a meeting room, and if there’s no one in the room, it will dim the lights. Instead of building and deploying a wireless microcontroller running a person detection model, with the Wizard of Oz approach you’d create a prototype that just fed live video to a person sitting in a nearby room with a switch that controlled the lights and instructions to dim them when nobody was visible. You’d quickly discover usability issues, like if the camera doesn’t cover the entire room and so the lights keep getting turned off when somebody’s still present, or if there’s an unacceptable delay in turning them on when someone enters the room. You can apply this approach to almost any problem, and it will give you precious validation of the assumptions you’re making about your product, without you spending time and energy on a machine learning model based on the wrong foundations. Even better, you can set up this process so that you generate labeled data for your training set from it, given that you’ll have the input data along with the decisions that your Wizard made based on those inputs.
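If you do want that labeled data to fall out of the process, the logging can be as simple as the following sketch, which records each decision the human operator makes alongside a pointer to the input they were looking at. The file names and fields here are hypothetical:

import csv
import time

def log_decision(frame_path, lights_on, log_path="wizard_labels.csv"):
    """Append one (input, decision) pair with a timestamp."""
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([time.time(), frame_path, int(lights_on)])

# Example: the operator saw someone in this frame and left the lights on.
log_decision("frames/frame_0042.jpg", lights_on=True)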

Get It Working on the Desktop First

The Wizard of Oz approach is one way to get a prototype running as quickly as possible, but even after you’ve moved on to model training you should be thinking about how to experiment and iterate as quickly as you can. Exporting a model and getting that model running fast enough on an embedded platform can take a long time, so a great shortcut is to stream data from a sensor in the environment to a nearby desktop or cloud machine for processing. This will probably use too much energy to be a deployable solution in production, but as long as you can ensure the latency doesn’t affect the overall experience, it’s a great way to get feedback on how well your machine learning solution works in the context of the whole product design.
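The client side of this streaming shortcut can be very simple. Here’s a minimal sketch, assuming the desktop machine exposes an HTTP endpoint that runs the model and returns a prediction; the address, endpoint name, and payload format are all assumptions you’d replace with your own:

import time
import requests

DESKTOP_URL = "http://192.168.1.50:5000/predict"  # hypothetical desktop address

def read_sensor():
    # Stand-in for whatever sensor you're actually sampling on the device.
    return {"temperature": 38.5, "pulse": 72, "accel": [0.1, 0.0, 9.8]}

while True:
    reading = read_sensor()
    response = requests.post(DESKTOP_URL, json=reading, timeout=1.0)
    print("prediction:", response.json())
    time.sleep(0.5)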

Another big benefit is that you can record a stream of sensor data once, and then use it over and over again for informal evaluations of your model. This is especially useful if there are particularly high-impact errors that a model has made in the past that might not be properly captured in the normal metrics. If your photo classifier has ever labeled a baby as a dog, you’ll want to make sure that particular mistake doesn’t happen again, even if you’re 95% accurate overall, because it would be so upsetting for the user.

There are a lot of choices for how to run the model on the desktop. The easiest way to begin is by collecting example data using a platform like the Raspberry Pi that has good sensor support, and doing a bulk copy to your desktop machine (or a cloud instance if you prefer). You can then use standard TensorFlow in Python to train and evaluate potential models in an offline way, with no interactivity. When you have a model that seems promising you can take incremental steps, such as converting your TensorFlow model to TensorFlow Lite, but continue evaluating it against batch data on your PC. After that’s working, you could try putting your desktop TensorFlow Lite application behind a simple web API and calling it from a device that has the form factor you’re aiming at to understand how it works in a real environment.
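As a sketch of that incremental step, the following converts a trained model to TensorFlow Lite and runs the converted model over the same batch data you used for your offline evaluation, using the standard tf.lite converter and interpreter APIs. The file paths and input shape are hypothetical:

import numpy as np
import tensorflow as tf

# Convert a SavedModel exported from your training run.
converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/")
tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)

# Run the converted model over a batch of held-out examples on the desktop.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

batch = np.load("eval_batch.npy")  # shape (num_examples, ...) matching the model's input
for example in batch:
    interpreter.set_tensor(input_details["index"],
                           example[np.newaxis].astype(np.float32))
    interpreter.invoke()
    print(interpreter.get_tensor(output_details["index"]))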
