Chapter 16. Optimizing Energy Usage

The most important advantage that embedded devices have over desktop or mobile systems is that they consume very little energy. A server CPU might consume tens or hundreds of watts, requiring a cooling system and main power supply to run. Even a phone can consume several watts and require daily charging. Microcontrollers can run at less than a milliwatt, more than a thousand times less than a phone’s CPU, and so run on a coin battery or energy harvesting for weeks, months, or years.

If you’re developing a TinyML product, it’s likely that the most challenging constraint you’ll have to deal with is battery life. Requiring human intervention to change or recharge batteries is often not feasible, so the useful lifetime of your device (how long it will continue working) will be defined by how much energy it uses, and how much it can store. The battery capacity is typically limited by the physical size of your product (for example, a peel-and-stick sensor is unlikely to be able to accommodate anything more than a coin battery), and even if you’re able to use energy harvesting, there are sharp limits on how much power that can supply. This means that the main area you can control to influence the lifetime of your device is how much energy your system uses. In this chapter we talk about how you can investigate what your power usage is and how to improve it.

Developing Intuition

Most desktop engineers have a rough feel for how long different kinds of operations take: they know that a network request is likely to be slower than reading data from RAM, and that accessing a file on a solid-state drive (SSD) is usually faster than on a spinning-disk drive. It's much less common to have to think about how much energy different functionality needs, but to build a mental model and plan for power efficiency, you'll need some rules of thumb for the magnitude of energy your operations require.

Note

In this chapter we switch back and forth between measures of energy and measures of power. Power is energy over time, so, for example, a CPU that uses one joule (J) of energy every second is using one watt (W) of power. Since what we care most about is the lifetime of our device, it's often most helpful to focus on average power usage as a metric, because for a fixed amount of energy stored in a battery, runtime is inversely proportional to average power draw. This means that we can easily predict that a system using an average of 1 mW of power will last twice as long as one using 2 mW. We will sometimes still refer to energy usage for one-off operations that aren't sustained for long periods of time.
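As a quick sanity check on that relationship, here's a minimal sketch. The battery capacity is a rough CR2032 coin-cell figure used for illustration:

```python
# Back-of-envelope lifetime math: runtime = stored energy / average power.
# The 2,500 J figure is an approximate CR2032 coin-cell capacity.

def lifetime_seconds(battery_joules, avg_power_watts):
    """How long a fixed energy store lasts at a given average power draw."""
    return battery_joules / avg_power_watts

battery_j = 2500.0
days_at_1mw = lifetime_seconds(battery_j, 0.001) / 86400  # roughly 29 days
days_at_2mw = lifetime_seconds(battery_j, 0.002) / 86400  # half as long
print(days_at_1mw, days_at_2mw)
```

Halving the average power doubles the runtime, which is why average power is the metric to watch.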

Typical Component Power Usage

If you want a deep dive into how much energy system components use, Smartphone Energy Consumption by Sasu Tarkoma et al. (Cambridge University Press) is a great book to start with. Here are some numbers we’ve derived from their calculations:

  • An Arm Cortex-A9 CPU can use between 500 and 2,000 mW.

  • A display might use 400 mW.

  • Active cell radio might use 800 mW.

  • Bluetooth might use 100 mW.

Going beyond smartphones, here are the best measurements we’ve observed for embedded components:

  • A microphone sensor might use 300 microwatts (µW).

  • Bluetooth Low Energy might use 40 mW.

  • A 320 × 320-pixel monochrome image sensor (like the Himax HM01B0) could use 1 mW at 30 FPS.

  • An Ambiq Cortex-M4F microcontroller might use 1 mW at 48 MHz clock rate.

  • An accelerometer might use 1 mW.

These numbers will vary a lot depending on the exact components you use, but they’re useful to remember so that you at least know the rough proportions of different operations. One top-level summary is that radio uses a lot more power than other functionality you might need in an embedded product. Additionally, it seems like sensor and processor energy requirements are dropping much faster than communications power, so it’s likely that the gap will increase even more in the future.

Once you have an idea of what the active components in your system are likely to use, you’ll need to think about how much energy you can store or harvest to power them. Here are some rough figures (thanks to James Meyers for the energy harvesting estimates):

  • A CR2032 coin battery might hold 2,500 J. This means that if your system is using one mW of power on average, you could hope to get roughly a month of use.

  • An AA battery might have 15,000 J, giving a six-month lifetime for a 1 mW system.

  • Harvesting temperature differences from an industrial machine could yield 1 to 10 mW per square centimeter.

  • Power from indoor light could give 10 µW per square centimeter.

  • Outdoor light might enable you to harvest 10 mW for each square centimeter.

As you can see, only industrial temperature differentials or outdoor lighting are currently practical for self-powered devices, but as the energy requirements of processors and sensors drop, we hope that other harvesting methods will become viable too. You can follow commercial suppliers like Matrix or e-peas to see some of the latest energy harvesting devices.
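To see why only the stronger sources are practical today, it helps to turn the question around and ask how much harvesting area a given power budget requires. Here's a sketch using the rough per-square-centimeter figures above (all order-of-magnitude estimates, not measured values):

```python
# Sketch: harvesting area needed for a given average power budget,
# using the rough per-square-centimeter figures quoted in the text.

HARVEST_W_PER_CM2 = {
    "indoor_light": 10e-6,       # 10 microwatts per square centimeter
    "outdoor_light": 10e-3,      # 10 milliwatts per square centimeter
    "industrial_thermal": 1e-3,  # low end of the 1-10 mW range
}

def area_cm2_needed(avg_power_w, source):
    """Square centimeters of harvester required to sustain a power draw."""
    return avg_power_w / HARVEST_W_PER_CM2[source]

# A 1 mW system: a fraction of a square centimeter outdoors,
# but an impractical 100 square centimeters under indoor light.
print(area_cm2_needed(1e-3, "outdoor_light"))
print(area_cm2_needed(1e-3, "indoor_light"))
```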

Hopefully these ballpark numbers will help you sketch out what kind of system might be practical for your combination of lifetime, cost, and size requirements. They should be enough for at least an initial feasibility check, and if you can internalize them as intuitions, you’ll be able to quickly think through a lot of different potential trade-offs.

Hardware Choice

When you have a rough idea of what kinds of components you might use in your product, you'll need to look at real parts you can purchase. If you're looking for something that's well documented and accessible to hobbyists, it's good to start by browsing sites like SparkFun's, Arduino's, or Adafruit's. These offer components that come with tutorials, drivers, and advice on connecting to other parts. They are also the best place to start prototyping, because you might well be able to get a complete system with everything you need already populated. The biggest downsides are that you will have a more limited selection, the integrated systems might not be optimized for overall power usage, and you will be paying a premium for the extra resources.

For more choice and lower prices, but without the valuable support, you can try electronics suppliers like Digi-Key, Mouser Electronics, or even Alibaba. What all of these sites have in common is that they should supply datasheets for all of their products. These contain a wealth of detail about each part: everything from how to supply clock signals to mechanical data on the size of the chip and its pins. The first thing you'll probably want to understand, though, is the power usage, and this can be surprisingly difficult to find. As an example, look at the datasheet for an STMicroelectronics Cortex-M0 MCU. There are almost a hundred pages, and it's not obvious from glancing at the table of contents how to find the power usage. One trick we've found helpful is to search for "milliamps" or " ma " (with the surrounding spaces, so you don't match letters inside other words) within these documents, because milliamps are the units most often used to express current consumption. In this datasheet that search leads to a table on page 47, shown in Figure 16-1, which provides values for current consumption.

Figure 16-1. Current consumption table from STMicroelectronics (typical and maximum current consumption from the VDD supply at VDD = 3.6 V)

This can still be tough to interpret, but what we're generally interested in is how many watts (or milliwatts) this chip might use. To get that, we need to multiply the amps shown by the voltage, which is listed as 3.6 volts here (we've highlighted this at the top of the table). If we do that, we can see that the typical power used ranges from nearly 100 mW at full speed down to only about 10 mW in sleep mode. This tells us that the MCU is comparatively power-hungry, though its price of 55 cents might compensate for that, depending on your trade-offs. You should be able to perform similar detective work on the datasheets of all the components you're interested in using, and assemble a picture of the likely overall power usage from the sum of all these parts.
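Since datasheets quote current rather than power, the conversion is worth spelling out: power is current times voltage (P = I × V), and conveniently, milliamps times volts gives milliwatts directly. A small sketch, with currents that are illustrative rather than taken from any particular datasheet:

```python
# Sketch: converting a datasheet current figure to power.
# P = I * V, and milliamps times volts conveniently yields milliwatts.

def power_mw(current_ma, supply_v):
    """Power in milliwatts from a current in milliamps and a supply voltage."""
    return current_ma * supply_v

# Illustrative currents at a 3.6 V supply:
print(power_mw(25.0, 3.6))  # run mode: 25 mA is about 90 mW
print(power_mw(2.8, 3.6))   # sleep mode: 2.8 mA is about 10 mW
```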

Measuring Real Power Usage

Once you have a set of components, you’ll need to assemble them into a complete system. That process is beyond the scope of this book, but we do recommend that you try to get something completed as early as possible in the process so that you can try out the product in the real world and learn more about its requirements. Even if you aren’t using quite the components you want to or don’t have all the software ready, getting early feedback is invaluable.

Another benefit of having a complete system is that you can test the actual power usage. Datasheets and estimates are helpful for planning, but there’s always something that doesn’t fit into a simple model, and integration testing will often show much higher power consumption than you expect.

There are a lot of tools that you can use to measure the power consumption of a system, and knowing how to use a multimeter (a device for measuring various electrical properties) can be very helpful, but the most reliable method is to place a battery with a known capacity in the device and then see how long it lasts. This is what you actually care about, after all, and although you might be aiming for a lifetime of months or years, most likely your first attempts will run for only hours or days. The advantage of this experimental approach is that it captures all the effects you care about, including things like failures when the voltage drops too low, which probably won’t show up in simple modeling calculations. It is also so simple that even a software engineer can manage it!

Estimating Power Usage for a Model

The simplest way to estimate how much power a model will use on a particular device is to measure the latency for running one inference, and then multiply the average power usage of the system for that time period to get the energy usage. At the start of a project you’re not likely to have hard figures for the latency and power usage, but you can come up with ballpark figures. If you know how many arithmetic operations a model requires, and roughly how many operations per second a processor can perform, you can roughly estimate the time that model will take to execute. Datasheets will usually give you numbers for the power usage of a device at a particular frequency and voltage, though beware that they probably won’t include common parts of the whole system like memory or peripherals. It’s worth taking these early estimates with a big pinch of salt and using them as an upper bound on what you might achieve, but at least you can get some idea of the feasibility of your approach.

As an example, if you have a model that takes 60 million operations to execute, like the person detector, and you have a chip like an Arm Cortex-M4 running at 48 MHz that you believe can perform two 8-bit multiply/adds per cycle using its DSP extensions, you might estimate the latency as 60,000,000 / (2 × 48,000,000) = 0.625 seconds, or 625 ms. If your chip uses 2 mW, that would work out to 1.25 millijoules (mJ) per inference.
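This style of back-of-envelope estimate can be written out as a short sketch. The figures are the example inputs from this section, not measurements of real hardware:

```python
# Sketch: estimating inference latency and energy from first principles.
# Inputs: model size in arithmetic ops, clock rate, ops per cycle, and
# the chip's assumed average power draw while running.

def estimate_latency_s(model_ops, clock_hz, ops_per_cycle):
    """Rough latency: cycles needed divided by cycles per second."""
    cycles = model_ops / ops_per_cycle
    return cycles / clock_hz

def estimate_energy_j(latency_s, avg_power_w):
    """Energy per inference: power sustained over the inference time."""
    return latency_s * avg_power_w

latency = estimate_latency_s(60e6, 48e6, 2)  # 60M-op model on a 48 MHz M4
energy = estimate_energy_j(latency, 2e-3)    # at an assumed 2 mW draw
print(latency, energy)  # 0.625 s and 1.25 mJ (0.00125 J)
```

Treat the result as an optimistic bound: real systems spend extra cycles on memory access and overhead that this model ignores.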

Improving Power Usage

Now that you know the approximate lifetime of your system, you’ll probably be looking at ways to improve it. You might be able to find hardware modifications that help, including turning off modules that you don’t need or replacing components, but those are beyond what this book will cover. Luckily, there are some common techniques that don’t require electrical engineering knowledge but can help a lot. Because these approaches are software-focused, they do assume that the microcontroller itself is taking the bulk of the power. If sensors or other components in your device are power hogs, you will need to do a hardware investigation.

Duty Cycling

Almost all embedded processors have the ability to put themselves into a sleep mode in which they don’t perform any computation and use very little power, but are able to wake up either after an interval or when a signal comes in from outside. This means that one of the simplest ways of reducing power is to insert sleeps between inference calls, so that the processor spends more time in a low-power mode. This is commonly known as duty cycling in the embedded world. You might worry that this excludes continuous sensor data gathering, but many modern microcontrollers have direct memory access (DMA) capabilities that are able to sample analog-to-digital converters (ADCs) continuously and store the results in memory without any involvement from the main processor.
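The payoff from duty cycling is easy to quantify: average power is just the time-weighted mix of active and sleep power. Here's a sketch with assumed figures (2 mW while active, 5 µW asleep), not numbers for any specific chip:

```python
# Sketch: how duty cycling reduces average power.
# Assumed figures: 2 mW while awake, 5 microwatts while asleep.

def avg_power_w(active_w, sleep_w, duty_fraction):
    """Average power when 'duty_fraction' of the time is spent active."""
    return active_w * duty_fraction + sleep_w * (1.0 - duty_fraction)

# Running inference 1% of the time instead of continuously:
print(avg_power_w(2e-3, 5e-6, 1.0))   # always on: 2 mW
print(avg_power_w(2e-3, 5e-6, 0.01))  # 1% duty cycle: roughly 25 microwatts
```

This is why latency optimization feeds directly into power: the faster each inference completes, the smaller the duty fraction can be.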

In a similar way, you might be able to reduce the frequency at which the processor executes instructions so that in effect it runs more slowly, dramatically reducing the power it uses. The datasheet example shown earlier demonstrates how the energy required drops as the clock frequency decreases.

What duty cycling and frequency reduction offer is the ability to trade computation for power usage. What this means in practice is that if you can reduce the latency of your software, you can trade that for a lower power budget. Even if you are able to run within your allotted time, look at ways to optimize latency if you want a reduction in power usage.

Cascading Design

One of the big advantages of machine learning over traditional procedural programming is that it makes it easy to scale up or down the amount of compute and storage resources required, and the accuracy will usually degrade gracefully. It’s more difficult to achieve this with manually coded algorithms, since there aren’t usually obvious parameters that you can tweak to affect these properties. What this means is that you can create what’s known as a cascade of models. Sensor data can be fed into a very small model with minimal compute requirements, and even though it’s not particularly accurate, it can be tuned so that it has a high likelihood of triggering when a particular condition is present (even if it also produces a lot of false positives). If the result indicates that something interesting has just happened, the same inputs can be fed into a more complex model to produce a more accurate result. This process can potentially be repeated for several more stages.

The reason this is useful is that the inaccurate but tiny model can fit into a very power-efficient embedded device, and running it continuously won’t drain much energy. When a potential event is spotted, a more powerful system can be woken up and a larger model run, and so on down the cascade. Because the more powerful systems are operating for only a small fraction of the time, their power usage doesn’t break the budget. This is how always-on voice interfaces work on phones. A DSP is constantly monitoring the microphone, with a model listening for “Alexa,” “Siri,” “Hey Google,” or a similar wake word. The main CPU can be left in a sleep mode, but when the DSP thinks it might have heard the right phrase, it will signal to wake it up. The CPU can then run a much larger and more accurate model to confirm whether it really was the right phrase, and perhaps send the following speech to an even more powerful processor in the cloud if it was.
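The control flow of a cascade can be sketched with stub models. The functions and thresholds here are hypothetical placeholders standing in for real models, not any actual API:

```python
# Sketch of a two-stage cascade's control flow, with stub models.

def tiny_model_score(samples):
    # Cheap, always-on stage: tuned for high recall, so it fires readily
    # and produces many false positives. (Placeholder heuristic.)
    return 0.9 if max(samples) > 0.5 else 0.1

def large_model_score(samples):
    # Expensive stage: only runs when the tiny model triggers.
    # (Placeholder heuristic standing in for a bigger network.)
    return sum(samples) / len(samples)

def detect(samples, gate_threshold=0.5, confirm_threshold=0.6):
    if tiny_model_score(samples) < gate_threshold:
        return False  # stay in the low-power stage
    return large_model_score(samples) >= confirm_threshold

print(detect([0.1, 0.2, 0.1]))  # gate never fires: large model never runs
print(detect([0.7, 0.8, 0.9]))  # gate fires and the large model confirms
```

Because the expensive stage runs only when the cheap gate fires, its energy cost is paid on a small fraction of inputs, which is what keeps the average power inside the budget.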

This means that an embedded product might be able to achieve its goals even if it can't host a model that's accurate enough to be actionable by itself. If you can train a network that spots most true positives, and its false positives occur at a low enough frequency, you might be able to offload the remaining work to the cloud. Radio is very power-hungry, but if you limit its use to rare occasions and short periods, it might fit in your energy budget.

Wrapping Up

For many of us (your authors included), optimizing for energy consumption is an unfamiliar process. Luckily, a lot of the skills we covered for latency optimization also apply here, just with different metrics to monitor. It’s generally a good idea to focus on latency optimizations before energy, because you’ll often need to validate that your product works using a version that gives the short-term user experience you want, even if its lifetime isn’t long enough to be useful in the real world. In the same way, it often makes sense to tackle the subject of Chapter 17, space optimization, after latency and energy. In practice you’re likely to iterate back and forth between all the different trade-offs to meet your constraints, but size is often easiest to work on after the other aspects are fairly stable.
