7.6. Low-Power System Design and Operation

As gate counts soar into the hundreds of millions per chip, SOC power dissipation becomes an increasingly important design consideration for all systems. There are many ways to reduce system power. Some of these power-reducing methods relate to the processor’s design and are in the domain of the processor designer. Other methods are in the system designer’s domain.

A processor designer can reduce a processor’s power dissipation in two ways, both of which minimize clock activity inside the processor. The first technique is clock gating: switching off the clock, on a cycle-by-cycle basis, to circuits that are not being exercised. All Xtensa and Diamond processor cores, including the Diamond 108Mini core, employ extensive functional clock gating to minimize dynamic power dissipation within the processor.

Functional clock gating creates many branches in the processor’s clock tree. Sometimes, the number of clock-tree branches ranges into the hundreds. Creating such complex clock trees is the purview of automated tools because the complexity of the task easily outstrips the capacity of the human mind. Because processors are instruction-driven, it is relatively easy to determine which parts of a processor need to operate, and which do not, during each phase of each instruction’s passage through the processor’s pipeline.

Contrast this situation with creating similarly complex clock gating in a block of custom-designed RTL. Only by setting up exhaustive test-input conditions can all of the functional clock domains be mapped. For large, complex logic blocks, it is very nearly impossible to create such detailed maps of clock domains. For processors, such analysis is straightforward, and in the case of Xtensa and Diamond processor cores, the clock-gating logic is inserted fully automatically.

The second path the processor designer can take to low-power operation is to design a processor that performs tasks in fewer clock cycles, so that it can meet the same performance target at a lower clock rate. A lower clock rate brings two benefits in SOC design. First, dynamic power dissipation is directly proportional to clock rate, so processors operating at lower clock rates inherently draw less power. Second, processors operating at lower clock rates can run at lower power-supply voltages, which further reduces dynamic power dissipation and, as a side benefit, also reduces static power dissipation. Unlike personal computer processors, which have historically raced to the highest possible operating frequency, processors used in SOCs should be run as slowly as the application will allow, which minimizes power dissipation.
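The two benefits compound, which a short calculation makes concrete. The sketch below uses the standard first-order CMOS model, in which dynamic power is proportional to switching activity, switched capacitance, the square of the supply voltage, and the clock frequency. The function names, the activity and capacitance values, and the two operating points (200 MHz at 1.2 V versus 100 MHz at 1.0 V) are illustrative assumptions, not figures for any Xtensa or Diamond core.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order CMOS dynamic power in watts: activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


def relative_dynamic_power(freq_ratio, vdd_ratio):
    """Dynamic power of a scaled design relative to its baseline.

    Activity and capacitance are assumed unchanged, so they cancel out.
    """
    return freq_ratio * vdd_ratio ** 2


# Hypothetical operating points (assumed values, for illustration only).
baseline = dynamic_power(0.1, 1e-9, 1.2, 200e6)
scaled = dynamic_power(0.1, 1e-9, 1.0, 100e6)

print(f"baseline: {baseline * 1e3:.1f} mW")
print(f"scaled:   {scaled * 1e3:.1f} mW")
print(f"ratio:    {scaled / baseline:.3f}")
```

Under these assumed numbers, halving the clock rate alone would halve dynamic power, while the accompanying voltage reduction brings it to roughly 35 percent of the baseline.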

Configurable Xtensa processor cores allow the SOC designer to create new instructions that cut the number of instructions required to execute an algorithm. All Diamond processor cores incorporate special instructions and other features to achieve the same end for certain operations. As discussed above, the Diamond 108Mini processor core incorporates local-memory interfaces that reduce the number of clock cycles needed to communicate with memory. Local memory also saves power on each access: a global bus presents more capacitance than a local-memory interface, so driving it takes more power. Finally, the Diamond 108Mini processor core incorporates instructions for its 32-bit input and output ports, which reduce the number of cycles needed to perform I/O.

The discussion of the system illustrated in Figure 7.5 has already mentioned the Run/Stall input to the Diamond 108Mini processor. This input pin can shut off nearly all of the clock trees inside of the processor. All of the Xtensa and Diamond processor cores also provide an instruction that essentially halts all processor activity. This instruction, called WAITI (wait for interrupt), puts the processor in a mode where most of the clocks inside of the processor are gated off. An interrupt brings the processor out of the WAITI mode. Execution of the WAITI instruction actually shuts off more of the processor’s internal clocks than the Run/Stall input.

When a Diamond 108Mini slave processor enters the WAITI mode, it asserts its PWaitMode status output. In Figure 7.5, the slave processors’ PWaitMode outputs are connected to the master processor’s 32-bit input port so that the master processor can evaluate the running status of the slave processors. The Diamond 108Mini processor core’s 32-bit input port makes this a glueless connection. The master processor need not consume bus bandwidth to poll the status of the other processors. Bus bandwidth can be reserved for moving instructions and data around the system.
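On the master side, evaluating the slaves' status amounts to testing bits in the value read from the input port. The following sketch models that decoding in Python. The bit assignment (slave i drives bit i), the count of three slaves, and the function names are assumptions made for illustration; the actual wiring is whatever the SOC designer chooses, and real firmware would read the port with the core's input-port instruction rather than take the value as an argument.

```python
NUM_SLAVES = 3  # assumption: one master plus three slaves in a 4-processor system


def waiting_slaves(port_value, num_slaves=NUM_SLAVES):
    """Return the indices of slaves whose PWaitMode bit is set.

    Assumes slave i's PWaitMode output is wired to bit i of the master's
    32-bit input port (a hypothetical wiring). Higher bits are ignored.
    """
    return [i for i in range(num_slaves) if (port_value >> i) & 1]


def all_slaves_waiting(port_value, num_slaves=NUM_SLAVES):
    """True when every slave has entered the WAITI mode."""
    mask = (1 << num_slaves) - 1
    return (port_value & mask) == mask


# Example: slaves 0 and 2 report that they are waiting; slave 1 is still running.
status = 0b101
print(waiting_slaves(status))
print(all_slaves_waiting(status))
```

Because the status arrives on a dedicated port, this check costs the master a port read and a few logic operations, with no bus transaction.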

Systems that incorporate multiple processors, such as the 4-processor system shown in Figure 7.5, can use the WAITI instruction to shut down processors under program control when they have completed a task or when their processing bandwidth is not needed. An interrupt quickly reactivates the processor when conditions warrant. Significantly, especially for the system illustrated in Figure 7.5, inbound-PIF operations can continue to occur after an Xtensa or Diamond processor core enters the WAITI mode. A master processor can therefore retrieve processed data from a waiting slave processor’s data memory, fill that memory with new input data to be processed, or reprogram the slave processor’s local instruction memory, and then activate the slave processor with an interrupt to initiate further processing.
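The master's sequence for one block of work can be sketched as a toy behavioral model. This is not driver code for any real core: the class, its fields, and the doubling "workload" are hypothetical stand-ins, with a Python list playing the role of the slave's local data memory and a method call playing the role of the interrupt.

```python
class SlaveModel:
    """Toy model of a slave core with local data memory and a WAITI state."""

    def __init__(self):
        self.data_mem = []    # stands in for local data RAM reached via inbound PIF
        self.waiting = True   # True models PWaitMode asserted (most clocks gated)

    def interrupt(self):
        """Wake the core; it processes its buffer, then executes WAITI again."""
        self.waiting = False
        self.data_mem = [x * 2 for x in self.data_mem]  # placeholder workload
        self.waiting = True


def run_block(slave, block):
    """Master-side sequence: load input, wake the slave, collect the results."""
    if not slave.waiting:
        # This sketch only transfers data while the slave is idle.
        raise RuntimeError("slave is still running")
    slave.data_mem = list(block)   # inbound-PIF write: fill the data memory
    slave.interrupt()              # interrupt brings the slave out of WAITI
    if not slave.waiting:
        raise RuntimeError("slave did not return to WAITI")
    return list(slave.data_mem)    # inbound-PIF read: retrieve processed data


slave = SlaveModel()
print(run_block(slave, [1, 2, 3]))
```

In a real system the master would not block while the slave works; it would return to other tasks and use the slave's PWaitMode status to learn when the results are ready.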

This sort of reserve capacity—the ability to quickly bring dormant, powered-down computing resources on line at will—opens entirely new architectural vistas to the SOC designer. Chapter 15 has more to say about these advanced system-design topics.
