The above examples demonstrate that application-tailored processors can run specific code blocks one or two orders of magnitude faster than fixed-ISA processors without requiring faster clock rates. However, most SOCs also require that power dissipation and energy consumption (power dissipated over time) not rise in proportion to the performance improvement. Here too, configurable processors excel. Figure 14.4 shows the performance and power benefits of application tailoring for three applications: AES (Advanced Encryption Standard) encryption and decryption, Viterbi decoding, and the FFT. In each case, ISA extensions were added to an Xtensa processor to improve its performance. With these extensions, throughput improves by factors of 20× to more than 100×, depending on the application.
Figure 14.4 also shows a second result: the reduction in energy required to execute the application over the same time period (as measured in terms of battery life). Energy consumption improves by nearly as much as performance. Each tailored processor is larger than the untailored version, but because dynamic power scales with both switched capacitance and clock frequency, the drop in the required clock rate more than compensates for the additional capacitance of the extra silicon needed for the enhancement logic.
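The clock-rate argument can be made concrete with the standard first-order CMOS dynamic-power model, P = α·C·V²·f, where α is the switching activity, C the switched capacitance, V the supply voltage, and f the clock frequency. The sketch below is a hedged illustration, not data from Figure 14.4: the 1.5× capacitance growth, the 20× speedup, and the activity factor of 0.1 are hypothetical numbers chosen only to show how a lower required clock rate can outweigh added capacitance when the same workload must finish in the same time.

```python
def dynamic_power(alpha: float, cap: float, vdd: float, freq: float) -> float:
    """First-order CMOS dynamic power: P = alpha * C * V^2 * f."""
    return alpha * cap * vdd ** 2 * freq


def relative_energy(cap_growth: float, speedup: float, alpha: float = 0.1) -> float:
    """Energy of a tailored core relative to the baseline core.

    Hypothetical assumptions: both cores finish the same workload in the
    same time window; the tailored core needs `speedup` times fewer cycles,
    so it can run at a clock rate `speedup` times lower; activity factor
    and supply voltage are unchanged; switched capacitance grows by
    `cap_growth`. All values are normalized (capacitance, voltage, and
    baseline frequency = 1.0).
    """
    base = dynamic_power(alpha, cap=1.0, vdd=1.0, freq=1.0)
    tailored = dynamic_power(alpha, cap=cap_growth, vdd=1.0, freq=1.0 / speedup)
    # Same time window for both, so the energy ratio equals the power ratio.
    return tailored / base


# A core 1.5x larger but needing a 20x lower clock uses a small fraction of the energy.
print(relative_energy(cap_growth=1.5, speedup=20.0))
```

Holding the supply voltage constant is a conservative choice here; a lower clock rate often permits a lower supply voltage as well, which would enlarge the savings because of the V² term.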
The Diamond processor cores, which are based on Xtensa configurable cores, deliver higher performance and lower power dissipation than other fixed-ISA processors, as shown in Figure 14.5. The figure plots performance and power dissipation for several ARM, MIPS, and Tensilica Diamond processors, as reported for the EEMBC benchmark suite. Note that each processor family traces a characteristic logarithmic curve linking performance and power dissipation. Performance within a family improves mostly through clock-rate increases, but as clock frequency rises, power dissipation grows faster than performance does. Hence the logarithmic curves.
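One common first-order explanation for this shape, offered here as an assumption rather than as the analysis behind Figure 14.5: if performance is proportional to clock frequency f and the supply voltage must rise roughly in proportion to f, then dynamic power grows roughly as f³. Inverting that relationship gives a concave performance-versus-power curve of the same general shape as the plotted families. The constant `k` below is a hypothetical normalization.

```python
def performance_for_power(power: float, k: float = 1.0) -> float:
    """Relative performance reachable at a given power budget.

    Hypothetical model: performance is proportional to clock frequency f,
    supply voltage scales linearly with f, so dynamic power P = k * f**3.
    Solving for f gives f = (P / k) ** (1/3).
    """
    return (power / k) ** (1.0 / 3.0)


# Under this model, each doubling of the power budget buys only about 26% more performance.
for budget in (1.0, 2.0, 4.0, 8.0):
    print(f"power {budget:4.1f} -> performance {performance_for_power(budget):.2f}")
```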
As shown in Figure 14.6, tailoring an Xtensa processor to each of the benchmarks produces performance/power numbers in an entirely different league. Figure 14.6 must be drawn to a different scale to accommodate the performance range of the tailored Xtensa processors.
The performance-versus-power-dissipation equation looks even better for configurable processor cores when multiple cores are used. Figure 14.7 shows the silicon footprints of three ways to achieve high processor throughput in a 90 nm SOC process technology. Figure 14.7a shows an ARM Cortex A8 processor core, which runs at 800 MHz, occupies about 5 mm² of silicon, dissipates 800 mW, and delivers about 1600 peak MIPS (million instructions per second). Figure 14.7b shows the 3-way superscalar Diamond 570T CPU core, which runs at 525 MHz, occupies about 0.6 mm² of silicon, dissipates about 90 mW, and delivers 1575 peak MIPS. Figure 14.7c shows three Xtensa 6 processor cores, each running at 525 MHz. Together, the three cores occupy about 0.36 mm² of silicon, dissipate about 50 mW, and deliver 1575 peak MIPS.
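The figures above can be compared directly as throughput per milliwatt and per square millimeter. The sketch below uses the clock rates, power, and area values quoted in the text; the per-core issue widths are assumptions inferred from the peak-MIPS numbers (2 instructions per cycle for the Cortex A8, 3 for the 3-way Diamond 570T, and 1 for each Xtensa 6 core). Because the power and area values are approximate, the resulting ratios are rough.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    clock_mhz: float
    cores: int
    issue_width: int  # assumed peak instructions per cycle per core
    power_mw: float   # approximate total power from the text
    area_mm2: float   # approximate total silicon area from the text

    @property
    def peak_mips(self) -> float:
        # Peak MIPS = clock (MHz) x instructions/cycle x number of cores.
        return self.clock_mhz * self.issue_width * self.cores

    @property
    def mips_per_mw(self) -> float:
        return self.peak_mips / self.power_mw

    @property
    def mips_per_mm2(self) -> float:
        return self.peak_mips / self.area_mm2


OPTIONS = [
    Option("ARM Cortex A8", 800, 1, 2, 800, 5.0),
    Option("Diamond 570T", 525, 1, 3, 90, 0.6),
    Option("3x Xtensa 6", 525, 3, 1, 50, 0.36),
]

for opt in OPTIONS:
    print(f"{opt.name:14s} {opt.peak_mips:6.0f} MIPS "
          f"{opt.mips_per_mw:5.1f} MIPS/mW {opt.mips_per_mm2:6.0f} MIPS/mm2")
```

With these inputs, all three options deliver roughly the same peak throughput, but the three-core option delivers about 31.5 MIPS/mW versus 2 MIPS/mW for the single high-clock core.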
The approach shown in Figure 14.7a represents the conventional method for achieving performance: use advanced process technology and a deep pipeline to boost the processor’s clock frequency. The approach shown in Figure 14.7b applies more modern processor-architecture and compiler technologies to achieve equivalent performance at a lower clock rate, with less power dissipation and a smaller silicon footprint. The approach shown in Figure 14.7c, however, combines all of these advantages. Its three processor cores together dissipate the least power and use the least silicon of the three alternatives, and because the cores are independent, each can run its own task. When a core is not needed, it can be put to sleep, which greatly reduces dynamic power dissipation.
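The benefit of sleeping unneeded cores can be sketched with a simple power model. The assumptions here are hypothetical and not taken from Figure 14.7: the approximately 50 mW total is split evenly across the three Xtensa 6 cores, an active core dissipates its full share, and a sleeping core dissipates only a residual `sleep_power_mw` (leakage), which defaults to zero.

```python
def cluster_power_mw(active_cores: int, total_cores: int = 3,
                     total_power_mw: float = 50.0,
                     sleep_power_mw: float = 0.0) -> float:
    """Approximate power of a multicore cluster when some cores sleep.

    Hypothetical assumptions: total power is split evenly across the cores,
    active cores dissipate their full share, and each sleeping core
    dissipates only `sleep_power_mw` (leakage, ignored by default).
    """
    if not 0 <= active_cores <= total_cores:
        raise ValueError("active_cores must be between 0 and total_cores")
    per_core_mw = total_power_mw / total_cores
    sleeping = total_cores - active_cores
    return active_cores * per_core_mw + sleeping * sleep_power_mw


# Power drops in steps as the workload needs fewer cores.
for n in range(4):
    print(f"{n} active core(s): {cluster_power_mw(n):.1f} mW")
```

The design point this illustrates is granularity: a cluster of small cores can shed power in per-core steps as demand falls, whereas a single large core offers only one unit to put to sleep.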
Figure 14.7 encapsulates many significant reasons why using multiple on-chip processors to increase an SOC’s processing capability is a good idea:
Lower clock rate
Lower power dissipation
Smaller silicon footprint
More options for effectively using the available processing power