14.9. Alone, Faster is Not Necessarily Better

The above examples demonstrate that application-tailored processors can run specific code blocks one or two orders of magnitude faster than fixed-ISA processors without the need for faster clock rates. However, most SOCs also require that power dissipation and energy consumption (power integrated over time) not rise commensurately with the performance improvement. Here too, configurable processors excel. Figure 14.4 demonstrates the performance and power benefits of application tailoring for three applications: AES (Advanced Encryption Standard) encryption and decryption, Viterbi decoding, and the FFT. In each case, ISA extensions have been added to an Xtensa processor to improve its performance. As a result of the extensions, throughput improves by factors ranging from more than 20× to 100×, depending on the application.

A second result shown by Figure 14.4 is the reduction in energy required to execute the application over the same time period (as measured in terms of battery life). Energy consumption improves by nearly as much as performance. Although the size of each tailored processor is larger than the untailored version, the drop in the required clock rate more than compensates for the additional capacitance incurred by the extra silicon needed for the enhancement logic.
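The trade-off described above can be sketched with the standard first-order model of dynamic switching energy, E = C × V² × cycles. The snippet below is only an illustration: the 50% capacitance increase, the 20× cycle reduction, and the normalized base values are hypothetical numbers chosen for the example, not figures taken from Figure 14.4, and leakage power is ignored.

```python
def dynamic_energy(capacitance, voltage, cycles):
    """First-order dynamic energy for a task: E = C * V^2 * cycles.

    The switching-activity factor is folded into `capacitance`.
    """
    return capacitance * voltage ** 2 * cycles


# Hypothetical, normalized values (assumptions for illustration only).
BASE_CAPACITANCE = 1.0       # untailored core
TAILORED_CAPACITANCE = 1.5   # assume extension logic adds 50% switched capacitance
SUPPLY_VOLTAGE = 1.0         # same supply voltage for both cores
BASE_CYCLES = 1_000_000      # cycles the untailored core needs for the task
SPEEDUP = 20                 # assume the tailored core needs 20x fewer cycles

base_energy = dynamic_energy(BASE_CAPACITANCE, SUPPLY_VOLTAGE, BASE_CYCLES)
tailored_energy = dynamic_energy(
    TAILORED_CAPACITANCE, SUPPLY_VOLTAGE, BASE_CYCLES / SPEEDUP
)

# Ratio > 1 means the tailored core spends less energy on the same task.
energy_ratio = base_energy / tailored_energy
print(f"Energy improvement: {energy_ratio:.1f}x")
```

Under these assumed numbers, the larger core still spends roughly 13 times less dynamic energy per task, because the cycle count falls much faster than the capacitance grows. The same ratio applies to power when both cores must finish the task in the same time, since the tailored core can then run at a proportionally lower clock rate.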

The Diamond processor cores, which are based on Xtensa configurable cores, deliver higher performance at lower power than other fixed-ISA processors, as shown in Figure 14.5. The figure plots performance and power dissipation for several ARM, MIPS, and Tensilica Diamond processors as reported for the EEMBC suite of benchmarks. Note that each processor family demonstrates a characteristic logarithmic curve linking performance and power dissipation. Within a family, performance improves mostly through clock-rate increases, and as clock frequency rises, performance rises as well, but not as quickly as power dissipation. Hence the logarithmic curves.

Figure 14.5. Performance/power numbers for three fixed-ISA processor families. Performance on EEMBC benchmarks aggregate for Consumer, Telecom, Office, Network, based on ARM 1136J-S (Freescale i.MX31), ARM 1026EJ-S, Tensilica Diamond 570T, Xtensa V, and Xtensa 3, MIPS 20K, (NECVR5000). MIPS M4K, MIPS 4Ke, MIPS 4Ks, MIPS 24K, ARM 968E-S, ARM 966E-S, ARM 926EJ-S, and ARM7TDMI-S scaled by ratio of Dhrystone MIPS within architecture family. All power figures from vendor websites, 2/23/2006.


As shown in Figure 14.6, tailoring Xtensa processors to each of the benchmarks produces performance/power numbers in an entirely different league. Figure 14.6 must be drawn to a much larger scale to accommodate the performance range of the configurable Xtensa processor.

Figure 14.6. Performance/power numbers for three fixed-ISA processor families and application-tailored Xtensa processor cores. Performance on EEMBC benchmarks aggregate for Consumer, Telecom, Office, Network, based on ARM 1136J-S (Freescale i.MX31), ARM 1026EJ-S, Tensilica Diamond 570T, Xtensa V, and Xtensa 3, MIPS 20K, (NECVR5000). MIPS M4K, MIPS 4Ke, MIPS 4Ks, MIPS 24K, ARM 968E-S, ARM 966E-S, ARM 926EJ-S, and ARM7TDMI-S scaled by ratio of Dhrystone MIPS within architecture family. All power figures from vendor websites, 2/23/2006.


The performance-versus-power-dissipation equation looks even better for configurable processor cores when multiple cores are considered. Figure 14.7 shows the silicon footprints of three ways to get high processor throughput using 90 nm SOC process technology. Figure 14.7a shows an ARM Cortex A8 processor core. The core runs at 800 MHz, consumes about 5 mm² of silicon, dissipates 800 mW, and delivers about 1600 million peak instructions/sec. Figure 14.7b shows the 3-way superscalar Diamond 570T CPU core, which runs at 525 MHz, consumes about 0.6 mm² of silicon, dissipates about 90 mW, and delivers 1575 million peak instructions/sec. Figure 14.7c shows three Xtensa 6 processor cores. Each Xtensa 6 core runs at 525 MHz. Together, the three cores dissipate about 50 mW, consume about 0.36 mm² of silicon, and deliver 1575 million peak instructions/sec.

Figure 14.7. Three ways to get 1600 million peak instructions/second from 90 nm SOC process technology.


The approach shown in Figure 14.7a represents a conventional method for achieving performance: use advanced process technology and a deep pipeline to boost the processor’s clock frequency. The second approach, shown in Figure 14.7b, applies more modern processor architecture and compiler technologies to achieve equivalent performance at a lower clock rate, with less power dissipation and a smaller silicon footprint. The approach shown in Figure 14.7c, however, has all of the advantages. Together, the three cores consume the least power and the least silicon of the three alternatives, and because they are independent, they can be used separately. When not needed, they can be put to sleep, which greatly reduces dynamic power dissipation.
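The comparison among the three options can be made concrete with a little arithmetic. The sketch below uses the clock, power, and area figures quoted in the text for Figure 14.7. The `issue_width` values are assumptions chosen so that clock × issue width × cores reproduces the quoted peak rates (two instructions per cycle for the Cortex A8, three for the Diamond 570T, one for each Xtensa 6 core), and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProcessorOption:
    name: str
    clock_mhz: float
    issue_width: int   # assumed peak instructions per cycle per core
    cores: int
    power_mw: float    # total for all cores
    area_mm2: float    # total for all cores

    @property
    def peak_mips(self) -> float:
        """Peak millions of instructions per second across all cores."""
        return self.clock_mhz * self.issue_width * self.cores

    @property
    def mips_per_mw(self) -> float:
        return self.peak_mips / self.power_mw

    @property
    def mips_per_mm2(self) -> float:
        return self.peak_mips / self.area_mm2


options = [
    ProcessorOption("ARM Cortex A8", 800, 2, 1, 800, 5.0),
    ProcessorOption("Diamond 570T", 525, 3, 1, 90, 0.6),
    ProcessorOption("3 x Xtensa 6", 525, 1, 3, 50, 0.36),
]

for opt in options:
    print(
        f"{opt.name:14s} {opt.peak_mips:6.0f} peak MIPS  "
        f"{opt.mips_per_mw:5.1f} MIPS/mW  {opt.mips_per_mm2:6.0f} MIPS/mm2"
    )
```

With the quoted figures, the three options work out to roughly 2, 17.5, and 31.5 peak MIPS per milliwatt, which is the gap the text describes. These are peak rates only; sustained throughput depends on the workload and on how well it can be partitioned across cores.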

Figure 14.7 encapsulates many significant reasons why using multiple on-chip processors to increase an SOC’s processing capability is a good idea:

  • Lower clock rate

  • Lower power dissipation

  • Smaller silicon footprint

  • More options for effectively using the available processing power.
