1.10. Heterogeneous- and Homogeneous-Processor System-Design Approaches

Figure 1.15 shows the use of two different microprocessor cores, one general-purpose processor and one DSP. Such a system is called a heterogeneous-multiprocessor system. A heterogeneous-multiprocessor design approach has the advantage of matching processor cores with application-appropriate features to specific on-chip tasks.

Selecting just the right processor core or tailoring the processor core to a specific task has many benefits. Chief among them, the processor need have no more abilities than its assigned task set requires. This characteristic of heterogeneous-multiprocessor system design minimizes processor gate counts by trimming unneeded features from each processor.

One of the key disadvantages of heterogeneous-multiprocessor design is the need to use a different software-development tool set (compiler, assembler, debugger, instruction-set simulator, real-time operating system, etc.) for each of the different processor cores used in the system design. Either the firmware team must become proficient in using all of the tool sets for the various processors or—more likely—the team must be split into groups, each assigned to different processor cores.

However, this disadvantage does not apply to all heterogeneous-processor system designs, as we’ll see later in this book. Each instance of a configurable processor core can take on exactly the attributes needed for a specific set of tasks. Because all of the variously configured processor cores are based on a common ISA (instruction-set architecture), they can still use the same software-development tool suite, so software team members can use familiar tools to program all of the processors in an SOC.

Some SOC designs, called homogeneous multiprocessor systems, use multiple copies of the same processor core. This design approach can simplify software development because all of the on-chip processors can be programmed with one common set of development tools. However, processor cores are not all equally capable. General-purpose processors are not generally good at DSP applications because they lack critical execution units and memory-access modes. A high-speed multiplier/accumulator (MAC) is essential to efficient execution of many DSP algorithms, but MACs require a relatively large number of gates, so few general-purpose processors have them.

Similarly, the performance of many DSP algorithms can benefit from a processor’s ability to fetch two data words from memory simultaneously, a feature often called XY memory addressing. Few general-purpose processors have XY memory addressing because the feature requires the equivalent of two load/store units and most general-purpose processors have only one such unit.
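To make the MAC and XY-memory points concrete, the following minimal sketch computes one output sample of an FIR filter, a typical DSP kernel, and counts the operations involved. Every tap needs one multiply-accumulate and two operand fetches (a coefficient and a data sample). The function names and the cycle model in `load_cycles` are illustrative assumptions, not a description of any particular processor: the model simply assumes that each load/store unit delivers one word per cycle.

```python
def fir_output(coeffs, samples):
    """Compute one FIR output sample and count the work per tap."""
    acc = 0      # stands in for the accumulator register of a MAC unit
    macs = 0     # multiply-accumulate operations performed
    loads = 0    # operand words fetched from data memory
    for c, x in zip(coeffs, samples):
        loads += 2       # one coefficient word plus one sample word per tap
        acc += c * x     # the multiply-accumulate step
        macs += 1
    return acc, macs, loads


def load_cycles(loads, load_units):
    """Cycles spent fetching operands, assuming (hypothetically) that each
    load/store unit delivers one word per cycle."""
    return -(-loads // load_units)   # ceiling division


acc, macs, loads = fir_output([1, 2, 3, 4], [10, 20, 30, 40])
print("result:", acc, "MACs:", macs, "loads:", loads)
print("fetch cycles, one load/store unit:", load_cycles(loads, 1))
print("fetch cycles, two units (XY-style):", load_cycles(loads, 2))
```

Under this simplified model, a processor with a single load/store unit spends two fetch cycles per tap, while one that can fetch both operands at once spends one, which is why dual-operand fetch matters for keeping a MAC unit busy.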

Although basic, voice-only mobile-telephone handsets generally have only two processors, the incorporation of multimedia features (music, still image, and video) has placed additional processing demands on handset system designs. The finely tuned, cost-minimized 2-processor system designs for voice-only phones simply lack the processing bandwidth for these additional functions. Consequently, the most recent handset designs with new multimedia features are adding either hardware-acceleration blocks or “application processors” to handle the processing required by the additional features.

The design of multiple-processor SOC systems is now sufficiently common to prompt a new term for the system-design style that employs more than one processor: multiple-processor SOCs are called MPSOCs. Figure 1.16 is a die layout of one such device, the MW301 media processor for camcorders and other video and still-image devices. The MW301 incorporates five processor cores. Four identical cores share the work of MPEG4 video coding and decoding. A fifth core handles the audio processing.

Figure 1.16. The MediaWorks MW301 media processor chip is a DVD-resolution MPEG video and audio encoder/decoder system on a chip for solid-state camcorder and portable video products. The chip contains a set of five loosely coupled heterogeneous processors. This is the highest-performance fully C-code programmable media processor ever built. Photo courtesy of MediaWorks.


The MediaWorks MW301 media processor is a good example of an SOC that transcends the special-purpose nature of early SOC designs. With five firmware-programmable processors, the MW301 is flexible enough to serve many applications in the audio/video realm. Consequently, an SOC like the MW301 media processor is often called a “platform” because the SOC’s hardware design can be used for a variety of applications and its function can be altered, sometimes substantially, simply by changing the code it runs. There’s no need to redesign the SOC hardware for new applications.

Platform SOCs can be far more economical because their flexibility lets them sell in larger volumes. Design and NRE (non-recurring engineering) costs can therefore be amortized across a larger chip volume, which reduces the share of those costs carried by each chip sold.
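The amortization argument can be illustrated with a short calculation. In the sketch below, the cost of each chip is modeled as its manufacturing cost plus an equal share of the one-time design and NRE cost. The dollar figures and volumes are hypothetical values chosen only to show the trend, and the function name `cost_per_chip` is likewise an assumption made for this example.

```python
def cost_per_chip(unit_cost, nre_cost, volume):
    """Per-chip cost when a one-time NRE cost is spread evenly over `volume` chips."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return unit_cost + nre_cost / volume


# Hypothetical figures for illustration only.
NRE_COST = 10_000_000   # one-time design and NRE cost, in dollars
UNIT_COST = 5.0         # manufacturing cost of each chip, in dollars

for volume in (100_000, 1_000_000, 10_000_000):
    per_chip = cost_per_chip(UNIT_COST, NRE_COST, volume)
    print(f"{volume:>10,} chips: ${per_chip:.2f} per chip")
```

With these assumed numbers, the NRE share of each chip falls from $100 at 100,000 units to $1 at 10 million units, so a platform that serves several applications with one hardware design carries a much smaller design burden per chip.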

The current record holder for the maximum number of microprocessor cores placed on one chip is Cisco’s SPP (silicon packet processor) SOC, designed for the company’s CRS-1 92-Tbit/sec router. The massively parallel SPP network processor chip incorporates 192 packet-processing elements (PPEs) organized in 16 clusters. Each cluster contains 12 PPEs and each PPE incorporates one of Tensilica’s 32-bit Xtensa RISC processor cores, a very small instruction cache, a small data memory, and a DMA (direct memory access) controller for moving packet data into and out of the PPE’s data memory. Figure 1.17 shows a diagram of the SPP network processor’s design.

Figure 1.17. Cisco’s SPP network processor SOC incorporates 192 packet-processing elements, each built with an Xtensa 32-bit RISC processor core with instruction enhancements specifically for packet processing.


Cisco’s SPP chip measures 18 mm on a side and is manufactured in a 130 nm process technology by IBM. Although there are 192 PPEs on the SPP die, four of the processors are spares used for silicon yield enhancement, so the SPPs operate as 188-processor devices when fielded. There are approximately 18 million gates and 8 Mbits of SRAM on the SPP chip. Each of the PPEs on the chip consumes about 0.5 mm².
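The figures quoted for the SPP can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text and assumes a square die, as "18 mm on a side" suggests. Because the per-PPE area is stated as approximate, the resulting area fraction is a rough estimate rather than a published specification.

```python
# Figures taken from the text.
CLUSTERS = 16
PPES_PER_CLUSTER = 12
SPARE_PPES = 4
DIE_SIDE_MM = 18        # assumes a square die
PPE_AREA_MM2 = 0.5      # approximate area of one PPE

total_ppes = CLUSTERS * PPES_PER_CLUSTER      # PPEs on the die
active_ppes = total_ppes - SPARE_PPES         # PPEs in use when fielded
die_area_mm2 = DIE_SIDE_MM ** 2               # total die area
ppe_area_mm2 = total_ppes * PPE_AREA_MM2      # area occupied by all PPEs
ppe_fraction = ppe_area_mm2 / die_area_mm2    # share of the die used by PPEs

print(f"PPEs on die: {total_ppes}, active: {active_ppes}")
print(f"Die area: {die_area_mm2} mm^2, PPE area: {ppe_area_mm2} mm^2")
print(f"PPEs occupy roughly {ppe_fraction:.0%} of the die")
```

By this estimate the 192 PPEs occupy about 96 mm² of a 324 mm² die, a little under one third of the chip.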

Cisco’s earlier router chips employed hard-wired logic blocks to implement the router algorithms. Because of the number of features Cisco wanted to build into the CRS-1 router, and because networking standards change continuously, it was clear that the CRS-1 required a firmware-programmable architecture. This requirement was brought into sharp focus as the effort to migrate from Internet protocol IPv4 to IPv6 escalated. The firmware-programmable SPP architecture allowed the CRS-1 design team to accommodate this major change in the protocol specification.
