3.1. Introduction to Configurable Processor Architectures

The customary design method for achieving high performance at low clock rates—both at the board and chip level—is to design custom accelerator blocks. On SOCs, these are called RTL blocks and are most often manually developed using Verilog or VHDL. However, custom-designed RTL blocks are not usually firmware programmable because it takes a lot of extra design work to make them programmable. Consequently, most custom-designed logic blocks created for specific acceleration purposes are relatively inflexible. This inflexibility elevates design risk because an SOC that incorporates such blocks must be redesigned to accommodate any design changes brought on by bugs, specification changes, or other market forces.

It is possible to boost a microprocessor’s performance for specific applications by adding execution resources directly to the processor’s ISA. This is the underlying premise behind the development of configurable processor cores or ASIPs. ISA extension is not new. Processor vendors have long added ISA extensions to enhance application performance. The best-known ISA extensions are probably the enhancements made to Intel’s 8086 processor by the 8087 floating-point coprocessor and the MMX and SSE multimedia extensions added to Intel’s Pentium processor family. These examples demonstrate that it’s quite possible to significantly improve a processor’s execution performance for specific tasks through ISA extension.

Note that ISA extension should be used to make a good ISA better, not to shore up an ISA that’s poorly suited to SOC design in the first place. This chapter introduces just such an SOC-friendly ISA, that of Tensilica’s Xtensa microprocessor core. The Xtensa architecture was designed from the start to serve as an on-chip microprocessor core. Consequently, it is a small, fast 32-bit processor core.

In those characteristics, the Xtensa ISA follows the RISC heritage that traces back to IBM’s 801 project in the 1970s and to the RISC microprocessor work of John Hennessy at Stanford University and David Patterson at the University of California at Berkeley in the 1980s. This research produced small, fast processor architectures with several characteristic features:

  • Load/store architecture

    - no memory references except for load and store instructions

  • 3-operand instruction orientation

    - two operand sources, one result destination

  • Large general-purpose register file

    - supports the load/store architecture

  • Single-cycle instructions

    - for simplicity and speed

  • Pipelined operation

    - produces single-cycle instruction throughput
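
The first three features in the list above can be illustrated with a toy interpreter. This is only a sketch: the opcode names, the tuple layout, and the `run` function are hypothetical and do not correspond to Xtensa instruction encodings. It shows that only `load` and `store` touch memory, while arithmetic instructions name two source registers and one destination register.

```python
# Toy model of a load/store, 3-operand register machine.
# Opcode names and the tuple layout are hypothetical, not Xtensa encodings.

def run(program, memory, num_regs=16):
    """Execute a list of (op, a, b, c) tuples and return the register file."""
    regs = [0] * num_regs
    for op, a, b, c in program:
        if op == "load":        # regs[a] = memory[regs[b] + c]
            regs[a] = memory[regs[b] + c]
        elif op == "store":     # memory[regs[b] + c] = regs[a]
            memory[regs[b] + c] = regs[a]
        elif op == "add":       # regs[a] = regs[b] + regs[c], 32-bit wraparound
            regs[a] = (regs[b] + regs[c]) & 0xFFFFFFFF
        elif op == "sub":       # regs[a] = regs[b] - regs[c], 32-bit wraparound
            regs[a] = (regs[b] - regs[c]) & 0xFFFFFFFF
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs

# Add two values held in memory: the operands must be loaded into
# registers first, and the result must be stored back explicitly.
memory = [7, 35, 0]
program = [
    ("load", 1, 0, 0),    # r1 = memory[r0 + 0]
    ("load", 2, 0, 1),    # r2 = memory[r0 + 1]
    ("add", 3, 1, 2),     # r3 = r1 + r2
    ("store", 3, 0, 2),   # memory[r0 + 2] = r3
]
regs = run(program, memory)
```

The large register file matters here because every operand has to pass through a register before the arithmetic instructions can use it.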

The Xtensa ISA shares all of these characteristics with the earlier RISC processor architectures. However, the architects of the Xtensa ISA realized that memory footprint would be critically important for on-chip processors, because on-chip SOC memory is much more expensive than memory contained in standard memory chips. The Xtensa architecture therefore deviates from the traditional RISC fixed-size instruction to reduce the memory footprint of firmware. The basic Xtensa ISA contains a mix of 16- and 24-bit instructions. These instructions all perform 32-bit operations, so they are just as powerful as the 32-bit instructions of the older RISC architectures. They’re merely smaller, which reduces program size and therefore reduces on-chip memory costs.
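
A short calculation makes the size argument concrete. The instruction counts below are invented for illustration, and the sketch assumes that the same number of instructions is needed under either encoding; neither figure comes from measured Xtensa code. The helper names `mixed_code_bytes` and `fixed_code_bytes` are likewise hypothetical.

```python
def mixed_code_bytes(narrow_count, wide_count, narrow_bits=16, wide_bits=24):
    """Code size in bytes for a mix of 16-bit and 24-bit instructions."""
    return (narrow_count * narrow_bits + wide_count * wide_bits) // 8

def fixed_code_bytes(instr_count, bits=32):
    """Code size in bytes when every instruction has the same fixed width."""
    return instr_count * bits // 8

# Hypothetical program: 10,000 instructions, 40% of them in 16-bit form.
narrow, wide = 4000, 6000

mixed = mixed_code_bytes(narrow, wide)    # 4000 * 2 + 6000 * 3 bytes
fixed = fixed_code_bytes(narrow + wide)   # 10000 * 4 bytes
saving = 1 - mixed / fixed                # fraction of code memory saved

print(f"mixed: {mixed} bytes, fixed: {fixed} bytes, saving: {saving:.0%}")
```

Under these assumed numbers the mixed encoding needs 26,000 bytes against 40,000 bytes for fixed 32-bit instructions. The actual saving depends on how often the compiler can use the 16-bit forms.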

The original RISC processors employed fixed-size, 32-bit instructions to simplify the processor’s fetch/decode/execute circuitry. However, the mechanism that converts incoming instruction words into 16- and 24-bit instructions in the Xtensa processor is not complex. The amount of memory saved through the use of 16- and 24-bit instructions more than compensates for the gates used to implement the processor’s mixed-size instruction-fetch and instruction-decode unit.

Because an Xtensa processor’s instruction-fetch cycle retrieves more than one instruction per cycle (including fractions of an instruction word) and because a single Xtensa instruction can cross an aligned fetch (word) boundary, the Xtensa processor stores fetched words from the instruction stream in a FIFO holding buffer. For the base Xtensa ISA, the holding buffer is 32 bits wide and two entries deep. It can be deeper and wider for certain Xtensa processor configurations.
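
The behavior of such a holding buffer can be sketched in a few lines. This is a simplified software model, not a description of the actual hardware: the length rule in `instr_length` is made up for the example and is not the real Xtensa encoding, and the names `decode_stream`, `WORD_BYTES`, and `BUFFER_WORDS` are hypothetical. The model fetches aligned 32-bit words into a two-entry buffer and peels off 16- or 24-bit instructions, including those that straddle a word boundary.

```python
from collections import deque

WORD_BYTES = 4      # one 32-bit fetch word
BUFFER_WORDS = 2    # two-entry holding buffer, as in the base configuration

def instr_length(first_byte):
    """Hypothetical length rule, NOT the real Xtensa encoding:
    bit 3 of the first byte set means 16-bit, otherwise 24-bit."""
    return 2 if first_byte & 0x08 else 3

def decode_stream(code):
    """Yield each instruction in `code` as a bytes object."""
    buffer = deque()                      # bytes held, oldest first
    fetch_addr = 0
    capacity = WORD_BYTES * BUFFER_WORDS
    while True:
        # Refill: fetch aligned words while a whole word still fits.
        while fetch_addr < len(code) and len(buffer) + WORD_BYTES <= capacity:
            buffer.extend(code[fetch_addr:fetch_addr + WORD_BYTES])
            fetch_addr += WORD_BYTES
        if not buffer:
            return
        length = instr_length(buffer[0])
        if len(buffer) < length:
            raise ValueError("truncated instruction at end of stream")
        yield bytes(buffer.popleft() for _ in range(length))

# Four instructions in ten bytes; the second and fourth cross a
# 32-bit word boundary (at byte offsets 4 and 8).
code = bytes([0x08, 0x01, 0x00, 0x02, 0x03, 0x0A, 0x04, 0x00, 0x05, 0x06])
instructions = list(decode_stream(code))
lengths = [len(i) for i in instructions]
```

Because the longest instruction in this model is three bytes and the buffer refills whenever a full word fits, the buffer always holds a complete instruction unless the stream itself is truncated. That is the property that lets the decoder ignore word alignment.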

In addition to supporting the 16- and 24-bit instructions in the Xtensa processor’s base ISA, this mixed-size instruction-fetch and instruction-decode unit also supports extended 32- and 64-bit instructions that can be added to the configurable Xtensa processor. Some of the pre-configured Diamond Standard series processor cores have been extended in this manner. These processor cores are discussed in later chapters of this book.
