4.10. Combining Instruction Extensions with Queues

The availability of queue interfaces tied directly to a processor’s execution units permits the use of Xtensa processors in an application domain previously reserved for hand-coded RTL logic blocks: flow-through processing. By combining input- and output-queue interfaces with designer-defined execution units, it’s possible to create a firmware-controlled processing block within a processor that can read values from input queues, perform a computation on those values, and output the result of that computation with a pipelined throughput of one clock per complete input-compute-output cycle.

Figure 4.7 illustrates a simple design of such a system with two 256-bit input queues, one 256-bit output queue, and a 256-bit adder/multiplexer execution unit. Although this processor extension runs under firmware control, its operation bypasses the processor’s memory busses and load/store unit to achieve hardware-like processing speeds.

Figure 4.7. Combining queues with execution unit to add flow-through processing to a configurable processor core.


Even though there is a substantial amount of hardware in this processor extension, its definition consumes only four lines of TIE code:

queue InData1 256 in
queue InData2 256 in
queue OutData 256 out
operation QADD {} {in InData1, in InData2, in SumCtrl, out OutData}
        {assign OutData = SumCtrl ? (InData1 + InData2) : InData1;}

The first three lines of this code define the 256-bit input and output queues and the fourth line defines a new processor instruction called QADD that takes data from the input queues, performs 256-bit operations (additions or multiplexing), and passes the results to the 256-bit output queue. QADD instructions are pipelined so the effective instruction cycle time is one clock.
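The data transformation that QADD performs can be sketched in C as a behavioral model. This is only an illustration of the select-or-add semantics, not a cycle-accurate or vendor-supplied model: the names u256, u256_add, qadd, and qadd_stream are hypothetical, SumCtrl is assumed to be a single-bit control, the 256-bit sum is assumed to wrap modulo 2^256, and plain arrays stand in for the hardware queues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LIMBS 4 /* 4 x 64 bits = 256 bits */

/* Hypothetical 256-bit value; limb[0] is the least significant word. */
typedef struct {
    uint64_t limb[LIMBS];
} u256;

/* 256-bit addition with carry propagation between 64-bit limbs.
   Assumption: the result wraps modulo 2^256 (final carry is dropped). */
static u256 u256_add(u256 a, u256 b)
{
    u256 r;
    uint64_t carry = 0;
    for (int i = 0; i < LIMBS; i++) {
        uint64_t s = a.limb[i] + carry;   /* add incoming carry */
        uint64_t c1 = (s < carry);        /* overflow from the carry add */
        r.limb[i] = s + b.limb[i];
        uint64_t c2 = (r.limb[i] < s);    /* overflow from the operand add */
        carry = c1 | c2;                  /* at most one of them can be set */
    }
    return r;
}

/* Model of the QADD data path: pass the sum when sum_ctrl is set,
   otherwise pass the first operand through unchanged. */
static u256 qadd(u256 in1, u256 in2, bool sum_ctrl)
{
    return sum_ctrl ? u256_add(in1, in2) : in1;
}

/* Apply qadd to n operand pairs; the arrays stand in for the two
   input queues and the output queue. */
static void qadd_stream(const u256 *in1, const u256 *in2, bool sum_ctrl,
                        u256 *out, size_t n)
{
    assert(in1 != NULL && in2 != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = qadd(in1[i], in2[i], sum_ctrl);
    }
}
```

In software, each call to qadd costs several instructions per 64-bit limb, plus the loads and stores needed to move the operands. The TIE version performs the same select-or-add on full 256-bit values taken directly from the queues.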

No conventional processor could possibly perform the composite QADD operation in one clock cycle. It would require many clock cycles just to retrieve the two 256-bit source operands (over a 32-bit data path, for instance, each operand takes eight separate loads), several more clock cycles to perform the 256-bit addition, and even more clock cycles to output the 256-bit result. This example demonstrates the ability of TIE extensions to replicate the performance of HDL-defined hardware while retaining firmware programmability.
