4.8. TIE Ports

A processor’s I/O capabilities are often overshadowed by discussions of ISAs and pipeline architectures. However, I/O bandwidth has a substantial influence on a processor’s overall performance. Most processor core designs are limited to one main bus and a few local-memory buses. As discussed in previous chapters, buses immediately set processors apart from other sorts of hardware blocks. A bus conducts only one transaction per clock cycle, so a processor with only one bus for communicating with the rest of the system is similarly limited. The TIE language provides powerful ways to significantly boost the Xtensa processor’s I/O bandwidth through the addition of ports and queue interfaces.

Direct processor-to-processor port connections reduce cost and latency for communication between two processors or between a processor and another hardware block. Direct port connections allow data to move directly from one processor’s registers to the registers and execution units of another processor. A simple example of a direct port connection appears in Figure 4.4.

Figure 4.4. A direct processor-to-processor port connection.


A conventional approach to moving data from processor #1 to processor #2 would involve an external communications register. Processor #1 would first produce the data, normally as the result of an instruction or instruction sequence, leaving the value in one of the processor’s registers. Processor #1 would then write this data to the external communications register using a store instruction. The store instruction would initiate a bus cycle, which normally consumes several clock cycles on conventional RISC processor cores. If the data being generated is wider than 32 bits, then several internal register-file entries and several external communications registers will be involved in this output operation.

When processor #1 has completed the data-generation phase and stored the data in an external communications register, processor #2 can access the data and use it. Processor #2 retrieves the data using one or more load instructions, each of which initiates a read cycle on processor #2’s main bus. The load instructions place the data into one or more entries in processor #2’s main register file, where the data can then be processed. This extended sequence of stores and loads has become so natural to designers of processor-based systems that the large overhead involved is no longer visible to the designer’s eye. Visible or not, that overhead is present and it erodes the system’s performance.

Direct communication ports incorporated into the two processors using TIE descriptions can significantly shorten the sequence of events required to transfer data between processors. A TIE instruction can create the data. A TIE state register in processor #1 can serve as the destination for data produced by that instruction. When the TIE instruction places its result in State Register #1, the value in the register becomes available on the attached communication-port pins. Note that no additional store instruction is needed to place data on these port pins.

The act of placing a result in State Register #1 automatically places the data on the associated communication-port pins. That value is immediately available as an input value to operations in processor #2 through State Register #2. Any TIE instruction can use State Register #2 as an input operand, so no explicit instruction to read the register is needed. TIE port connections can be arbitrarily wide, allowing large and non-power-of-two-sized operands to be transferred easily and quickly between the two processors. The same mechanism can be used to transfer data from a processor to a block of hardware or from a hardware block into a processor.
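In TIE, the exporting side of such a connection is typically declared as an exported state, and the importing side as an import wire that any operation can read as an input. The following is a minimal, abbreviated sketch of this idea; the state, wire, and operation names (RESULT_OUT, RESULT_IN, SEND40, RECV40) and the 40-bit width are hypothetical, chosen only to illustrate a non-power-of-two transfer:

```
// Processor #1: a 40-bit TIE state driven onto top-level port pins.
// The "export" attribute makes the state value visible outside the core,
// so no store instruction is needed to put data on the pins.
state RESULT_OUT 40 add_read_write export

// Hypothetical instruction: computing the result also updates the pins.
operation SEND40 {in AR a, in AR b} {out RESULT_OUT}
{
    assign RESULT_OUT = {8'h00, a} + {8'h00, b};
}

// Processor #2: a 40-bit input wire sampled from the same port pins.
import_wire RESULT_IN 40

// Any operation can use the imported wire directly as an input operand;
// no explicit load or register-read instruction is required.
operation RECV40 {out AR r} {in RESULT_IN}
{
    assign r = RESULT_IN[31:0];
}
```

Connecting RESULT_OUT’s pins to RESULT_IN’s pins at the SOC level yields the direct register-to-register path of Figure 4.4.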

Note that the above data-transfer sequence ignores the need for a handshake mechanism between the processors. Generally, some sort of polled or interrupt-driven mechanism is required to sequence data flow between them. Interrupts and polling routines take time, so simple port interfaces do not deliver the ultimate in transfer speed, although they are faster than bus-based interconnection schemes. For the highest transfer rates, SOC designers should use FIFO queues.
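TIE provides a queue construct for exactly this purpose: an output queue pushes data into an external FIFO and an input queue pops data from it, with the push/full and pop/empty handshake signals handled in hardware rather than by polling or interrupts. A minimal sketch follows; the queue and operation names (OUTQ, INQ, PUSH2, POP2) and the 64-bit width are hypothetical, and the FIFO buffer itself is instantiated outside both processors:

```
// Processor #1 side: output queue interface (push side of the FIFO).
queue OUTQ 64 out

// Push two 32-bit register values as one 64-bit queue entry;
// the processor stalls automatically if the FIFO is full.
operation PUSH2 {in AR lo, in AR hi} {out OUTQ}
{
    assign OUTQ = {hi, lo};
}

// Processor #2 side: input queue interface (pop side of the FIFO).
queue INQ 64 in

// Pop one 64-bit entry into two 32-bit registers;
// the processor stalls automatically if the FIFO is empty.
operation POP2 {out AR lo, out AR hi} {in INQ}
{
    assign lo = INQ[31:0];
    assign hi = INQ[63:32];
}
```

Because the full/empty handshake is implicit in the instruction semantics, no cycles are spent on polling loops or interrupt service routines.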
