6.7. Diamond-Core Ports and Queues

In addition to the various local buses described above, most of the Diamond processor cores have a pair of 32-bit I/O ports, a pair of 32-bit I/O queue interfaces, or both. (Detailed port and queue-interface discussions appear in Chapter 4.) The Diamond processor cores’ ports and queue interfaces allow I/O transactions to complete without a multi-cycle bus transaction, which is why port and queue I/O is so fast.

The 108Mini and 212GP processor cores each have a 32-bit input port and a 32-bit output port. One processor instruction reads the input port and another places data on the output port. Both the 330HiFi and 545CK processor cores have a 32-bit input-queue interface and a 32-bit output-queue interface. One processor instruction pops a data value from the head of a FIFO memory attached to the input-queue interface and another pushes a data value onto the tail of a FIFO memory attached to the output-queue interface. The 570T processor core has a pair of 32-bit I/O ports and a pair of 32-bit queue interfaces.
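The sketch below illustrates this single-instruction port access in C. The intrinsic names read_input_port() and write_output_port() are hypothetical placeholders for the actual port-access intrinsics defined by the Diamond core's toolchain headers.

#include <stdint.h>

/* Hypothetical intrinsics standing in for the single-instruction port
 * accesses described above; the real names come from the core's
 * instruction set and toolchain headers. */
extern uint32_t read_input_port(void);        /* one instruction: sample the 32-bit input port */
extern void     write_output_port(uint32_t);  /* one instruction: drive the 32-bit output port */

/* Read a word from the input port, tag it, and place the result on the
 * output port -- two instructions' worth of I/O, with no bus cycles. */
void echo_with_flag(void)
{
    uint32_t in = read_input_port();
    write_output_port(in | 0x80000000u);  /* set an illustrative "processed" flag bit */
}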

All queue push and pop operations are blocking operations. If a Diamond core attempts to pop data from an empty input FIFO, the processor stalls until data is available from the FIFO. Similarly, if a Diamond core attempts to push data into a FIFO that’s already full, the processor stalls until there’s room in the FIFO for the operation to complete. Such blocking can be avoided by having the application program sample the input FIFO’s empty status and the output FIFO’s full status before attempting the respective pop and push operations, as in the sketch below. Instructions to check these status signals are available in the Diamond processor cores that have queue interfaces.
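The following sketch shows how an application might use those status checks to avoid stalling. The intrinsic names (input_queue_empty(), output_queue_full(), pop_input_queue(), push_output_queue()) are hypothetical stand-ins for the queue and status instructions exposed by the toolchain.

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical intrinsics for the queue interfaces and their status checks. */
extern bool     input_queue_empty(void);        /* test the input FIFO's empty flag */
extern bool     output_queue_full(void);        /* test the output FIFO's full flag */
extern uint32_t pop_input_queue(void);          /* blocking pop from the input FIFO */
extern void     push_output_queue(uint32_t v);  /* blocking push to the output FIFO */

/* Non-blocking transfer: move one word from the input FIFO to the output
 * FIFO only if both operations can complete without stalling the core. */
bool try_forward_one_word(void)
{
    if (input_queue_empty() || output_queue_full())
        return false;                /* would stall -- do something else instead */

    push_output_queue(pop_input_queue());
    return true;
}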

Diamond ports and queue interfaces open up new architectural design possibilities. Input and output ports replicate the abilities of input and output registers attached to external processor buses, but the integral ports and queue interfaces have certain advantages over their bus-attached counterparts. In particular, they are faster than ports based on bus-attached registers because a Diamond processor accesses them without initiating a multi-clock bus cycle. Ports and queue interfaces also employ implicit addressing (the port read and write instructions implicitly operate on the appropriate port or queue registers), so no explicit address needs to be set up in a register before executing a conventional load or store instruction to reach the port or queue register.
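The contrast is easy to see in C. The sketch below compares a conventional bus-attached output register, reached through an explicit (and here purely illustrative) address with a store instruction, against a hypothetical single-instruction port write that needs no address setup at all.

#include <stdint.h>

/* Conventional bus-attached output register: the address must be known,
 * and each store becomes a multi-cycle bus transaction. The address is
 * purely illustrative. */
#define BUS_OUTPUT_REG (*(volatile uint32_t *)0x60000000u)

/* Hypothetical single-instruction port write (implicit addressing). */
extern void write_output_port(uint32_t v);

void signal_bus_register(uint32_t v)
{
    BUS_OUTPUT_REG = v;    /* address setup plus a store that runs a bus cycle */
}

void signal_port(uint32_t v)
{
    write_output_port(v);  /* one instruction, no address setup, no bus cycle */
}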

Figure 6.3 illustrates the use of ports in a three-processor system. Diamond processor core #1 uses two output-port pins: one drives the Run/Stall input of processor core #2 and the other drives an input-port pin on processor core #3. When processor #2 is stalled by the appropriate state on its Run/Stall input pin, processor #3 can read and write processor #2’s local memory over the PIF bus using the inbound-PIF feature. This configuration allows processor #1 to halt processor #2 and then signal processor #3 to initiate a data transfer or program load under processor #3’s control.

Figure 6.3. Processor #1 uses two output-port pins to control the Run/Stall input on processor #2 and to signal an input-port pin on processor #3, which alerts processor #3 to the availability of the PIF bus for inbound-PIF access to processor #2’s local memory.


For this mechanism to work, processor #3 must periodically poll its input-port pin, looking for a command from processor #1. If the input port were implemented conventionally with a latch and address decoder on a bus, every polling operation would generate bus traffic. Polling an input-port pin, by contrast, generates no external bus traffic.
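A polling loop on processor #3 might look like the following sketch, again using a hypothetical read_input_port() intrinsic and an illustrative bit assignment for the request signal driven by processor #1.

#include <stdint.h>

/* Hypothetical single-instruction read of processor #3's 32-bit input port. */
extern uint32_t read_input_port(void);

#define XFER_REQUEST_BIT 0x1u   /* illustrative: the bit driven by processor #1's output port */

/* Poll the input-port pin until processor #1 signals that processor #2 is
 * stalled and its local memory may be accessed over the PIF. Each poll is a
 * single port-read instruction and generates no external bus traffic. */
void wait_for_transfer_request(void)
{
    while ((read_input_port() & XFER_REQUEST_BIT) == 0)
        ;  /* spin until the request pin is asserted */

    /* ... now perform the inbound-PIF transfer to processor #2's local memory ... */
}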

Figure 6.4 shows two Diamond processor cores connected by both a PIF bus and a FIFO. Because both processors are attached to the same PIF, they very likely share the same clock; the entire subsystem is synchronous, and so is the FIFO that connects the two processors. Processor #1 can send data to processor #2 either by writing to processor #2’s local memory over the PIF, using processor #2’s inbound-PIF capability, or by pushing data through the FIFO attached to the processors’ queue interfaces.

Figure 6.4. Processor #1 can communicate with processor #2 over the PIF bus, which is a shared resource, or through the FIFO, which is a private resource.


Traffic over the PIF must contend with other PIF traffic, which includes instruction and data traffic for both processors as well as traffic from other devices attached to the PIF. By contrast, the inter-processor FIFO connection is private; it is not shared. The connection between the FIFO and processor #1 is therefore always available to processor #1, and the connection between the FIFO and processor #2 is always available to processor #2. A Diamond processor core can sustain one transfer per clock cycle through this FIFO link, so the FIFO configuration can potentially support much higher data rates than PIF-based communications.
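A producer/consumer pair built on this private FIFO link might look like the following sketch. The blocking push and pop intrinsics are hypothetical names; processor #1 runs the producer loop and processor #2 runs the consumer loop, with the FIFO providing the only synchronization needed.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical blocking queue intrinsics; the names are illustrative. */
extern void     push_output_queue(uint32_t v);  /* processor #1: push onto the FIFO's tail */
extern uint32_t pop_input_queue(void);          /* processor #2: pop from the FIFO's head  */

/* On processor #1: stream a buffer into the private FIFO. Because the FIFO
 * link can accept one word per clock, this loop is limited by loop overhead,
 * not by contention on the shared PIF. */
void producer(const uint32_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        push_output_queue(buf[i]);   /* stalls only if the FIFO fills */
}

/* On processor #2: drain the FIFO into a local buffer. */
void consumer(uint32_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = pop_input_queue();  /* stalls only if the FIFO is empty */
}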

In Figure 6.4, the two processors share a common clock. Asynchronous FIFOs allow interconnected Diamond processor cores to run at different clock rates, as shown in Figure 6.5.

Figure 6.5. Asynchronous FIFOs allow interconnected Diamond processor cores to run at different clock rates.


Asynchronous FIFOs can be very handy in breaking up large clock domains on an SOC. Creating smaller synchronous clock domains on the chip makes it easier to achieve timing closure in large, complex SOC designs. Further, removing the need to route one clock around an entire chip design can reduce the power dissipation associated with large, high-frequency clock trees. Separate clock domains also allow different blocks within the SOC to operate at different frequencies. Each block runs only at the frequency required to achieve its assigned task, which can further reduce on-chip power dissipation.
