The principles of self-timed systems have been known for many years, but such systems are rarely implemented because of their complex hardware, despite the advantages of flexible data rates and the avoidance of metastability problems. The self-timed environment prototypes (STEPs) are a collection of primitives used for implementing self-timed systems. The example discussed here was proposed by Sutherland [Suther89].
Many computing applications use a first in, first out (FIFO) buffer, often implemented as a simple circular queue. This is a memory structure that will hold a sequence of data, and output it in the same order as entered. To build a circular-queue FIFO in a synchronous environment, one must use memory elements that are addressed by two counters acting as pointers. The “in” pointer indicates the address to be filled next, while the “out” pointer identifies the first location to be read. The state of the FIFO, full or empty, is calculated by subtracting “out” from “in”. FIFO control is complicated by synchronization issues and may be unreliable if the FIFO filling and FIFO emptying processes are asynchronous.
Sutherland’s self-timed FIFO design is entirely different. The basic operation resembles flow through a pipeline. Data are written to one end and read out from the other. The state of each memory location is either full or empty. If a location is empty, it will pass data through. However, if it is full, it will maintain its own contents. If the first location is full, then the FIFO is full. If the last location is empty, then the FIFO is empty. The entire control circuitry consists of only one gate per location, and the resulting speed may be faster than most synchronous FIFOs.
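The pipeline-like behavior described above can be illustrated with a small behavioral model. This is only a sketch in Python, not the hardware design: the class name `RippleFifo` and its methods are hypothetical, an empty location is represented by `None` (so `None` itself cannot be stored), and the rippling is modeled as repeated passes rather than as concurrent gate activity.

```python
class RippleFifo:
    """Behavioral sketch: every location is either full (a value) or empty (None)."""

    def __init__(self, depth):
        self.slots = [None] * depth  # slots[0] is the input end, slots[-1] the output end

    def is_full(self):
        # The FIFO is full when the first location is full.
        return self.slots[0] is not None

    def is_empty(self):
        # The FIFO is empty when the last location is empty.
        return self.slots[-1] is None

    def _ripple(self):
        # A full location passes its data forward whenever its neighbor is empty.
        moved = True
        while moved:
            moved = False
            for i in range(len(self.slots) - 2, -1, -1):
                if self.slots[i] is not None and self.slots[i + 1] is None:
                    self.slots[i + 1] = self.slots[i]
                    self.slots[i] = None
                    moved = True

    def write(self, value):
        """Return True if the value was accepted, False if the FIFO is full."""
        if self.is_full():
            return False
        self.slots[0] = value
        self._ripple()
        return True

    def read(self):
        """Return the oldest value, or None if the FIFO is empty."""
        if self.is_empty():
            return None
        value = self.slots[-1]
        self.slots[-1] = None
        self._ripple()
        return value
```

Note that no pointer arithmetic is needed: the full and empty conditions are read directly from the first and last locations.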
The self-timed environment is regulated by the handshaking of event signals. An “event” is defined here as a change of state in a control signal. A rising edge has the same significance as a falling edge and both are treated as equivalent events. An XOR gate provides an ideal merging function for events. If either input to an XOR gate undergoes an event, the XOR will cause an event on its output. A Muller-C element (Figure 6–16) [Suther89] provides a rendezvous function for events. There must be an event on each input to a Muller-C element before it produces an event on its output:
C := A.B + B.C + A.C
To understand the behavior of a Muller-C element, imagine the following sequence: assume that A, B, and C are all at logic ‘0’. If there is an event on A, C stays at ‘0’, but when B rises to meet A, C also rises to logic ‘1’. If B were then to have another event, C would remain at ‘1’ until A experienced an event to match B. Only then would C fall. The Muller-C element waits for events on both inputs before it outputs an event. Figure 6–17 shows the behavior of an XOR as well as a Muller-C element, as recorded by a logic analyzer.
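The sequence just described can be replayed in software. The following Python sketch evaluates the equation C := A.B + B.C + A.C step by step and, for comparison, the XOR merge; the function names are hypothetical and each step is assumed to settle before the next input change.

```python
def xor_merge(a, b):
    """Event merge: an event on either input causes an event on the output."""
    return a ^ b


def muller_c(a, b, c):
    """Next output of a Muller-C element: C := A.B + B.C + A.C."""
    return (a & b) | (b & c) | (a & c)


def trace(inputs, c=0):
    """Apply a list of (A, B) input pairs in order; return the C value after each."""
    outputs = []
    for a, b in inputs:
        c = muller_c(a, b, c)
        outputs.append(c)
    return outputs


# A rises, B rises to meet it, B falls again, then A falls to match B.
sequence = [(0, 0), (1, 0), (1, 1), (1, 0), (0, 0)]
c_outputs = trace(sequence)                             # [0, 0, 1, 1, 0]
xor_outputs = [xor_merge(a, b) for a, b in sequence]    # [0, 1, 0, 1, 0]
```

The Muller-C output changes only twice, once after both inputs have risen and once after both have fallen, whereas the XOR output changes on every input event.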
Figure 6–18 shows a variant of the Muller-C element referred to as CnotM, because it has one inverted input and an active-low clear. Figure 6–19 shows the corresponding circuit. It is important to remember that in the self-timed environment, memory elements (similar to D flip-flops) must respond to both the rising and the falling edges of the control signal, since either edge is an event. Figure 6–20 shows an event-triggered D-type element, while Figure 6–21 shows its circuit. It comprises two D latches in parallel. One is latched when the control signal is high and the other when it is low. The output always comes from the nontransparent latch.
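The two-latch arrangement can be sketched behaviorally as follows. This Python model is an illustration under stated assumptions, not a transcription of the circuit in Figure 6–21: the class and attribute names are hypothetical, and the latch that was transparent before a control event is assumed to capture the data value present at that event.

```python
class EventTriggeredD:
    """Sketch of an event-triggered D element built from two parallel latches."""

    def __init__(self):
        self.ctrl = 0
        self.open_when_low = 0   # latch that is transparent while ctrl is 0
        self.open_when_high = 0  # latch that is transparent while ctrl is 1

    def step(self, d, ctrl):
        """Apply data d and control level ctrl; return the element's output."""
        # The latch transparent under the old control level tracks d up to the event.
        if self.ctrl == 1:
            self.open_when_high = d
        else:
            self.open_when_low = d
        self.ctrl = ctrl
        # The latch transparent under the new control level now follows d.
        if ctrl == 1:
            self.open_when_high = d
        else:
            self.open_when_low = d
        # The output is always taken from the nontransparent latch.
        return self.open_when_low if ctrl == 1 else self.open_when_high


element = EventTriggeredD()
q0 = element.step(1, 0)  # data changes, no control event: output keeps its old value
q1 = element.step(1, 1)  # rising event captures the 1
q2 = element.step(0, 1)  # data changes, no control event: output holds
q3 = element.step(0, 0)  # falling event captures the 0
```

In this model the output changes only when the control signal has an event, regardless of whether that event is a rising or a falling edge.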
Figure 6–22 shows just the control structure of a 4-stage micropipeline. The output of each Muller-C can be used to control one storage location. Notice that the right-hand input of each Muller-C element is inverted, signified by a bubble.
R1 = ‘0’ R2 = ‘1’ R3 = ‘0’ R4 = ‘1’
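The chain of Muller-C elements can be explored with a short simulation. The Python sketch below is a model under assumptions that the text does not state explicitly: all stages start at ‘0’, the leftmost element is driven by a request input (here called `rin`), the inverted right-hand input of the last element is driven by an acknowledge input (here called `aout`), and the circuit settles completely between input events. All names are hypothetical.

```python
def muller_c(a, b, c):
    """Next output of a Muller-C element: C := A.B + B.C + A.C."""
    return (a & b) | (b & c) | (a & c)


def settle(r, rin, aout):
    """Re-evaluate every stage until no output changes; r is updated in place."""
    n = len(r)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            left = rin if i == 0 else r[i - 1]
            right = aout if i == n - 1 else r[i + 1]
            new = muller_c(left, 1 - right, r[i])  # right-hand input is inverted
            if new != r[i]:
                r[i] = new
                changed = True
    return r


r = [0, 0, 0, 0]      # R1..R4, assumed to start cleared
rin, aout = 0, 0
history = []
for _ in range(4):    # four request events with no acknowledge from the output
    rin ^= 1          # an event is a change of state, in either direction
    settle(r, rin, aout)
    history.append(list(r))

full_state = list(r)  # [0, 1, 0, 1]

aout ^= 1             # one acknowledge event removes the oldest item
after_ack = list(settle(r, rin, aout))  # [0, 0, 1, 0]
```

Under these assumptions, four unacknowledged request events leave adjacent stages at opposite levels, which matches the values R1 = ‘0’, R2 = ‘1’, R3 = ‘0’, R4 = ‘1’ listed above. In this model a stage holds data when its output differs from that of its right-hand neighbor.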
Figure 6–24 shows CAL implementations of the four-stage control unit. Figure 6–25 shows the same construction with the event-triggered flip-flops inserted.
The design was tested on the CHS2×4 board described earlier in this chapter. At first the design was spread over the array of CAL chips with no regard to chip boundaries, but unreliable operation was traced to the fact that some Muller-C elements were split between chips, introducing an extra feedback-loop delay of about 40 ns. The design was laid out again to avoid splitting any element, and the problem was eliminated.
While the CAL configuration is eminently suitable for experiments with self-timed systems, it carries a lot of structural overhead, that is, configuration memory and multiplexer control. To place these complications in perspective, a custom chip was designed for the self-timed environment and fabricated in a 2-μm bulk CMOS n-well process. One structure on this chip is a 15-stage FIFO constructed with the self-timed design principles described earlier. It has been tested with the same measuring equipment used for the CAL implementation. Data can be passed through 12 stages of the FIFO in about 210 ns. Figure 6–26 shows the waveforms of R2, R5, R8, R11, and R14 in the STEP FIFO as it passes data forward. However, this speed is not maintained through the I/O pads. On-chip, the FIFO would appear to be able to handle switching speeds as high as 57 MHz, but with the pads included the maximum speed is 29 MHz. Data are preserved at these speeds.
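The on-chip figure appears consistent with the per-stage delay implied by the 12-stage measurement. The short Python calculation below makes that arithmetic explicit; it assumes, as an interpretation not stated in the text, that the maximum on-chip event rate is the reciprocal of the delay of one stage. The variable names are illustrative.

```python
stages = 12
total_delay_ns = 210.0                        # measured delay through 12 stages

stage_delay_ns = total_delay_ns / stages      # 17.5 ns per stage
max_rate_mhz = 1000.0 / stage_delay_ns        # about 57.1 MHz, assuming one event per stage delay

pad_limited_mhz = 29.0                        # measured maximum with I/O pads included
pad_penalty = max_rate_mhz / pad_limited_mhz  # roughly a factor of two
```

On this reading, the I/O pads cost roughly a factor of two in switching speed.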
The CAL design of a STEP FIFO runs at approximately 15 MHz, while the custom VLSI implementation runs at 29 MHz, so the CAL version reaches roughly half the speed of the custom chip. It is surprising that the CAL configuration is not substantially slower. This suggests that in the future most experiments could be done in CAL alone, since its speed is within a factor of two of that of a custom chip.