10 Conclusions Part II

Systems on silicon are increasingly made of a number of independently timed core processors connected together by a communications network. Data passing between two of these processors needs to be retimed to allow it to pass between them safely, but the synchronization techniques that can be used depend on the timing relationship between the sender and the receiver.

In a synchronous system the timing of each process is usually driven by a single common clock. In a mesochronous system the clocks are more loosely linked: they are phase locked to a common source, so they share a frequency, but their relative phase can drift within some bound. If the two clocks are effectively locked together in this way, the methods described in Section 7.3 avoid the need for conventional synchronization, and the latency of the interface can be small. In a plesiochronous system each processor may have its own autonomous clock, so the phase relationship is unbounded; Section 7.4 shows how data can still be synchronized, with relatively low latency provided the clock frequencies are similar.
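
To illustrate why a plesiochronous interface only works comfortably while the two clocks stay close in frequency, the short sketch below estimates how often a data item must be inserted or dropped (a "slip") for an assumed 10 ppm frequency offset. The clock figures are invented purely for illustration and are not taken from the text.

    # Back-of-envelope estimate of how often a plesiochronous interface must
    # add or drop a data item (a "slip") when sender and receiver clocks are
    # nominally equal but free-running. Frequencies here are illustrative.

    f_tx = 500.000e6   # sender clock, Hz (assumed)
    f_rx = 499.995e6   # receiver clock, Hz: 10 ppm slower (assumed)

    # The receiver falls behind by one full cycle every time the accumulated
    # phase difference reaches one period, so slips occur at |f_tx - f_rx|.
    slip_rate = abs(f_tx - f_rx)              # slips per second
    cycles_between_slips = f_tx / slip_rate   # sender cycles between slips

    print(f"slip every {cycles_between_slips:.0f} sender cycles "
          f"({1.0 / slip_rate * 1e3:.2f} ms)")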

Latency and throughput are issues with more conventional synchronizers when the clock relationship between the two sides of an interface is unknown. In Section 7.1 a simple synchronizing interface is shown to have relatively low throughput, but throughput can be improved, usually at the expense of latency, by adding a FIFO as in Section 7.2.
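
One common way of building a FIFO between two clock domains is a dual-clock buffer whose read and write pointers are passed across the boundary through synchronizers, usually Gray coded so that only one bit changes at a time. The Python model below is a behavioural sketch of that idea, with the pointer synchronizers reduced to a single explicit "sync" step; the class name, depth and interface are illustrative and are not the specific design of Section 7.2.

    class DualClockFifo:
        """Behavioural sketch of a dual-clock FIFO. Pointers run over the
        range 0 .. 2*depth-1 so that full and empty can be distinguished."""

        def __init__(self, depth: int = 8):
            self.depth = depth
            self.mem = [None] * depth
            self.wptr = 0        # write pointer, lives in the write-clock domain
            self.rptr = 0        # read pointer, lives in the read-clock domain
            self.wptr_sync = 0   # write pointer as seen by the reader (stale copy)
            self.rptr_sync = 0   # read pointer as seen by the writer (stale copy)

        def write(self, value) -> bool:
            """Attempt a write on a write-clock edge; returns False if full."""
            if (self.wptr - self.rptr_sync) % (2 * self.depth) == self.depth:
                return False                       # appears full (conservative)
            self.mem[self.wptr % self.depth] = value
            self.wptr = (self.wptr + 1) % (2 * self.depth)
            return True

        def read(self):
            """Attempt a read on a read-clock edge; returns None if empty."""
            if self.rptr == self.wptr_sync:
                return None                        # appears empty (conservative)
            value = self.mem[self.rptr % self.depth]
            self.rptr = (self.rptr + 1) % (2 * self.depth)
            return value

        def sync_pointers(self):
            """Model the pointers crossing the clock boundary. In hardware each
            pointer would be Gray coded and passed through two flip-flops, so a
            partially captured value is still a valid, merely stale, pointer."""
            self.wptr_sync = self.wptr
            self.rptr_sync = self.rptr

    fifo = DualClockFifo(depth=4)
    for i in range(3):
        fifo.write(i)
    fifo.sync_pointers()                       # pointer values cross the boundary
    print([fifo.read() for _ in range(3)])     # -> [0, 1, 2]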

Relatively low-performance systems may not need a full clock cycle for synchronization, and in that case part of the cycle can be used for processing, as described by the LDL scheme of Section 7.5.1. On the other hand, nanometer fabrication processes may require synchronization times of more than one clock cycle. This is particularly true of high-performance systems where the clock period is only a small number of gate delays. The impact of this on latency can be severe, but reductions in latency are possible using techniques described in Section 7.5.2. Here, if a synchronization time of n cycles is needed to achieve adequate reliability, it is usually possible to approximately halve the synchronization time while accepting the possibility that a computation may have to be carried out again. In this case the latency is variable, but the worst case is no greater than that of the standard solution. This recomputation procedure may take more than one clock cycle, but it only needs to occur approximately once in several thousand data transmissions, so it does not significantly affect system performance. Because the synchronization delay is reduced, data transmission latency is also reduced by a factor of approximately two when compared with more conventional synchronizers.

Section 7.6 describes asynchronous communication mechanisms (ACMs), which are used in real-time systems where the data passed from one processor to another must not be held up by the reader, the writer, or both. They allow both processors to operate asynchronously by providing a common memory of three or four locations for the exchange of data. The concepts of coherency and freshness are introduced here, and the algorithms described are shown informally to provide these properties.
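
A well-known four-location ACM is Simpson's four-slot mechanism, in which the writer and reader never touch the same slot at the same time (coherency) and the reader always picks up the most recently completed write (freshness). The Python model below is an illustrative sketch of that style of algorithm, not a hardware design; in particular it ignores the care needed with the single-bit control variables in a real implementation.

    class FourSlotACM:
        """Sketch of a four-slot asynchronous communication mechanism in the
        style of Simpson's algorithm: neither side ever waits for the other."""

        def __init__(self, initial=None):
            self.data = [[initial, initial], [initial, initial]]  # 2 pairs x 2 slots
            self.slot = [0, 0]    # last slot written in each pair
            self.latest = 0       # pair most recently written
            self.reading = 0      # pair the reader last chose

        def write(self, item):
            pair = 1 - self.reading          # avoid the pair the reader is using
            index = 1 - self.slot[pair]      # avoid the slot last written in that pair
            self.data[pair][index] = item    # write into a slot nobody else touches
            self.slot[pair] = index          # publish slot first...
            self.latest = pair               # ...then pair, so the reader can never
                                             # pick up a half-written item

        def read(self):
            pair = self.latest               # take the freshest published pair
            self.reading = pair              # tell the writer to keep clear of it
            index = self.slot[pair]
            return self.data[pair][index]

    acm = FourSlotACM(initial=0)
    acm.write(41)
    acm.write(42)        # the writer may run ahead of the reader
    print(acm.read())    # -> 42, the freshest completed write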

The methods used to code and transmit information on a link between two processors are introduced in Chapter 8, and the advantages and disadvantages of different network architectures are discussed. The performance of a system depends to a large extent on its interconnect and on the bandwidth available to transmit information over the network of links. Various methods can be used, from simple serial links with NRZ coding to parallel methods using time encoding. Their characteristics in terms of bandwidth, power dissipation, timing independence, and wire area are compared. In some cases several independent serial links can be used in parallel to increase bandwidth where there may be a bottleneck in a network on chip. In this case synchronization of the link is needed, but this shares some functionality with the existing need to synchronize data for the receiving processor.
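
To make the bandwidth, power and wire-area trade-off concrete, the toy comparison below sends the same words over an assumed 8-bit parallel bus and over a single-wire NRZ serial link, using signal transition counts as a very crude proxy for dynamic power. The bus width and traffic pattern are invented for illustration and are not figures from Chapter 8.

    def transitions(stream):
        """Count 0->1 and 1->0 changes on one wire."""
        return sum(1 for a, b in zip(stream, stream[1:]) if a != b)

    def parallel_wires(words, width=8):
        """Per-wire bit streams of a parallel bus carrying one word per cycle."""
        return [[(w >> bit) & 1 for w in words] for bit in range(width)]

    def serial_nrz(words, width=8):
        """Single-wire NRZ stream: each word sent LSB first, one bit per cycle."""
        return [(w >> bit) & 1 for w in words for bit in range(width)]

    words = [0x3C, 0xA5, 0xA5, 0xFF, 0x00]     # example traffic (assumed)

    par = sum(transitions(wire) for wire in parallel_wires(words))
    ser = transitions(serial_nrz(words))

    print(f"parallel: 8 wires, {len(words)} cycles, {par} transitions")
    print(f"serial NRZ: 1 wire, {len(words) * 8} cycles, {ser} transitions")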

An alternative to core processors with a continuously active clock is described in Chapter 9. If the receiving clock is only started by the arrival of data, the data does not need to be synchronized; similarly, if the clock can be paused as soon as data arrives, conventional synchronization is unnecessary. Clock generators that are pausible or stoppable are described in Section 9.1, and a complete GALS wrapper for running synchronous core processors in an asynchronous environment is described in Section 9.3.
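
The sketch below is a much-simplified behavioural model of a pausible clock of the kind described in Section 9.1: before each rising edge the local clock generator arbitrates with any pending asynchronous port request, and if the request wins, the edge is held off until the transfer has completed, so the synchronous core never samples changing data. The class, timing values and interface are assumptions made for illustration, not the book's circuit; the mutex that actually performs the arbitration is abstracted to a single boolean.

    class PausibleClock:
        """Behavioural model of a pausible local clock generator."""

        def __init__(self, period=1.0, transfer_time=0.3):
            self.period = period                # nominal clock period (arbitrary units)
            self.transfer_time = transfer_time  # time the clock is held while a
                                                # port transfer completes
            self.time = 0.0

        def tick(self, request_pending: bool) -> float:
            """Produce the next rising edge; pause first if an asynchronous port
            request won the arbitration. The mutex making that decision is the
            only place metastability can occur, and it never reaches the core."""
            if request_pending:
                self.time += self.transfer_time   # clock held while data is passed
            self.time += self.period
            return self.time

    clk = PausibleClock()
    edges = [clk.tick(request_pending=(i == 2)) for i in range(5)]
    print(edges)   # the third edge arrives late because the clock paused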

New synchronizers that claim to avoid the unreliability or the unbounded delays implied by metastability are regularly invented, and some of them make it into print. A number of these traps for the unwary are described in Section 7.7, together with a description of how they are supposed to work and why, in practice, they do not.
