3.4   COMPUTATIONAL LOGIC ARRAYS

It is becoming apparent that, as we approach the speed limits of silicon technology, the best way to increase the speed of computation is to increase parallelism. One problem with increasing parallelism is that computing structures consist of more than one resource. There are three classes of resource: processor, memory, and communications. Different algorithms require different amounts of each resource, and any one of the three may be the limiting resource for a given algorithm. To guarantee an n-times speedup, an n-processor parallel computer must therefore provide n times the limiting resource; that is, not only n processors but n times the memory and communications as well. While VLSI technology provides increases in density each year, it is not feasible to expect performance to increase on each of three independent axes while price remains constant. Parallel supercomputers will always be expensive.
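The argument about the limiting resource can be made concrete with a simple bottleneck model. The sketch below is an illustrative assumption rather than a model from the text: each algorithm is described by the time it would spend on each resource class of a baseline machine, and its run time is set by whichever resource is slowest. The names `run_time` and `speedup` and the demand figures are hypothetical.

```python
# Illustrative bottleneck model (an assumption, not taken from the text):
# demand[r] is the time an algorithm would spend on resource class r on a
# baseline machine, and run time is set by the slowest (limiting) resource.
RESOURCES = ("processor", "memory", "communications")


def run_time(demand: dict[str, float], scale: dict[str, float]) -> float:
    """Run time when resource r is provided scale[r] times over the baseline."""
    return max(demand[r] / scale[r] for r in RESOURCES)


def speedup(demand: dict[str, float], scale: dict[str, float]) -> float:
    """Speedup of a scaled machine relative to the baseline (all factors 1)."""
    baseline = {r: 1.0 for r in RESOURCES}
    return run_time(demand, baseline) / run_time(demand, scale)


if __name__ == "__main__":
    n = 8.0
    memory_bound = {"processor": 1.0, "memory": 4.0, "communications": 2.0}
    only_processors = {"processor": n, "memory": 1.0, "communications": 1.0}
    everything = {r: n for r in RESOURCES}
    print(speedup(memory_bound, only_processors))  # 1.0: memory still limits
    print(speedup(memory_bound, everything))       # 8.0: the full n-times speedup
```

Because a different algorithm may be limited by a different resource, only scaling all three classes guarantees the n-times speedup for every demand profile in this model.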

Systolic and other VLSI design styles in which processors are tailored to particular applications offer a way out of this problem. Because the processors are specialized, they can be much simpler, and hence smaller and faster, than general-purpose processors; thus one can have more of them at the same cost. Similarly, memory and communication resources can be tailored to a given algorithm. Reconfigurable VLSI devices have the potential to bring the benefits of special-purpose processors to general-purpose computing devices.

Computational logic arrays consist of an array of rectangular cells covering the plane provided by the underlying silicon. In their purest form, each cell is connected only to its nearest neighbors. This architecture has the following features:

  1. Regularity and symmetry. The regularity and symmetry of the cellular array are attractive to CAD tool authors, since there are fewer special cases to account for when choosing algorithms. When combined with a hierarchical design style, these properties allow subcircuits to be placed and routed independently, then rotated and reflected within larger blocks to optimize the floorplan. By contrast, most channeled architectures start by flattening the netlist, which eliminates the hierarchy. Hierarchical place-and-route schemes can be much faster and more area-efficient than those based on a flat netlist, particularly for well-structured, regular designs.
  2. The cells within the array implement a small, carefully chosen set of primitives. There is no support for constructs from previous technologies that do not map efficiently onto the programmable architecture. This allows designs created with this technology in mind to be implemented efficiently, but makes it costly to transfer designs built from the primitives of previous technologies directly.
  3. Because the cellular array has only a single type of resource, it scales well with improvements in technology. Instead of trading wiring channels for cells, as in a channeled architecture, one simply provides more cells. Cellular arrays can also be designed to scale across chip boundaries without the boundaries being apparent to the user.
  4. Cellular arrays can be embedded in a static RAM in a manner that allows random access to the control store. This in turn allows rapid reprogramming, partial reprogramming of sections of the device, and access to internal state within circuits implemented by the device. This sort of access is harder to achieve in channeled structures with multiple classes of resource, because of discontinuities in the layout of the control store.
  5. In a cellular array, delays accumulate as a signal passes through successive neighbor connections. This fact, coupled with the target application areas, has led to design styles that emphasize pipelining to maximize throughput at the expense of latency. Pipelining is supported efficiently by cellular architectures that offer latches or flip-flops as cell functions.
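The throughput/latency trade-off in item 5 can be illustrated with a small timing model. It is a sketch under stated assumptions, not a model of any particular device: every neighbor hop adds the same fixed delay, a latch stage is inserted after every `cells_per_stage` cells, and each stage pays a fixed latch overhead. The function `chain_timing` and all delay values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PipelineTiming:
    stages: int
    clock_period_ns: float
    latency_ns: float
    throughput_per_us: float  # results per microsecond


def chain_timing(cells: int, cell_delay_ns: float, cells_per_stage: int,
                 latch_overhead_ns: float = 0.0) -> PipelineTiming:
    """Timing of a path through `cells` neighbor-connected cells.

    Assumes a uniform delay per cell and a latch stage after every
    `cells_per_stage` cells; the clock period is set by the longest stage.
    """
    if cells <= 0 or cells_per_stage <= 0:
        raise ValueError("cells and cells_per_stage must be positive")
    per_stage = min(cells_per_stage, cells)
    stages = math.ceil(cells / per_stage)
    period = per_stage * cell_delay_ns + latch_overhead_ns
    # Every result waits for all stages, so latency grows with the stage count.
    return PipelineTiming(stages, period, stages * period, 1000.0 / period)


if __name__ == "__main__":
    flat = chain_timing(24, cell_delay_ns=2.0, cells_per_stage=24, latch_overhead_ns=1.0)
    piped = chain_timing(24, cell_delay_ns=2.0, cells_per_stage=4, latch_overhead_ns=1.0)
    for name, t in (("unpipelined", flat), ("pipelined", piped)):
        print(f"{name}: period {t.clock_period_ns} ns, latency {t.latency_ns} ns, "
              f"{t.throughput_per_us:.1f} results/us")
```

With these hypothetical figures, a 24-cell path with 2 ns per cell and 1 ns of latch overhead runs at a 49 ns clock period unpipelined; latching every four cells cuts the period to 9 ns, roughly a fivefold throughput gain, while latency rises from 49 ns to 54 ns.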

As an example of a cellular array, we will consider the Algotronix Configurable Array Logic (CAL) devices [Kean89]. Other examples include the Atmel 6000 series, described in Chapter 6, and the Pilkington Micro-electronics architecture described in Chapter 8.

3.4.1   The Algotronix CAL 1024

The Algotronix device consists of an array of cells with nearest-neighbor connections, as shown in Figure 3–11. Each cell contains a function unit that can implement any of the 16 possible combinational functions of its two input variables (labeled X1 and X2), or a transparent latch with true or inverted data (X2) and clock (X1) inputs; the four latch variants bring the total to 20 possible functions. The cell also contains a routing unit (Figure 3–12) that can route any of its neighbor inputs (North, South, East, and West) or the function unit output (Self) to any useful neighbor output. The multiplexers cannot route a signal back the way it came (for example, North Input to North Output), but such a connection would be redundant, since the signal is already available within the cell to the north, where it originated. The user programming model allows these connections, but the CAD system eliminates them before the physical device is programmed. Because each output has its own dedicated multiplexer, routing an input to one output can never block the use of any other output (as can happen with switch boxes in channeled arrays). The routing function is also perfectly symmetrical, which allows designs to be rotated and reflected after placement and routing.

The two function unit inputs can be taken from any of the neighbor inputs; in addition, the X1 input can be taken from one of two global signals (G1 and G2) that are fed to all cells in the array. These signals are normally used for clock distribution. The output of the function unit is mapped into the device control store and can be read back through the programming interface. In addition, incremental changes can be made to portions of the control store, and latches can be cleared through the programming interface.
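The cell described above can be modeled in a few lines of Python. This is a behavioral sketch, not the Algotronix configuration format: the truth-table encoding of the 16 combinational functions, the assumption that a latch is transparent while its (possibly inverted) clock is high, and the names `combinational`, `TransparentLatch`, `check_routing`, and `transform` are all hypothetical. The rotation at the end shows why per-output multiplexers and a symmetrical routing function let a placed-and-routed cell be rotated or reflected simply by relabeling directions.

```python
from itertools import product

# Hypothetical behavioral model of one CAL-style cell (not the real config format).
DIRECTIONS = ("N", "E", "S", "W")
SOURCES = DIRECTIONS + ("Self",)  # Self = the function unit output


def combinational(truth_table: int, x1: int, x2: int) -> int:
    """Evaluate one of the 16 two-input functions.

    The 4-bit truth_table holds the output for input (x1, x2) at bit 2*x1 + x2
    (an assumed encoding), so 0b1000 is AND and 0b0110 is XOR.
    """
    return (truth_table >> (2 * x1 + x2)) & 1


class TransparentLatch:
    """Latch variant: X2 is data, X1 is clock; either may be inverted.

    Assumes the latch is transparent while the (possibly inverted) clock is 1.
    """

    def __init__(self, invert_data: bool, invert_clock: bool) -> None:
        self.invert_data = invert_data
        self.invert_clock = invert_clock
        self.q = 0

    def step(self, x1: int, x2: int) -> int:
        if x1 ^ self.invert_clock:      # clock active: follow the data input
            self.q = x2 ^ self.invert_data
        return self.q                    # clock inactive: hold the stored value


def function_catalogue() -> list[tuple]:
    """16 combinational functions plus 4 latch variants = 20 cell functions."""
    combos = [("comb", tt) for tt in range(16)]
    latches = [("latch", d, c) for d, c in product((False, True), repeat=2)]
    return combos + latches


# Each output direction has its own multiplexer choosing one source.
ROTATE_CW = {"N": "E", "E": "S", "S": "W", "W": "N", "Self": "Self"}
MIRROR_EW = {"N": "N", "S": "S", "E": "W", "W": "E", "Self": "Self"}


def check_routing(routing: dict[str, str]) -> None:
    """Reject configurations a per-output multiplexer cannot implement."""
    for out, src in routing.items():
        if out not in DIRECTIONS or src not in SOURCES:
            raise ValueError(f"bad connection {src} -> {out}")
        if out == src:
            # e.g. North Input -> North Output: redundant, so not supported
            raise ValueError(f"cannot route {src} back to {out}")


def transform(routing: dict[str, str], mapping: dict[str, str]) -> dict[str, str]:
    """Rotate or reflect a cell's output->source table by relabeling directions."""
    return {mapping[out]: mapping[src] for out, src in routing.items()}


if __name__ == "__main__":
    print(len(function_catalogue()))    # 20
    cell = {"E": "W", "N": "Self"}      # pass W input to E, drive result north
    check_routing(cell)
    rotated = transform(cell, ROTATE_CW)
    check_routing(rotated)              # still a legal configuration
    print(rotated)                      # {'S': 'N', 'E': 'Self'}
```

Since the direction mapping is a one-to-one relabeling, a rotated or reflected configuration never routes a signal back the way it came, so it always remains legal.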


Figure 3-11. CAL cellular array. Figure courtesy of Xilinx, Inc. © Xilinx, Inc. 1991. All rights reserved.

The CAL is programmed through a RAM-like interface with address and 8-bit data buses that are separate from the input and output signals of the cell array. This allows the control store to be accessed without affecting computations running on the array. All 32 input and output signals from the edge of the array are brought out and can be connected to an adjacent chip. This allows chips to be cascaded to support the user programming model of an arbitrarily large cellular array: chip boundaries are apparent to the user only through the increased delay on signals that cross them. This simple programming model makes large multichip structures much easier to design. Chip cascading is supported by a proprietary pad architecture that uses three logic levels to allow an input signal and an output signal between a chip and its neighbor to share the same physical pad. The pads can also be programmed to interface to standard TTL and CMOS parts.
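The cascading model can be sketched by treating a board of chips as one large grid of cells. The sketch assumes each chip contributes a 32 × 32 block of cells (consistent with the 1024 cells implied by the part name, although the text does not state the array dimensions), and that the only visible effect of a chip boundary is an extra, hypothetical `crossing_delay_ns` on any hop that crosses it. The names `locate`, `boundary_crossings`, and `path_delay_ns` are illustrative.

```python
from typing import NamedTuple

# Assumed cells per chip edge: 32 x 32 = 1024 cells per chip.
CHIP_SIZE = 32


class CellAddress(NamedTuple):
    chip_row: int  # which chip, counting down the board
    chip_col: int  # which chip, counting across the board
    row: int       # cell position within that chip
    col: int


def locate(global_row: int, global_col: int, chip_size: int = CHIP_SIZE) -> CellAddress:
    """Map a cell in the user's single large array to a (chip, local cell) pair."""
    chip_row, row = divmod(global_row, chip_size)
    chip_col, col = divmod(global_col, chip_size)
    return CellAddress(chip_row, chip_col, row, col)


def boundary_crossings(path: list[tuple[int, int]]) -> int:
    """Count nearest-neighbor hops in a path that move from one chip to another."""
    crossings = 0
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        if abs(r0 - r1) + abs(c0 - c1) != 1:
            raise ValueError("consecutive cells must be nearest neighbors")
        a, b = locate(r0, c0), locate(r1, c1)
        if (a.chip_row, a.chip_col) != (b.chip_row, b.chip_col):
            crossings += 1
    return crossings


def path_delay_ns(path: list[tuple[int, int]], cell_delay_ns: float,
                  crossing_delay_ns: float) -> float:
    """Every hop costs a cell delay; hops across a chip boundary cost extra."""
    hops = len(path) - 1
    return hops * cell_delay_ns + boundary_crossings(path) * crossing_delay_ns


if __name__ == "__main__":
    print(locate(40, 70))  # CellAddress(chip_row=1, chip_col=2, row=8, col=6)
    route = [(5, c) for c in range(71)]  # straight run across three chips
    print(boundary_crossings(route))     # 2
    print(path_delay_ns(route, cell_delay_ns=1.5, crossing_delay_ns=5.0))  # 115.0
```

From the user's point of view the route is simply a 70-hop path in one array; the two chip boundaries show up only as the extra crossing delay.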


Figure 3-12. CAL cell routing. Figure courtesy of Xilinx, Inc. © Xilinx, Inc. 1991. All rights reserved.
