3.6 SYSTOLIC PROCESSORS

Many authors state that systolic processors are pipeline systems. Truth of the matter is that pipeline processing is a special case of systolic processing. As we have seen in Chapter 2, a pipeline is one-dimensional and data flow is one-directional. A typical pipeline transmits data between adjacent stages. Systolic arrays could be one-, two-, or three-dimensional, or even higher if deemed necessary. Data flow among the adjacent processors along one or more directions. In a pipeline system, each pipeline stage performs a different task. In a systolic processor, all processing elements (PEs) usually perform the same task.

Typically, the interconnection pattern among the PEs is neighbor to neighbor and possibly some global interconnections. Each PE has a small memory to store data and intermediate results. systolic architectures are suited to implement algorithms that are highly regular with simple data dependencies. Examples of these algorithms include

1. linear algebra, for example, matrix–matrix and matrix–vector multiplication, and solving systems of linear equations;

2. string search and pattern matching;

3. digital filters, for example, one-, two-, and three-dimensional digital filters;

4. motion estimation in video data compression; and

5. finite field operations, such as elliptic curve operations.

Figure 3.3 shows an example of a simple SIMD processor used to implement a matrix–matrix multiplication algorithm. From the figure, we see that the matrix coefficients are stored in the PEs in a distributed memory fashion. We also see that communication between processors is neighbor to neighbor as indicated by the vertical arrows and by using global wires as indicated by the horizontal lines. Input data must mainly be supplied to the processors on the left edge. Output data are obtained from the processors at the top edge.

Figure 3.3 Systolic processor for the matrix multiplication algorithm.

c03f003

Some design issues associated with systolic architectures are the following:

1. A systolic processor is designed to implement a specific algorithm. It must be redesigned to implement a different algorithm. Even while implementing the same algorithm, a change in the problem size might require a major redesign of the system.

2. Supplying a large amount of input data to several processors is a serious constraint on the system input/output (I/O) bandwidth. In a one-dimensional systolic processor, inputs are usually fed to one processor then pipelined to the other processors. At other times, inputs are fed to the PEs through a broadcast bus or to all the PEs at one edge of the PE array. This could transform the performance of the systolic processor to an I/O-bound performance. Redundant arrays of inexpensive disks (RAIDs) can be used to provide mass storage with a large memory bandwidth. This concept can be applied to a bank of flash memory as opposed to magnetic disks.

3. Obtaining a large amount of output data from several processors is a serious constraint on the system I/O bandwidth The outputs could be obtained from one processor, from a bus connected to all the processors or from one edge of the PE array. Again, RAIDs can be used to provide mass storage with a large memory bandwidth.

Before we leave this section, it is worthwhile to compare systolic processors with SIMD processors since both types run a single instruction on multiple data on the surface. Table 3.1 compares SIMD and systolic array processors from different perspectives related to architecture, memory, and task granularity.

Table 3.1 Comparing SIMD and Systolic Processors

FeatureSIMDSystolic
Interconnection networkAny typeNeighbor to neighbor plus some buses
Communication patternDepends on algorithmTypically neighbor to neighbor
Interprocessor communicationMessage passingSimple clocked transmission
ProcessorCould be simple or complexTypically very simple
Algorithm implementedAny parallelizable algorithmRegular iterative algorithm (RIA)
IntegrationStand-aloneTypically part of another system
Task granularityTypically coarse: a process or a threadTypically fine: a simple mathematical operation or function
MemoryDistributedDistributed and small
LayoutNot applicableOne-, two-, or three-dimensional grid
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.44.255