18.8 DESIGN 3: DESIGN SPACE EXPLORATION WHEN s3 = [1 −0.5]

Figure 18.6 shows the DAG for the polynomial division algorithm based on our timing function choice s3. We note from the figure that all signals are now pipelined, as indicated by the arrows connecting the nodes. However, we note that there are nodes that do not lie on any equitemporal planes. We have several choices for the timing of nodes that lie between two temporal planes. Alternatively, we could assign a time value equal to either of the temporal planes surrounding the node. In addition, we could assign this node to operate on the negative edge of the clock. The former choice leads to nodes that do not have registers. The latter choice leads to nodes that have registers triggered by the negative edge of the clock. This is the option we follow here.

Figure 18.6 DAG for polynomial division algorithm when s3 = [1 0.5], n = 9, and m = 5.

c18f006

Similar to the two previous designs, we choose a projection vector given by

(18.40) c18e040

The corresponding projection matrix P3 is given by

(18.41) c18e041

A point in the DAG given by the coordinates p = [i j]t will be mapped by the projection matrix P3 into the point c18ue010 = P3p. The c18ue011 corresponding to Design 3 is shown in Fig. 18.7. The c18ue012 consists of m − 1 tasks. Input coefficients of A are fed from the right and the partial remainders are pipelined to all nodes. Coefficient bj of B is stored in task Tj. The task details for hardware systolic array implementation are shown in Fig. 18.7b, where D denotes a 1-bit register to store the intermediate results. The even-numbered tasks contain two positive edge-triggered flip-flops. On the other hand, the odd-numbered tasks contain two negative edge-triggered flip-flops.

Figure 18.7 c18ue016 or linear cellular automaton (LCA) processor array when s3 = [1 0.5], d3 = [1 0]t, n = 9, and m = 5. (a) The resulting tasks at each SPA stage. (b) The task workload details.

c18f007

This design is usually called a linear cellular automaton (LCA) [119]. The design shown here differs from LCAs discussed in the literature in several aspects:

1. Even tasks are clocked using the clock rising-edge.

2. Odd tasks are clocked using the clock rising-edge.

3. One of the inputs is fed from a MUX.

Let qi and ri, 0 ≤ i < m, be the present outputs of task Ti. The next state outputs c18ue013 and c18ue014 are given by

(18.42) c18e042

(18.43) c18e043

And we identify the output and input of the LCA as

(18.44) c18e044

(18.45) c18e045

(18.46) c18e046

The above equations determine the operation of the LCA as follows:

1. Clear all the registers.

2. For time steps 0 and 2, the LFSR is working as a simple shift register moving the coefficients a9 to a5 between the stages.

3. At time step 4, the first quotient coefficient q4 is obtained and is available at the next time step to the leftmost node.

4. The coefficients of Q are obtained from the leftmost node at time steps 4–9.

5. At the end of time step 9, all the remainder polynomial R coefficients are stored in the shift register stages.

6. If it is desired to shift the R coefficients out, then the feedback path must be broken to selectively disable the feedback action.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.64.241