12.3 THE IIR FILTER DEPENDENCE GRAPH

We use Eq. 12.1 to study the index dependencies of the algorithm variables. Variable y is an input/output or intermediate variable, and variables x, a, and b are all input variables. An input/output variable is one that appears on both the right-hand side (RHS) and the left-hand side (LHS) of the algorithm equations, with different index dependencies on each side.
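As a concrete reference point, the recurrence can be evaluated directly in software. The sketch below assumes the common IIR form y(i) = Σ_j a(j) y(i − j) + Σ_j b(j) x(i − j) with N coefficient pairs and zero initial conditions; the exact index ranges of Eq. 12.1 may differ slightly, so treat the bounds here as illustrative.

```python
# Direct evaluation of the 1-D IIR recurrence (a hedged sketch of Eq. 12.1).
# Each output y(i) combines feedback terms a(j)*y(i-j) and feed-forward
# terms b(j)*x(i-j); terms whose index would be negative are taken as zero.

def iir_filter(x, a, b):
    """a[0] is assumed unused (y(i) never feeds back into itself); N = len(b)."""
    N = len(b)
    y = []
    for i in range(len(x)):
        acc = 0.0
        for j in range(N):
            if j >= 1 and i - j >= 0:   # feedback terms a(j) * y(i - j)
                acc += a[j] * y[i - j]
            if i - j >= 0:              # feed-forward terms b(j) * x(i - j)
                acc += b[j] * x[i - j]
        y.append(acc)
    return y
```

For example, with a = [0, 0.5, 0, 0] and b = [1, 0, 0, 0], a unit impulse produces the decaying sequence 1, 0.5, 0.25, 0.125, the expected behavior of a single feedback tap.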

We note that the algorithm gives rise to a two-dimensional (2-D) computation domain 𝒟 since we have two indices, i and j. Because the dimensionality of 𝒟 is low, it is best visualized using a dependence graph, which is easy for humans to inspect and analyze.

We organize our indices in the form of a vector:

(12.2) p = [i j]^t

For given values of the indices, the vector corresponds to a point in the 𝒵² space.

12.3.1 The 2-D Dependence Graph

The dimension of the dependence graph is two, which is the number of indices in the algorithm. The graph covers the points p(i, j) ∈ 𝒟, where the range of the indices defines the boundaries of the dependence graph as

(12.3) 𝒟 = {(i, j) : 0 ≤ i < ∞, 0 ≤ j ≤ N − 1}

Note that 𝒟 extends to ∞ in the i direction, which defines an extremal ray in that direction.

The dependence matrices for the variables will help us plot them in the filter dependence graph. We prefer the dependence graph here because its low dimensionality facilitates visualization.

Figure 12.1 shows the dependence graph of the 1-D IIR filter for the case N = 4. The following paragraphs describe how the dependence graph was obtained. The missing circles near the j-axis indicate that there are no operations to be performed at these locations since the input to our filter does not have negative indices.

Figure 12.1 Dependence graph for the 1-D IIR filter for the case N = 4.

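The node set of Fig. 12.1 can be enumerated programmatically. The sketch below assumes the skip condition is i − j < 0 (a negative filter input index, matching the missing circles near the j-axis); the window i_max is an illustrative choice since 𝒟 is unbounded in i.

```python
# Enumerate the dependence-graph nodes for coefficient count N over a
# finite window [0, i_max) of the unbounded i axis.  A node (i, j) is
# skipped when the input index i - j would be negative -- these are the
# "missing circles" near the j axis in Fig. 12.1.

def dependence_graph_nodes(N, i_max):
    return [(i, j) for i in range(i_max) for j in range(N) if i - j >= 0]

nodes = dependence_graph_nodes(4, 8)   # the N = 4 case of Fig. 12.1
```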

The output variable y has a dependence matrix given by

(12.4) A = [1 0]

The null vector associated with this matrix is

(12.5) e = [0 1]^t

Therefore, the broadcast domains for this variable are vertical lines in 𝒟. This is shown in Fig. 12.1 for the case N = 4.

The input variables a and b have a dependence matrix given by

(12.6) A = [0 1]

The null vector associated with this matrix is

(12.7) e = [1 0]^t

Therefore, the broadcast domains for these two variables are the horizontal lines in 𝒟. The input variable x has a dependence matrix given by

(12.8) A = [1 −1]

The null vector associated with this matrix is

(12.9) e = [1 1]^t

Therefore, the broadcast domains for this variable are the diagonal lines in 𝒟.

The input variable y has a dependence matrix given by

(12.10) A = [1 −1]

The null vector associated with this matrix is

(12.11) e = [1 1]^t

Therefore, the broadcast domains for this variable are also the diagonal lines in 𝒟.
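The broadcast directions claimed above can be checked mechanically: a vector e spans a variable's broadcast domain exactly when the dependence matrix annihilates it, A e = 0. The matrices below follow the index expressions y(i), a(j)/b(j), x(i − j), and y(i − j); the variable names are labels for this sketch only.

```python
# For each variable: (dependence matrix A, claimed null vector e).
# A maps a domain point p = (i, j) to the variable's index; e spans
# the broadcast domain when A e = 0.

variables = {
    "y_out": ([1,  0], [0, 1]),   # y(i):   vertical broadcast lines
    "a_b":   ([0,  1], [1, 0]),   # a(j), b(j): horizontal lines
    "x":     ([1, -1], [1, 1]),   # x(i-j): diagonal lines
    "y_in":  ([1, -1], [1, 1]),   # y(i-j): diagonal lines
}

def is_null_vector(A, e):
    return A[0] * e[0] + A[1] * e[1] == 0

checks = {name: is_null_vector(A, e) for name, (A, e) in variables.items()}
```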

12.3.2 The Scheduling Function for the 1-D IIR Filter

We start by using an affine scheduling function given by

(12.12) t(p) = sp − s

where s = [s1 s2] is the scheduling vector and the scalar constant s = 0 since the point at the origin p(0, 0) ∈ 𝒟 is scheduled at time 0. Any point p = [i j]^t ∈ 𝒟 is associated with the time value

(12.13) t(p) = sp = s1 i + s2 j

Assigning time values to the nodes of the dependence graph transforms the dependence graph to a directed acyclic graph (DAG) as was discussed in Chapters 10 and 11. More specifically, the DAG can be thought of as a serial–parallel algorithm (SPA) where the parallel tasks could be implemented using a thread pool or parallel processors for software or hardware implementations, respectively. The different stages of the SPA are accomplished using barriers or clocks for software or hardware implementations, respectively.
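The SPA execution model described above can be sketched directly: nodes are grouped by their scheduled time value, stages run serially, and the nodes inside a stage run concurrently in a thread pool, with the end of each `map` acting as the barrier. The work function here is a stand-in for the node computation.

```python
# Execute a scheduled DAG as a serial-parallel algorithm (SPA): serial
# stages ordered by time value, parallel tasks within each stage.

from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def run_as_spa(nodes, time_of, work):
    stages = defaultdict(list)
    for p in nodes:                       # group nodes into equitemporal stages
        stages[time_of(p)].append(p)
    results = {}
    with ThreadPoolExecutor() as pool:
        for t in sorted(stages):          # stages execute one after another
            done = pool.map(work, stages[t])   # tasks in a stage run in parallel
            for p, r in zip(stages[t], done):
                results[p] = r            # collecting results acts as the barrier
    return results
```

A hardware implementation would replace the barrier with a clock edge; the staging logic is the same.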

In order to determine the components of s, we turn our attention to the filter inputs x. The input data are assumed to be supplied to our array at consecutive time steps. From the dependence graph, we see that samples x(i) and x(i + 1) could be supplied at points p1 = [i 0]t and p2 = [i + 1 0]t, respectively. The time steps associated with these two input samples are given from Eq. 10.17 by

(12.14) t(p1) = s1 i

(12.15) t(p2) = s1 (i + 1)

Assuming that the consecutive inputs arrive at each time step, we have t(p2) − t(p1) = 1, and we must have

(12.16) s1 = 1

So now we know one component of the scheduling vector based on input data timing requirements. Possible valid scheduling functions could be

(12.17) s1 = [1 −1], s2 = [1 0], s3 = [1 1]

All the above timing schedules are valid and have different implications on the timing of the output and partial results. The first scheduling vector results in broadcast input and pipelined output. The second scheduling vector results in broadcast of the output variable y and could produce Design 2, discussed in Chapter 9. The third scheduling vector results in pipelined input and output and could produce Design 3, discussed in Chapter 9. Let us investigate the scheduling vector s = [1 −1]. This choice implies that

(12.18) t(p) = [1 −1][i j]^t = i − j

which results in broadcast x and y_in samples. Based on our choice for this time function, we obtain the DAG shown in Fig. 12.2. The gray lines indicate equitemporal planes; all nodes lying on the same gray line execute their operations at the time indicated in gray beside the line. Notice from the figure that the input signals x and y_in are broadcast to all nodes in an equitemporal plane, while the output signals y_out are pipelined, as indicated by the arrows connecting the graph nodes. At this stage, we know the timing of the operations to be performed by each node, but we do not yet know which processing element each node is destined to. This is the subject of the next subsection.

Figure 12.2 DAG for the 1-D IIR filter for the case N = 4.

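The equitemporal planes of Fig. 12.2 follow directly from the schedule t(p) = i − j: nodes with equal i − j lie on the same gray line. A small sketch (the node window is an illustrative choice):

```python
# Group dependence-graph nodes into equitemporal planes under the
# schedule t(p) = i - j chosen above.

from collections import defaultdict

def equitemporal_planes(nodes):
    planes = defaultdict(list)
    for i, j in nodes:
        planes[i - j].append((i, j))   # all nodes here execute at time i - j
    return dict(planes)

nodes = [(i, j) for i in range(6) for j in range(4) if i - j >= 0]
planes = equitemporal_planes(nodes)
```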

12.3.3 Choice of Projection Direction and Projection Matrix

Chapters 10 and 11 explained that the projection operation assigns a node or a group of nodes in the DAG to a thread or processor. The number of assigned nodes determines the workload associated with each task. The operation also indicates the input and output data involved in the calculations. The projection operation controls the workload assigned to each thread/processor at each stage of the execution of the SPA.

From Chapter 11, a restriction on the projection direction d is that

(12.19) sd ≠ 0

Therefore, we have three possible choices for projection directions:

(12.20) d1 = [1 0]^t

(12.21) d2 = [0 1]^t

(12.22) d3 = [1 −1]^t

The projection matrices associated with each projection direction are given by

(12.23) P1 = [0 1]

(12.24) P2 = [1 0]

(12.25) P3 = [1 1]
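The two conditions governing these choices can be verified numerically: each direction d must satisfy s·d ≠ 0 (Eq. 12.19), and its projection matrix P must annihilate d (P d = 0) so that all nodes along d map to the same task. The third pair below is inferred from these constraints rather than read from the text, so treat it as an assumption.

```python
# Check s . d != 0 and P d = 0 for each candidate projection, using the
# schedule s = [1, -1] investigated above.

s = (1, -1)
designs = [
    ((1,  0), (0, 1)),   # d1, P1: collapse the i axis, keep j
    ((0,  1), (1, 0)),   # d2, P2: collapse the j axis, keep i
    ((1, -1), (1, 1)),   # d3, P3: assumed pair satisfying both constraints
]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

valid = [dot(s, d) != 0 and dot(P, d) == 0 for d, P in designs]
```

Note that d = [1 1]^t would be rejected here: it is a null vector of the schedule (s·d = 0), so nodes along it are scheduled simultaneously and cannot share one task.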

12.3.4 Design 1: Projection Direction d1 = [1 0]^t

Since the chosen projection direction is along the extremal ray direction, the number of tasks will be finite. A point p = [i j]^t ∈ 𝒟 maps to the point p̄ = P1p, which is given by

(12.26) p̄ = [0 1][i j]^t = j

The reduced or projected DAG is shown in Fig. 12.3. Task T(i) stores the two filter coefficients a(i) and b(i). The filter output is obtained from the top tasks and is fed back to the first task at the next time step. Notice that both inputs x and y are pipelined between the tasks.

Figure 12.3 Reduced DAG for the 1-D IIR filter for the case N = 4 and d1 = [1 0]^t. (a) The reduced DAG for the tasks. (b) Task processing details and array implementation.

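The key property of Design 1 is that the unbounded i extent collapses under the projection, leaving exactly N tasks. A minimal sketch (the node window is illustrative):

```python
# Design 1: P1 = [0 1] maps p = (i, j) to the task index j, so only
# N tasks T(0)..T(N-1) remain despite the infinite i range.

def design1_task(p):
    i, j = p
    return j            # task index = j component of the node

N = 4
nodes = [(i, j) for i in range(10) for j in range(N) if i - j >= 0]
tasks = {design1_task(p) for p in nodes}
```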

12.3.5 Design 2: Projection Direction d2 = [0 1]^t

A point p = [i j]^t ∈ 𝒟 maps to the point p̄ = P2p = i. The pipeline is shown in Fig. 12.4. Each task stores all the filter coefficients and is responsible for producing an output sample, which reduces intertask communication requirements. Task T(i) accepts input samples x(i − N + 1) to x(i) from the memory or input data bus at time steps i − N + 1 to i. Task T(i) also reads the memory or the output data bus for samples y_in(i − N + 1) to y_in(i − 1) at time steps i − N + 1 to i − 1. Task T(i) produces the output y_out(i) at time step i and stores that signal in the shared memory or places the signal on the output data bus for the case of hardware implementation. The number of tasks is infinite since our projection direction does not coincide with the ray direction. However, each task is active for only N time steps. The task activities at different time steps are shown in Fig. 12.5. The timing diagram thus indicates that we could merge the tasks. Therefore, we relabel the task indices so that T(i) maps to T(j), where

(12.27) j = i mod N
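The merge works because each task is active for only N consecutive time steps: tasks i and i + N never overlap, so N physical tasks suffice. A minimal sketch of the relabeling:

```python
# Design 2 task merging: relabel T(i) -> T(i mod N).  Tasks i and i + N
# are active over disjoint time windows [i - N + 1, i] and
# [i + 1, i + N], so reusing the label causes no collision.

def merged_task(i, N):
    return i % N

def active_window(i, N):
    return range(i - N + 1, i + 1)   # time steps during which T(i) works

N = 4
labels = [merged_task(i, N) for i in range(12)]
```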

Figure 12.4 Reduced DAG for the 1-D IIR filter for the case N = 4 and d2 = [0 1]^t.


Figure 12.5 Task activities for the 1-D IIR filter for the case N = 4 and d2 = [0 1]^t.


The reduced DAG is shown in Fig. 12.6.

Figure 12.6 The reduced DAG for the 1-D IIR filter for N = 4 and d2 = [0 1]^t. (a) The reduced DAG. (b) Task processing details.
