10.7 NONLINEAR PROJECTION OPERATION
The linear projection operation, in combination with the scheduling function, determines the workload assigned to each thread or PE at any given time step. The linear projection operation is simple but inflexible: we have no control over how many calculations each thread or PE performs at a given time step.
We modify the linear projection operation as follows:
(10.48)
where m is the desired number of points that will be allocated to one thread or PE. The floor(.) function returns the largest integer that does not exceed the result of the division. We can therefore control the workload allocated to each thread or PE per time step as
(10.49)
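The effect of the floor operation on the workload can be illustrated with a minimal Python sketch. It assumes that the nonlinear projection amounts to dividing a linearly projected index by m and taking the floor; the function names and the 20-point domain are hypothetical and chosen only for illustration.

```python
from collections import Counter


def nonlinear_projection(projected_index: int, m: int) -> int:
    """Return the thread (or PE) ID for a linearly projected index.

    Assumption: the nonlinear projection is floor(projected_index / m),
    so that m consecutive projected points share one thread.
    """
    if m <= 0:
        raise ValueError("m must be a positive integer")
    return projected_index // m  # floor division


def workload_per_thread(projected_indices, m: int) -> Counter:
    """Count how many points land on each thread."""
    return Counter(nonlinear_projection(i, m) for i in projected_indices)


# Hypothetical domain of 20 projected points with m = 8.
loads = workload_per_thread(range(20), m=8)
print(dict(loads))
```

Under these assumptions no thread receives more than m points, and the number of threads shrinks by roughly a factor of m compared with one thread per projected point.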
For a concrete example, assume that our scheduling vector and projection direction are given by
(10.50)
(10.51)
We also assume that N = 1,024, n = 2, and m = 8. The global workload per time step, to be done by all threads, is then nN = 2,048, and the output samples are allocated to threads according to Table 10.2.
Thread ID | Output samples produced by each thread | Input data required by each thread |
0 | y(0) … y(7) | x(0) … x(8 − N) |
1 | y(8) … y(15) | x(8) … x(16 − N) |
2 | y(16) … y(23) | x(16) … x(24 − N) |
3 | y(24) … y(31) | x(24) … x(32 − N) |
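The output-sample column of this allocation can be reproduced with a short Python sketch. It assumes that, for this example, the projected index is simply the output sample index i, so that sample y(i) goes to thread floor(i / m); the helper name output_block is hypothetical. The input-data column is not modeled here.

```python
def output_block(thread_id: int, m: int) -> tuple[int, int]:
    """Return the first and last output sample index handled by a thread.

    Assumption: thread t produces y(t*m) ... y(t*m + m - 1), which is the
    inverse of assigning sample i to thread floor(i / m).
    """
    first = thread_id * m
    return first, first + m - 1


m = 8
blocks = [output_block(t, m) for t in range(4)]
for t, (first, last) in enumerate(blocks):
    print(f"thread {t}: y({first}) ... y({last})")
```

With m = 8, consecutive thread IDs receive consecutive blocks of eight output samples.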
10.7.1 Using Concurrency Platforms
At this stage, the programmer can determine the execution order of the threads and the timing of the algorithm variables by inspecting the DAG. With this knowledge, the programmer can determine where locks and barriers are required in the program. By counting the number of nodes that belong to each equitemporal zone, the programmer can determine the number of threads that must be created. The speedup of the algorithm and other performance parameters can also be determined. The following section illustrates how this information can be obtained automatically, without inspecting the DAG of the algorithm.
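Counting the nodes in each equitemporal zone can be sketched in a few lines of Python. The sketch assumes a scheduling function of the form t(p) = s · p with zero offset and unit execution time per node; the 4 × 3 grid of nodes and the scheduling vector s = (1, 1) are hypothetical values chosen for illustration, not taken from the text.

```python
from collections import Counter
from itertools import product


def zone_sizes(nodes, s) -> Counter:
    """Count the nodes in each equitemporal zone, keyed by time step.

    Assumption: a node p executes at time t(p) = s . p (dot product).
    """
    return Counter(sum(si * pi for si, pi in zip(s, p)) for p in nodes)


# Hypothetical 4 x 3 grid of nodes and scheduling vector.
nodes = list(product(range(4), range(3)))
s = (1, 1)

zones = zone_sizes(nodes, s)
threads_needed = max(zones.values())   # size of the largest zone
time_steps = len(zones)                # number of nonempty time steps
speedup = len(nodes) / time_steps      # assumes unit time per node

print(dict(sorted(zones.items())), threads_needed, speedup)
```

The largest zone gives the number of threads to create, and the ratio of total nodes to time steps gives a rough speedup estimate under the unit-time assumption.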