20.6 SUCCESSIVE OVER RELAXATION (SOR) (ITERATIVE TECHNIQUE)
Iterative techniques are suited for large matrices. A simple iterative technique for solving linear equations is the Jacobi iteration, which is suited for matrices that have nonzero diagonal elements. Assume we are given the system of linear equations
$\mathbf{A}\mathbf{x} = \mathbf{b}$    (20.47)
We can express the $i$th row of the above system explicitly as
$\sum_{j=1}^{i-1} a_{ij} x_j + a_{ii} x_i + \sum_{j=i+1}^{N} a_{ij} x_j = b_i$    (20.48)
where we have isolated the term involving $x_i$. We can “solve” for $x_i$ from the above equation as
$x_i = \frac{1}{a_{ii}} \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j - \sum_{j=i+1}^{N} a_{ij} x_j \right)$    (20.49)
Of course, we need to iterate several times before we converge to the correct solution. At iteration $k$, we can estimate $x_i^{(k)}$ as
$x_i^{(k)} = \frac{1}{a_{ii}} \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k-1)} - \sum_{j=i+1}^{N} a_{ij} x_j^{(k-1)} \right)$    (20.50)
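To make the indexing concrete, here is a minimal NumPy sketch of one Jacobi sweep per Eq. 20.50; the function name and calling convention are our own, not from the text:

import numpy as np

def jacobi_step(A, b, x_old):
    # One Jacobi sweep: every x_i is computed from the previous iterate only.
    N = len(b)
    x_new = np.empty(N)
    for i in range(N):
        s = A[i, :] @ x_old - A[i, i] * x_old[i]  # sum of a_ij * x_j, j != i
        x_new[i] = (b[i] - s) / A[i, i]
    return x_new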
The Gauss–Seidel iteration differs from the Jacobi iteration in that it uses the most recently found values of $x_j$ in the iterations:
$x_i^{(k)} = \frac{1}{a_{ii}} \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{N} a_{ij} x_j^{(k-1)} \right)$    (20.51)
The order of evaluation of the algorithm is to find $x_1^{(k)}, x_2^{(k)}, \ldots, x_N^{(k)}$. Note that in the Gauss–Seidel iteration, we use the most recent information $x_j^{(k)}$, $j < i$, for the first sum on the right-hand side (RHS).
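A matching sketch of one Gauss–Seidel sweep per Eq. 20.51; updating x in place is what supplies the most recent values for the first sum (again, the function name is our own):

import numpy as np

def gauss_seidel_step(A, b, x):
    # One Gauss-Seidel sweep: x is overwritten in place, so the first sum
    # picks up the already-updated values x_j^(k) for j < i.
    N = len(b)
    for i in range(N):
        s1 = A[i, :i] @ x[:i]        # most recent values (j < i)
        s2 = A[i, i+1:] @ x[i+1:]    # previous-iteration values (j > i)
        x[i] = (b[i] - s1 - s2) / A[i, i]
    return x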
Jacobi and Gauss–Seidel iterations might converge very slowly, and SOR is meant to speed up the convergence. SOR iterations are described by the equation
$x_i^{(k)} = (1 - \omega)\, x_i^{(k-1)} + \frac{\omega}{a_{ii}} \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{N} a_{ij} x_j^{(k-1)} \right)$    (20.52)
where $\omega$ is a relaxation parameter chosen to speed up convergence; $\omega = 1$ recovers the Gauss–Seidel iteration, $1 < \omega < 2$ gives over-relaxation, and $0 < \omega < 1$ gives under-relaxation. The order of evaluation of the algorithm is to find $x_1^{(k)}, x_2^{(k)}, \ldots, x_N^{(k)}$.
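A minimal sketch of the full SOR iteration per Eq. 20.52 (convergence testing omitted; the function name and argument layout are our own):

import numpy as np

def sor(A, b, omega, k_max, x0=None):
    # SOR per Eq. 20.52: blend the old value with the Gauss-Seidel update.
    # omega = 1 reduces to Gauss-Seidel.
    N = len(b)
    x = np.zeros(N) if x0 is None else np.array(x0, dtype=float)
    for _ in range(k_max):
        for i in range(N):
            s1 = A[i, :i] @ x[:i]        # x_j^(k), j < i (most recent)
            s2 = A[i, i+1:] @ x[i+1:]    # x_j^(k-1), j > i
            x[i] = (1 - omega) * x[i] + omega * (b[i] - s1 - s2) / A[i, i]
    return x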
20.6.1 SOR Algorithm
Algorithm 20.2 shows the SOR in algorithmic form.
Algorithm 20.2
SOR algorithm
Require: Input: N × N system matrix A, vector b, initial estimate x(:, 1), relaxation parameter ω, and iteration count k_max
1: for k = 1 : k_max do
2: for i = 1 : N do
3: sum_1(i, 1, k) = 0; sum_2(i, N, k) = 0;
4: for j = 1 : i − 1 do
5: sum_1(i, j + 1, k) = sum_1(i, j, k) + a(i, j)x(j, k + 1);
6: end for
7: for j = N : −1 : i + 1 do
8: sum_2(i, j − 1, k) = sum_2(i, j, k) + a(i, j)x(j, k);
9: end for
10: x(i, k + 1) = (1 − ω)x(i, k) + ω(b(i) − sum_1(i, i, k) − sum_2(i, i, k))/a(i, i);
11: end for
12: end for
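To connect the pseudocode with the dependence graph discussed next, the following is a direct Python transcription of Algorithm 20.2 that keeps the sum_1/sum_2 accumulation chains and the iteration index k explicit (0-based indices; the function name is our own):

import numpy as np

def sor_algorithm(A, b, omega, k_max):
    # Direct transcription of Algorithm 20.2; x[:, k] holds the estimate
    # entering sweep k, and each product a(i,j) * x(j) is one DAG node.
    N = len(b)
    x = np.zeros((N, k_max + 1))
    for k in range(k_max):
        for i in range(N):
            sum_1 = 0.0                         # step 3 initialization
            sum_2 = 0.0
            for j in range(i):                  # steps 4-6, ascending chain
                sum_1 += A[i, j] * x[j, k + 1]  # most recent values, j < i
            for j in range(N - 1, i, -1):       # steps 7-9, descending chain
                sum_2 += A[i, j] * x[j, k]      # previous values, j > i
            # Step 10: relaxed update of x_i.
            x[i, k + 1] = ((1 - omega) * x[i, k]
                           + omega * (b[i] - sum_1 - sum_2) / A[i, i])
    return x[:, k_max]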
The SOR algorithm is 3-D, with indices 1 ≤ i ≤ N, 1 ≤ j ≤ N, and 1 ≤ k ≤ k_max. The convex hull describing the computation domain is a rectangular prism. We show in Fig. 20.8 only one layer for a given value of k.
Figure 20.8 shows the details of the dependence graph at a certain iteration when the system matrix has dimensions 5 × 5. This way we can think of the dependence graph in terms of multiple 2-D dependence graphs, which are easier to visualize. We see from the algorithm that matrix element $a_{ij}$ is represented by a vertical line along the $k$-axis that intersects the computation domain at points $(i, j, 1)$ and $(i, j, k_{max})$. Element $b_i$ is represented by a plane whose normal is along the $i$-axis. The horizontal lines represent information flow for sum_1, sum_2, and the input variable $x_j$. The output variable $x_i$ is shown by the vertical lines.
20.6.2 SOR Algorithm Scheduling Function
A 3-D scheduling function has the form
$t(\mathbf{p}) = \mathbf{s}\,\mathbf{p} - s_0 = s_1 i + s_2 j + s_3 k - s_0$    (20.53)

where $\mathbf{s} = [s_1\ s_2\ s_3]$ is the scheduling vector and $\mathbf{p} = [i\ j\ k]^t$ is a point in the computation domain.
There are several restrictions on the values of the scheduling vector based on the nature of the SOR algorithm. Node $(i, j, k)$ is used to calculate the product $a_{ij} x_j^{(k)}$. The evaluation of variable $x_i^{(k+1)}$ is performed at location $(i, i, k+1)$ because it requires all the product terms at the following locations: $(i, j, k+1)$ for $1 \le j < i$ and $(i, j, k)$ for $i < j \le N$. These products, in turn, require the values $x_j^{(k+1)}$ and $x_j^{(k)}$ evaluated at locations $(j, j, k+1)$ and $(j, j, k)$, respectively. This translates to the following restrictions on the scheduling vector components:
$t(i, i, k+1) > t(j, j, k+1), \qquad 1 \le j < i$    (20.54)
$t(i, i, k+1) > t(j, j, k), \qquad i < j \le N$    (20.55)
The above two inequalities produce
$s_1 + s_2 > 0$    (20.56)
$s_3 > (s_1 + s_2)\,n, \qquad n = j - i > 0$    (20.57)
The largest value for $n$ is $N - 1$, which occurs when $j = N$ and $i = 1$. Therefore, we can write the inequality as
$s_3 > (s_1 + s_2)(N - 1)$    (20.58)

A simple choice satisfying Eqs. 20.56 and 20.58 is

$s_1 + s_2 = 1, \qquad s_3 = N$    (20.59)
Let us choose $s_1 = 1$ and $s_2 = 0$ to have a scheduling function of the form
$t(\mathbf{p}) = i + Nk - N - 1$    (20.60)
The $-N - 1$ term on the RHS is meant to ensure that time starts with value 0 when $i = k = 1$ initially. Figure 20.9 shows the DAG for the SOR algorithm at the first two iterations when the system matrix has dimensions 5 × 5. The time needed to complete $k_{max}$ iterations would be equal to $N k_{max}$, that is, $N^2$ when $k_{max} = N$.
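The schedule can be checked mechanically. The short script below, assuming the scheduling function reconstructed in Eq. 20.60, groups the DAG nodes of a 5 × 5 system by their time values and shows that exactly $N$ nodes execute at each time step:

# Enumerate the DAG nodes of a 5 x 5 system for two iterations and group
# them by the schedule t(p) = i + N*k - N - 1 (1-based i, j, k as in the text).
N, k_max = 5, 2
schedule = {}
for k in range(1, k_max + 1):
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            t = i + N * k - N - 1
            schedule.setdefault(t, []).append((i, j, k))

for t in sorted(schedule):
    print(f"t = {t:2d}: {schedule[t]}")   # N = 5 nodes per time step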
20.6.3 SOR Algorithm Projection Direction
The work done at each time step is $N$ and the work done at each iteration is $N^2$. Therefore, it makes sense to obtain a reduced DAG ($\overline{\mathrm{DAG}}$) that is one-dimensional and contains only $N$ nodes. We can accomplish this using the two projection directions
$\mathbf{d}_1 = [1\ 0\ 0]^t$    (20.61)
$\mathbf{d}_2 = [0\ 0\ 1]^t$    (20.62)
The corresponding projection matrix can be obtained using the techniques of Chapter 10 or 11 as
$\mathbf{P} = [0\ 1\ 0]$    (20.63)
A point $(i, j, k)$ in the DAG will map to point $j$ in $\overline{\mathrm{DAG}}$. The resulting reduced DAG ($\overline{\mathrm{DAG}}$) is shown in Fig. 20.10.
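A tiny sketch of this mapping, assuming the projection matrix reconstructed in Eq. 20.63: every node $(i, j, k)$ collapses to the processing element indexed by $j$, so nodes that differ only in $i$ and $k$ share a processing element.

import numpy as np

# P maps a DAG node p = (i, j, k) to its processing element index j,
# collapsing the projection directions d1 = [1 0 0]^t and d2 = [0 0 1]^t.
P = np.array([[0, 1, 0]])

for p in [(1, 3, 1), (4, 3, 2), (2, 5, 1)]:
    pe = (P @ np.array(p))[0]
    print(f"node {p} -> PE {pe}")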
20.7 PROBLEMS
20.1. Study the parallelization of the back substitution algorithm.
20.2. Study the parallelization of the Gaussian elimination algorithm.
20.3. Explain how the LDM$^t$ algorithm in Table 20.1 can be used to solve for the unknown vector $\mathbf{x}$. Study the parallelization of this algorithm and relate it to the forward and back substitution algorithms.
20.4. Study the parallelization of the banded matrix–vector multiplication algorithm.
20.5. Study the parallelization of the banded matrix–matrix multiplication algorithm.
20.6. Study the parallelization of the Gauss–Seidel algorithm.