20.6 SUCCESSIVE OVER RELAXATION (SOR) (ITERATIVE TECHNIQUE)

Iterative techniques are well suited to large systems of equations. A simple iterative technique for solving linear equations is the Jacobi iteration, which requires the system matrix to have nonzero diagonal elements. Assume we are given the system of linear equations

(20.47) $Ax = b$

We can express the ith row of the above system explicitly as

(20.48) $\sum_{j=1}^{i-1} a_{ij} x_j + a_{ii} x_i + \sum_{j=i+1}^{N} a_{ij} x_j = b_i$

where we have isolated the term involving x_i. We can “solve” for x_i from the above equation as

(20.49) $x_i = \dfrac{1}{a_{ii}} \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j - \sum_{j=i+1}^{N} a_{ij} x_j \right)$

Of course, we need to iterate several times before we converge to the correct solution. At iteration k, we can estimate x_i^(k) as

(20.50) $x_i^{(k)} = \dfrac{1}{a_{ii}} \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k-1)} - \sum_{j=i+1}^{N} a_{ij} x_j^{(k-1)} \right)$

Gauss–Seidel iteration differs from the Jacobi iteration in that it uses the most recently found values x_j^(k), j < i, in the iterations:

(20.51) $x_i^{(k)} = \dfrac{1}{a_{ii}} \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{N} a_{ij} x_j^{(k-1)} \right)$

The order of evaluation of the algorithm is to find x_1^(k), x_2^(k), …, x_N^(k). Note that in the Gauss–Seidel iteration, we use the most recent information for the first sum on the right-hand side (RHS).
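
To make the difference concrete, the sketch below (Python with NumPy; the function names jacobi_sweep and gauss_seidel_sweep are illustrative, not from the text) performs one sweep of Eq. (20.50) and one sweep of Eq. (20.51). The only change between the two is whether the first sum reads values from the previous iteration or the values already updated during the current sweep.

import numpy as np

def jacobi_sweep(A, b, x_old):
    # One Jacobi iteration, Eq. (20.50): every x_i is computed from the
    # previous estimate x_old only.
    N = len(b)
    x_new = np.empty(N)
    for i in range(N):
        s = A[i, :i] @ x_old[:i] + A[i, i + 1:] @ x_old[i + 1:]
        x_new[i] = (b[i] - s) / A[i, i]
    return x_new

def gauss_seidel_sweep(A, b, x):
    # One Gauss-Seidel iteration, Eq. (20.51): for j < i the sum uses the
    # values already updated in this sweep; for j > i it uses the old values.
    x = x.copy()
    N = len(b)
    for i in range(N):
        s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
        x[i] = (b[i] - s) / A[i, i]
    return x

Repeating either sweep until the residual b − Ax is small yields the solution, provided the iteration converges (e.g., for a diagonally dominant A).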

Jacobi and Gauss–Seidel iterations might converge very slowly, and SOR is meant to speed up the convergence. SOR iterations are described by the equation:

(20.52) $x_i^{(k)} = (1 - \omega)\, x_i^{(k-1)} + \dfrac{\omega}{a_{ii}} \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{N} a_{ij} x_j^{(k-1)} \right)$

where ω is a relaxation parameter chosen to speed up convergence; convergence requires 0 < ω < 2, and over-relaxation corresponds to 1 < ω < 2. The order of evaluation of the algorithm is to find x_1^(k), x_2^(k), …, x_N^(k).
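
As a minimal sketch of Eq. (20.52) (Python with NumPy; sor_sweep, sor_solve, and the residual-based stopping test are illustrative additions, not from the text), one SOR sweep blends the Gauss–Seidel update with the previous value of x_i through ω; setting ω = 1 recovers Gauss–Seidel.

import numpy as np

def sor_sweep(A, b, x, omega):
    # One SOR iteration, Eq. (20.52).
    x = x.copy()
    N = len(b)
    for i in range(N):
        s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
        x[i] = (1.0 - omega) * x[i] + omega * (b[i] - s) / A[i, i]
    return x

def sor_solve(A, b, omega=1.5, k_max=200, tol=1e-10):
    # Repeat the sweep for at most k_max iterations, stopping early when the
    # residual norm ||b - A x|| falls below tol.  Assumes A is such that SOR
    # converges (e.g., symmetric positive definite or diagonally dominant).
    x = np.zeros(len(b))
    for _ in range(k_max):
        x = sor_sweep(A, b, x, omega)
        if np.linalg.norm(b - A @ x) < tol:
            break
    return x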

20.6.1 SOR Algorithm

Algorithm 20.2 shows the SOR technique in algorithmic form.

Algorithm 20.2

SOR algorithm

Require: N × N system matrix A, N × 1 vector b, relaxation parameter ω, and iteration count k_max

1: for k = 1 : k_max do

2: for i = 1 : N do

3: sum_1(i, 1, k) = 0; sum_2(i, N, k) = 0;

4: for j = 1 : i − 1 do

5: sum_1(i, j + 1, k) = sum_1(i, j, k) + a(i, j)x(j, k);

6: end for

7: for j = N : −1 : i + 1 do

8: sum_2(i, j − 1, k) = sum_2(i, j, k) + a(i, j)x(j, k);

9: end for

10: x(i, k + 1) = (1 − ω) x(i, k) + ω [b(i) − sum_1(i, i, k) − sum_2(i, i, k)] / a(i, i);

11: end for

12: end for
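
For reference, the sketch below is a direct Python transcription of Algorithm 20.2 (the function name algorithm_20_2 and the explicit initial guess x0 are illustrative additions). It keeps the pseudocode's explicit (i, j, k) indexing, which is the index space visualized by the dependence graph discussed next, with indices shifted to 0-based arrays.

import numpy as np

def algorithm_20_2(A, b, omega, k_max, x0):
    # x[:, k] holds the estimate used at iteration k; x[:, 0] is the initial guess.
    N = len(b)
    x = np.zeros((N, k_max + 1))
    x[:, 0] = x0
    for k in range(k_max):                 # outer loop over iterations (line 1)
        for i in range(N):                 # loop over rows (line 2)
            sum_1 = 0.0                    # line 3: sum_1(i, 1, k) = 0
            sum_2 = 0.0                    #         sum_2(i, N, k) = 0
            for j in range(i):             # lines 4-6: j = 1 .. i - 1
                sum_1 += A[i, j] * x[j, k]
            for j in range(N - 1, i, -1):  # lines 7-9: j = N .. i + 1
                sum_2 += A[i, j] * x[j, k]
            # line 10: relaxation update, cf. Eq. (20.52)
            x[i, k + 1] = ((1.0 - omega) * x[i, k]
                           + omega * (b[i] - sum_1 - sum_2) / A[i, i])
    return x[:, k_max]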

The SOR algorithm is 3-D, with indices 1 ≤ i ≤ N, 1 ≤ j ≤ N, and 1 ≤ k ≤ k_max. The convex hull describing the computation domain is a rectangular prism. We show in Fig. 20.8 only one layer for a given value of k.

Figure 20.8 Dependence graph for the SOR algorithm for a given iteration when the system matrix has dimensions 5 × 5.


Figure 20.8 shows the details of the dependence graph at a certain iteration when the system matrix has dimensions 5 × 5. This way we can think of the dependence graph in terms of multiple 2-D dependence graphs, which are easier to visualize. We see from the algorithm that matrix element a_ij is represented by a vertical line along the k-axis that intersects the computation domain at points (i, j, 1) and (i, j, k_max). Element b_i is represented by a plane whose normal is along the i-axis. The horizontal lines represent information flow for sum_1, sum_2, and the input variable x_j. The output variable x_i is shown by the vertical lines.

20.6.2 SOR Algorithm Scheduling Function

A 3-D scheduling function has the form

(20.53) c20e053

There are several restrictions on the values of the scheduling vector based on the nature of the SOR algorithm. Node (i, j, k) is used to calculate the product a_ij x_j. The evaluation of the variable x_i is performed at location (i, i, k + 1) because it requires all the product terms at the following locations:

  • c20ue020
  • c20ue021

This translates to the following restrictions on the scheduling vector components

(20.54) c20e054

(20.55) c20e055

The above two inequalities produce

(20.56) c20e056

(20.57) c20e057

The largest value for n is n = N. Therefore, we can write the inequalities as

(20.58) c20e058

(20.59) c20e059

Let us choose a scheduling function of the form

(20.60) c20e060

The −N − 1 term on the RHS is meant to ensure that time starts with value 0 when i = k = 1 initially. Figure 20.9 shows the DAG for the SOR algorithm at the first two iterations when the system matrix has dimensions 5 × 5. The time needed to complete the iterations would be equal to N².
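
As a concrete illustration, assume a schedule of the form t(i, j, k) = i + Nk − N − 1, which is one affine function consistent with the description above: it contains the −N − 1 constant, gives t = 0 at i = k = 1, and assigns all N nodes of a row (i, k) to the same time step. This specific form is an assumption made here for illustration, not necessarily the choice of Eq. (20.60). The sketch below tabulates the time step of every node in the computation domain under this assumed schedule and confirms that exactly N nodes execute per time step.

def schedule_time(i, j, k, N):
    # Assumed affine schedule t(p) = [1 0 N] p - N - 1 (illustrative form only).
    return i + N * k - N - 1

def tabulate_schedule(N, k_max):
    # Group all nodes (i, j, k) of the computation domain by their time step.
    times = {}
    for k in range(1, k_max + 1):
        for i in range(1, N + 1):
            for j in range(1, N + 1):
                times.setdefault(schedule_time(i, j, k, N), []).append((i, j, k))
    return times

times = tabulate_schedule(N=5, k_max=2)
print(min(times))                                     # 0: time starts at zero
print({t: len(nodes) for t, nodes in times.items()})  # N = 5 nodes per time step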

Figure 20.9 DAG for the SOR algorithm at the first two iterations when the system matrix has dimensions 5 × 5.


20.6.3 SOR Algorithm Projection Direction

The work done at each time step is N and the work done at each iteration is N². Therefore, it makes sense to obtain a reduced DAG that is one-dimensional and contains only N nodes. We can accomplish this using two projection directions

(20.61) $d_1 = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}^t$

(20.62) $d_2 = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^t$

The corresponding projection matrix can be obtained using the techniques of Chapters 10 or 11 as

(20.63) $P = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}$

A point (i, j, k) in the DAG will map to point j in the reduced DAG. The resulting reduced DAG is shown in Fig. 20.10.
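
As a small illustration, multiplying a DAG point p = [i j k]^t by the projection matrix P of Eq. (20.63) extracts its j coordinate, so all nodes that share the same j are allocated to the same node of the reduced DAG. The sample point below is arbitrary.

import numpy as np

P = np.array([[0, 1, 0]])      # projection matrix of Eq. (20.63)
p = np.array([3, 2, 4])        # a sample DAG point (i, j, k) = (3, 2, 4)
print((P @ p).item())          # -> 2: the point maps to node j = 2 of the reduced DAG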

Figure 20.10 Reduced DAG for the SOR algorithm.


20.7 PROBLEMS

20.1. Study the parallelization of the back substitution algorithm.

20.2. Study the parallelization of the Gaussian elimination algorithm.

20.3. Explain how the LDM^t algorithm in Table 20.1 can be used to solve for the unknown vector x. Study the parallelization of this algorithm and relate it to the forward and back substitution algorithms.

20.4. Study the parallelization of the banded matrix–vector multiplication algorithm.

20.5. Study the parallelization of the banded matrix–matrix multiplication algorithm.

20.6. Study the parallelization of the Gauss–Seidel algorithm.
