Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Parallel Deferred Correction method for CFD Problems

D. Guibert^a; D. Tromeur-Dervout^a^,^b^,* ^a ICJ UMR5208 Université Lyon 1 - CNRS, Centre de Développement du Calcul Scientifique Parallèle Université Lyon 1
^b IRISA/SAGE INRIA Rennes

1 Motivation of Time Decomposition in the Parallel CFD context

Time domain decomposition methods can bo a complementary approach to improve parallelism on the forthcoming parallel architectures with thousand of processors. Indeed. for a given modelled problem. e.g. through a partial differential equation with a prescribed discretization- the multiplication of the processors leads to a decrease of the computing load allocated to each processors, then the ratio of the computing time with the communication time decreases, leading to less efficiency. Nevertheless, for the evolution problems, the difficulty comes from the sequential nature of the computing. Previous time steps are required to compute the current time step. This last constraint occurs in all time integrators used in computational fluid dynamics of unsteady problems.

During these last years, some attempts have been proposed to develop parallelism in time solvers. Such algorithms as Parareal [6] or Pita [2] algorithms are multiple shooting type methods and are very sensitive to the Jabobian linearization for the correction stage on stiff non-linear problems as shown in [4].

Moreover, this correction step is an algorithm sequential and often much more time consuming to solve for the time integrator than the initial problem.

In this paper we investigate a new solution to obtain a parallelism in time with a two level algorithm that combines pipelined iterations in time and deferred correction iterations to increase the accuracy of the solution. In order to focus on the benefit, of our algorithm we approach CFD problems as an ODE (eventually DAE) problem. From the numerical point of view the development of time domain decomposition methods for stiff ODE problems allows to focus on the time domain decomposition without perturbation coming from the space decomposition. In other side, classical advantages of the space decomposition on the granularity are not available. Indeed, from the engineering point of view in the design with ODE/DAE. the efficiency of the algorithm is more focus on the saving time in the day to day practice.

2 ODE approach for CFD problem

The traditional approach to solve CFD problems starts from the partial differential equation, which is discretized in space with a given accuracy and finally a time discretization with a first order scheme or a second order time scheme is performed to solve problems (usually implicitly due to Courant-Friedrich-Lax condition constraint).

Let consider the 2D Navier-Stokes equations in a Stream function - vorticity formulation:

${\begin{matrix} \begin{matrix} ω_{t} + ψ_{x} ω_{y} - ψ_{y} ω_{x} - \frac{1}{R e} Δ ω = 0 \\ Δ ψ = - ω \end{matrix} \\ ψ = 0, ψ_{n} = v (t) on \partial Ω \end{matrix}$ ${\begin{matrix} \begin{matrix} ω_{t} + ψ_{x} ω_{y} - ψ_{y} ω_{x} - \frac{1}{R e} Δ ω = 0 \\ Δ ψ = - ω \end{matrix} \\ ψ = 0, ψ_{n} = v (t) on \partial Ω \end{matrix}$

si1_e (1)

then applying a backward (forward Euler or Runge-Kutta are also possible) time discretisation. we obtain that follows:

${\begin{matrix} ω^{n + 1} + Δ_{t} (ψ_{x}^{n} ω_{y}^{n + 1} - ψ_{x}^{n} ω_{y}^{n + 1}) - \frac{Δ t}{R e} Δ ω^{n + 1} = ω^{n} \\ Δ ψ^{n + 1} = - Δ ω^{n + 1} \end{matrix}$ ${\begin{matrix} ω^{n + 1} + Δ_{t} (ψ_{x}^{n} ω_{y}^{n + 1} - ψ_{x}^{n} ω_{y}^{n + 1}) - \frac{Δ t}{R e} Δ ω^{n + 1} = ω^{n} \\ Δ ψ^{n + 1} = - Δ ω^{n + 1} \end{matrix}$

si2_e (2)

The discretization in space of the discrete in time equations leads to solve a linearized system with an efficient preconditionned solver such as Schwarz-Krylov method or multi-grid:

${A (ψ^{n}) ω^{n + 1} = f (ψ^{n}), Δ ψ^{n + 1} = - ω^{n + 1}$ ${A (ψ^{n}) ω^{n + 1} = f (ψ^{n}), Δ ψ^{n + 1} = - ω^{n + 1}$

(3)

Let us consider an another approach that is the method of lines or ODE to solve CFD problems. Firstly, the Navier-Stokes problem is written in a Cauchy equation form:

$\frac{d}{d t} (\begin{matrix} ω \\ ψ \end{matrix}) (t) = f (t, ω, ψ), (\begin{matrix} ω \\ ψ \end{matrix}) (0) = (\begin{matrix} ω_{0} \\ ψ_{0} \end{matrix})$ $\frac{d}{d t} (\begin{matrix} ω \\ ψ \end{matrix}) (t) = f (t, ω, ψ), (\begin{matrix} ω \\ ψ \end{matrix}) (0) = (\begin{matrix} ω_{0} \\ ψ_{0} \end{matrix})$

si4_e (4)

then the discretization in space of the f(t, ω, ψ) leads to a system of ODE:

$\frac{d}{d t} (\begin{matrix} ω_{i, j} \\ ψ_{i, j} \end{matrix}) = f (t, ω_{i [\pm 1], j [\pm 1]}, ψ_{i [\pm 1], j [\pm 1]})$ $\frac{d}{d t} (\begin{matrix} ω_{i, j} \\ ψ_{i, j} \end{matrix}) = f (t, ω_{i [\pm 1], j [\pm 1]}, ψ_{i [\pm 1], j [\pm 1]})$

si5_e (5)

This last equation can be solved by an available ODE solver such as SUNDIALS (SUite of Nonlinear and Differential/ALgebraic equation Solvers (http://www.llnl.gov/casc/sundials/)) that contains the latest developments of classical ODE solvers as LSODA [8] or CVODE [1]

Figure 1 corresponds to the lid driven cavity results computed by the ODE method a Reynold number of 5000 and up to 10000. These results are obtained for the impulse lid driven with a velocity of 1. Let us notice that the convergence is sensitive to the singularities of the solution.

f17-01-9780444530356 — Figure 1 Impulse lid driven cavity results for a Reynold number of 5000 (top) and 10000 (bottom) with the ODE approach

The advantages and drawbacks of the PDEs and ODE are summarized in Table 1

Table 1

Advantages and drawbacks of ODE and PDEs approaches for CFD problems

	PDEs	ODE
1	Fast to solvo one time step	slow to solve one time step (control of time step)
2	Decoupling of variables	coupling of variables (closest to incompressibility constraint)
3	computing complexity is reduced with the decoupling	high computing complexity () control of error, whole system
4	B.C. hard to handle with impact on the linear solver (data structure/choice)	B.C. conditions easy to implement
5	CFL condition even with implicit solve (decoupling of variables)	CFL condition replaced by the control on time step
6	Building of good preconditionner is difficult	Adaptive time step allows to pass stiffness
7	Adaptive time step hard to handle	Adaptive time step is a basic feature in ODE

In conclusion the ODE approach put the computing pressure on the time step control while PDEs approach put the computing pressure on the preconditionned linear/nonlinear solver.

The high time consumming drawback for ODE explains why this approach is not commonly use in CFD. Nevertheless grid computing architecture could give a renewed interest to this ODE approach.

3 Spectral Deferred Correction Method

The spectral deferred correction method (SDC) [7] is a method to improve the accuracy of a time integrator iteratively in order to approximate solution of a functional equation with adding corrections to the defect.

Let us summarize the SDC method for the Cauchy equation:

$y (t) = y_{0} + \int_{0}^{1} f (τ, y (τ)) d τ$ $y (t) = y_{0} + \int_{0}^{1} f (τ, y (τ)) d τ$

(6)

Let $y_{m}^{0} ≃ y (t_{m})$ $y_{m}^{0} ≃ y (t_{m})$ an approximation of the searched solution at time t_m. Then consider the polynomial q₀ that interpolate the function f(t,y) at P points ${\tilde{t}}_{i}$ ${\tilde{t}}_{i}$ i.e:

$\exists! q_{0} \in R_{p} [x], q_{0} ({\tilde{t}}_{i}) = f (y^{0} ({\tilde{t}}_{i}))$ $\exists! q_{0} \in R_{p} [x], q_{0} ({\tilde{t}}_{i}) = f (y^{0} ({\tilde{t}}_{i}))$

si9_e (7)

with ${{\tilde{t}}_{0} < {\tilde{t}}_{1} < \dots < {\tilde{t}}_{p}}$ ${{\tilde{t}}_{0} < {\tilde{t}}_{1} < \dots < {\tilde{t}}_{p}}$ are points around t_m. Then if we define the error between the approximate solution $y_{m}^{0}$ $y_{m}^{0}$ and the approximate solution using q₀:

$E (y^{0}, t_{m}) = y_{0} + \int_{0}^{t_{m}} q^{0} (τ) ⅆ τ - y_{m}^{0}$ $E (y^{0}, t_{m}) = y_{0} + \int_{0}^{t_{m}} q^{0} (τ) ⅆ τ - y_{m}^{0}$

(8)

Then we can write the error between the exact solution and the approximation $y_{m}^{0}$ $y_{m}^{0}$ :

$y (t_{m}) - y_{m}^{0} = \int_{0}^{t_{m}} (f (τ, y (τ)) + q^{0} (t)) ⅆ τ + E (y^{0}, t_{m})$ $y (t_{m}) - y_{m}^{0} = \int_{0}^{t_{m}} (f (τ, y (τ)) + q^{0} (t)) ⅆ τ + E (y^{0}, t_{m})$

(9)

The defect $δ (t_{m}) = y (t_{m}) - y_{m}^{0}$ $δ (t_{m}) = y (t_{m}) - y_{m}^{0}$ , satisfies:

$δ (t_{m - 1}) = δ (t_{m}) + \int_{t_{m}}^{t_{m + 1}} (f (τ, y^{0} (τ) + δ (τ)) + q^{0} (τ)) d τ + E (y^{0}, t_{m + 1}) - E (y^{0}, t_{m})$ $δ (t_{m - 1}) = δ (t_{m}) + \int_{t_{m}}^{t_{m + 1}} (f (τ, y^{0} (τ) + δ (τ)) + q^{0} (τ)) d τ + E (y^{0}, t_{m + 1}) - E (y^{0}, t_{m})$

(10)

For the spectral deferred correction method an approximation $δ_{m}^{0}$ $δ_{m}^{0}$ of δ(t_m) is computed considering for example a Forward Euler scheme:

$δ_{m + 1}^{0} = δ_{m}^{0} + Δ_{t} (f (t_{m + 1}, y_{m + 1}^{0} + δ_{m + 1}^{0}) - f (t_{m + 1}, y_{m + 1}^{0})) - y_{m + 1}^{0} + y_{m}^{0} + \int_{t_{m}}^{t_{m + 1}} q^{0} (τ) ⅆ τ$ $δ_{m + 1}^{0} = δ_{m}^{0} + Δ_{t} (f (t_{m + 1}, y_{m + 1}^{0} + δ_{m + 1}^{0}) - f (t_{m + 1}, y_{m + 1}^{0})) - y_{m + 1}^{0} + y_{m}^{0} + \int_{t_{m}}^{t_{m + 1}} q^{0} (τ) ⅆ τ$

Once the defect computed, the approximation $y_{m}^{0}$ $y_{m}^{0}$ is updated

$y_{m}^{1} = y_{m}^{0} + δ_{m}^{0} .$ $y_{m}^{1} = y_{m}^{0} + δ_{m}^{0} .$

(11)

Then a new q₀ is computed with using the value $y_{m}^{1}$ $y_{m}^{1}$ for a new iterate. This method is a sequential iterative process as follows: compute an approximation y⁰, then iterate in parallel until convergence compute δ and update the solution y by adding the correction term δ.

We propose in the next section a parallel implementation to combine deferred correction method with time domain decomposition.

4 Parallel SDC: a time domain decomposition pipelined approach

The main idea to introduce parallelism in the SDC is to consider the time domain decomposition and to affect a set of time slices to each processors. Then we can pipeline the computation, each processor dealing a different iterate of the SDC.

Algorithm 4.1

(Parallel Version). 1 spread the time slices between processors with an overlap P/2 slices and compute an approximation y⁰ with a very low accuracy because this is a sequential stage. Each processor keeps only the time slices that it has to take in charge, and can start the pipelined as soon it has finished.

2 Iterate until convergence

2.1 prepare the non blocking receiving of the. last P/2 delta, value of the time interval take in account by the processor.

2.2 If the. left neighbor processor exist wait until receiving the P/2 S value from it in order to update the $y_{m}^{i}$ $y_{m}^{i}$ in the overlap of size. P/2.

2.3 Compute q₀ from $y_{m}^{i}$ $y_{m}^{i}$ in order to compute $δ_{m}^{i}$ $δ_{m}^{i}$ When the 5 values between P/2 + 1 and P of time slices are computed send to left neighbor processor in order it will be able to start the next iterate (corresponding to the stage 2.1).

2.4 Send the P/2 δ values corresponding to the overlap position of the right neighbor-processor.

2.5 Update the solution $y_{m}^{i + 1} = y_{m}^{i} + δ_{m}^{i}$ $y_{m}^{i + 1} = y_{m}^{i} + δ_{m}^{i}$

Remark 4.1

The stage 1 initialization should not be parallelized in order to keep the time, integrator history and avoid jump between the. initial value inside the domain and the value in the overlap on the neighbor processor.

Remark 4.2

The communication stages [2.1] and [2.2] are needed to compute the q₀ polynomial that uses the behavior of the function f(t,y) before and after the point t_m.

Table 2 gives the times to run the timo domain decomposition pipelined SDC method on an ALTIX350 with IA64 1.5Ghz/6Mo processors and with numalink communication network. The lid driven parameters are a Reynolds number of 10 and a regular mesh of 20 × 20 points, the final time to be reach is T=5. The number of time slices has been set equal to 805. The number of SDC iteration is 25. The stage [1] correspond to the sequential first evaluation of the solution on the time slices (with an rtol of 10⁻¹). The time [2.2] is the time of the pipelined to reach the last processor, results show that the ratio of this time [2.2] on the computation is increasing up to 1/4 for 8 processors case. The Speed-up corrected line gives the parallel capability of the parallel implementation when the pipeline is full.

Table 2

Time, Speed-up and Efficiency of the Time domain decomposition pipelined SDC

#proc	1	2	4	8
[1] (s)	17.12	18.68	17.36	16.41
[2.2] (s)	1.35e-5	72.39	113	129
[2.3] f(t,y)eval (s)	1.66e-2	7.25e-3	3.13e-3	1.126e-3
[2.3] q₀ computing (s)	5.99e-3	5.5e-3	5.90e-3	5.94e-3
euler	0.288	0.25	0.19	0.16
Total (s)	2062	1113	742	423
Efficiency	100%	92.6%	69.4%	60.9%
Speed-up	1	1.85	2.77	4.87
Speed up Corrected	1	1.98	3.27	7.01

t0015

It exhibits quite reasonable speed-up up to 8 processors for the present implementation.

Remark 4.3

Some improvement of the presented implementation can be performed. firstly we implement the forward Euler to compute the $δ_{m}^{i + 1}$ $δ_{m}^{i + 1}$ that needs a Newton solving at each iteration with the Jacobian computed by a numerical differentiation. We also can use the sundials solver with blocking al first order the DDF. Secondly, all the value $f ({\tilde{t}}_{m}, y_{m}^{i})$ $f ({\tilde{t}}_{m}, y_{m}^{i})$ are computed before and could be progressively computed in order to enhance the chance of parallelism. Thirdly, a better load balancing could be acheived with a cyclic distribution of the time slices among the processors.

Figure 2 gives the convergence of the defect on the time interval for different iteration numbers (1.5.9,13,17). Symbols represent the runs on 1 processor (x). 2 processors (o). 8 processors (). Firstly, we remark that the number of processors does not impact the convergence properties of the method. Secondly, the convergence behavior increases for the last times due to the steady state of the solution. Nevertheless, a good accuracy is obtained on the first times in few SDC iterates.

f17-02-9780444530356 — Figure 2 Convergence on the time interval of the defect δ for different iteration numbers (1,5.9.13.17). Symbols represent the runs on 1 processor (x), 2 processors (o) , 8 processors ()

5 conclusions

In this work we have investigated the numerical solution of 2D Navier-stokes with ODE solvers in the perspective of the time domain decomposition which can be an issue for the grid computing.

we proposed to combine time domain decomposition and spectral deferred correction in order to have a two level algorithm that can be pipelined. The first results proved some parallel efficiency with a such approach.

Nevertheless, the arithmetical complexity of the ODE solver for the lid driven problem consider here, is too much costly. Some scheme such as the C(p, q, j) [3] schemes can decouple the system variables in order to reduce the complexity.

^* This work was funded by the thema “mathématiques” of the Région Rhône-Alpes thru the project: “Développement de méthodologies mathématiques pour le calcul scientifique sur grille”

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Parallel Deferred Correction method for CFD Problems

Create new playlist

Sign In

Sign Up

1 Motivation of Time Decomposition in the Parallel CFD context

2 ODE approach for CFD problem

3 Spectral Deferred Correction Method

4 Parallel SDC: a time domain decomposition pipelined approach

5 conclusions

Table of Contents for
Parallel Deferred Correction method for CFD Problems