Parallel Deferred Correction method for CFD Problems

D. Guiberta; D. Tromeur-Dervouta,b,*    a ICJ UMR5208 Université Lyon 1 - CNRS, Centre de Développement du Calcul Scientifique Parallèle Université Lyon 1
b IRISA/SAGE INRIA Rennes

1 Motivation of Time Decomposition in the Parallel CFD context

Time domain decomposition methods can bo a complementary approach to improve parallelism on the forthcoming parallel architectures with thousand of processors. Indeed. for a given modelled problem. e.g. through a partial differential equation with a prescribed discretization- the multiplication of the processors leads to a decrease of the computing load allocated to each processors, then the ratio of the computing time with the communication time decreases, leading to less efficiency. Nevertheless, for the evolution problems, the difficulty comes from the sequential nature of the computing. Previous time steps are required to compute the current time step. This last constraint occurs in all time integrators used in computational fluid dynamics of unsteady problems.

During these last years, some attempts have been proposed to develop parallelism in time solvers. Such algorithms as Parareal [6] or Pita [2] algorithms are multiple shooting type methods and are very sensitive to the Jabobian linearization for the correction stage on stiff non-linear problems as shown in [4].

Moreover, this correction step is an algorithm sequential and often much more time consuming to solve for the time integrator than the initial problem.

In this paper we investigate a new solution to obtain a parallelism in time with a two level algorithm that combines pipelined iterations in time and deferred correction iterations to increase the accuracy of the solution. In order to focus on the benefit, of our algorithm we approach CFD problems as an ODE (eventually DAE) problem. From the numerical point of view the development of time domain decomposition methods for stiff ODE problems allows to focus on the time domain decomposition without perturbation coming from the space decomposition. In other side, classical advantages of the space decomposition on the granularity are not available. Indeed, from the engineering point of view in the design with ODE/DAE. the efficiency of the algorithm is more focus on the saving time in the day to day practice.

2 ODE approach for CFD problem

The traditional approach to solve CFD problems starts from the partial differential equation, which is discretized in space with a given accuracy and finally a time discretization with a first order scheme or a second order time scheme is performed to solve problems (usually implicitly due to Courant-Friedrich-Lax condition constraint).

Let consider the 2D Navier-Stokes equations in a Stream function - vorticity formulation:

{ωt+ψxωyψyωx1ReΔω=0Δψ=ωψ=0,ψn=v(t)onΩ

si1_e  (1)

then applying a backward (forward Euler or Runge-Kutta are also possible) time discretisation. we obtain that follows:

{ωn+1+Δt(ψxnωyn+1ψxnωyn+1)ΔtReΔωn+1=ωnΔψn+1=Δωn+1

si2_e  (2)

The discretization in space of the discrete in time equations leads to solve a linearized system with an efficient preconditionned solver such as Schwarz-Krylov method or multi-grid:

{A(ψn)ωn+1=f(ψn),Δψn+1=ωn+1

si3_e  (3)

Let us consider an another approach that is the method of lines or ODE to solve CFD problems. Firstly, the Navier-Stokes problem is written in a Cauchy equation form:

ddt(ωψ)(t)=f(t,ω,ψ),(ωψ)(0)=(ω0ψ0)

si4_e  (4)

then the discretization in space of the f(t, ω, ψ) leads to a system of ODE:

ddt(ωi,jψi,j)=f(t,ωi[±1],j[±1],ψi[±1],j[±1])

si5_e  (5)

This last equation can be solved by an available ODE solver such as SUNDIALS (SUite of Nonlinear and Differential/ALgebraic equation Solvers (http://www.llnl.gov/casc/sundials/)) that contains the latest developments of classical ODE solvers as LSODA [8] or CVODE [1]

Figure 1 corresponds to the lid driven cavity results computed by the ODE method a Reynold number of 5000 and up to 10000. These results are obtained for the impulse lid driven with a velocity of 1. Let us notice that the convergence is sensitive to the singularities of the solution.

f17-01-9780444530356
Figure 1 Impulse lid driven cavity results for a Reynold number of 5000 (top) and 10000 (bottom) with the ODE approach

The advantages and drawbacks of the PDEs and ODE are summarized in Table 1

Table 1

Advantages and drawbacks of ODE and PDEs approaches for CFD problems

PDEsODE
1Fast to solvo one time stepslow to solve one time step (control of time step)
2Decoupling of variablescoupling of variables (closest to incompressibility constraint)
3computing complexity is reduced with the decouplinghigh computing complexity () control of error, whole system
4B.C. hard to handle with impact on the linear solver (data structure/choice)B.C. conditions easy to implement
5CFL condition even with implicit solve (decoupling of variables)CFL condition replaced by the control on time step
6Building of good preconditionner is difficultAdaptive time step allows to pass stiffness
7Adaptive time step hard to handleAdaptive time step is a basic feature in ODE

In conclusion the ODE approach put the computing pressure on the time step control while PDEs approach put the computing pressure on the preconditionned linear/nonlinear solver.

The high time consumming drawback for ODE explains why this approach is not commonly use in CFD. Nevertheless grid computing architecture could give a renewed interest to this ODE approach.

3 Spectral Deferred Correction Method

The spectral deferred correction method (SDC) [7] is a method to improve the accuracy of a time integrator iteratively in order to approximate solution of a functional equation with adding corrections to the defect.

Let us summarize the SDC method for the Cauchy equation:

y(t)=y0+01f(τ,y(τ))dτ

si6_e  (6)

Let ym0y(tm)si7_e an approximation of the searched solution at time tm. Then consider the polynomial q0 that interpolate the function f(t,y) at P points t˜isi8_e i.e:

!q0Rp[x],q0(t˜i)=f(y0(t˜i))

si9_e  (7)

with {t˜0<t˜1<<t˜p}si10_e are points around tm. Then if we define the error between the approximate solution ym0si11_e and the approximate solution using q0:

E(y0,tm)=y0+0tmq0(τ)τym0

si12_e  (8)

Then we can write the error between the exact solution and the approximation ym0si13_e:

y(tm)ym0=0tm(f(τ,y(τ))+q0(t))τ+E(y0,tm)

si14_e  (9)

The defect δ(tm)=y(tm)ym0si15_e, satisfies:

δ(tm1)=δ(tm)+tmtm+1(f(τ,y0(τ)+δ(τ))+q0(τ))dτ+E(y0,tm+1)E(y0,tm)

si16_e  (10)

For the spectral deferred correction method an approximation δm0si17_e of δ(tm) is computed considering for example a Forward Euler scheme:

δm+10=δm0+Δt(f(tm+1,ym+10+δm+10)f(tm+1,ym+10))ym+10+ym0+tmtm+1q0(τ)τ

si18_e

Once the defect computed, the approximation ym0si19_e is updated

ym1=ym0+δm0.

si20_e  (11)

Then a new q0 is computed with using the value ym1si21_e for a new iterate. This method is a sequential iterative process as follows: compute an approximation y0, then iterate in parallel until convergence compute δ and update the solution y by adding the correction term δ.

We propose in the next section a parallel implementation to combine deferred correction method with time domain decomposition.

4 Parallel SDC: a time domain decomposition pipelined approach

The main idea to introduce parallelism in the SDC is to consider the time domain decomposition and to affect a set of time slices to each processors. Then we can pipeline the computation, each processor dealing a different iterate of the SDC.

Algorithm 4.1

(Parallel Version). 1 spread the time slices between processors with an overlap P/2 slices and compute an approximation y0 with a very low accuracy because this is a sequential stage. Each processor keeps only the time slices that it has to take in charge, and can start the pipelined as soon it has finished.

2 Iterate until convergence

2.1 prepare the non blocking receiving of the. last P/2 delta, value of the time interval take in account by the processor.

2.2 If the. left neighbor processor exist wait until receiving the P/2 S value from it in order to update the ymisi22_e in the overlap of size. P/2.

2.3 Compute q0 from ymisi23_e in order to compute δmisi24_e When the 5 values between P/2 + 1 and P of time slices are computed send to left neighbor processor in order it will be able to start the next iterate (corresponding to the stage 2.1).

2.4 Send the P/2 δ values corresponding to the overlap position of the right neighbor-processor.

2.5 Update the solution ymi+1=ymi+δmisi25_e

Remark 4.1

The stage 1 initialization should not be parallelized in order to keep the time, integrator history and avoid jump between the. initial value inside the domain and the value in the overlap on the neighbor processor.

Remark 4.2

The communication stages [2.1] and [2.2] are needed to compute the q0 polynomial that uses the behavior of the function f(t,y) before and after the point tm.

Table 2 gives the times to run the timo domain decomposition pipelined SDC method on an ALTIX350 with IA64 1.5Ghz/6Mo processors and with numalink communication network. The lid driven parameters are a Reynolds number of 10 and a regular mesh of 20 × 20 points, the final time to be reach is T=5. The number of time slices has been set equal to 805. The number of SDC iteration is 25. The stage [1] correspond to the sequential first evaluation of the solution on the time slices (with an rtol of 10−1). The time [2.2] is the time of the pipelined to reach the last processor, results show that the ratio of this time [2.2] on the computation is increasing up to 1/4 for 8 processors case. The Speed-up corrected line gives the parallel capability of the parallel implementation when the pipeline is full.

Table 2

Time, Speed-up and Efficiency of the Time domain decomposition pipelined SDC

#proc1248
[1] (s)17.1218.6817.3616.41
[2.2] (s)1.35e-572.39113129
[2.3] f(t,y)eval (s)1.66e-27.25e-33.13e-31.126e-3
[2.3] q0 computing (s)5.99e-35.5e-35.90e-35.94e-3
euler0.2880.250.190.16
Total (s)20621113742423
Efficiency100%92.6%69.4%60.9%
Speed-up11.852.774.87
Speed up Corrected11.983.277.01

t0015

It exhibits quite reasonable speed-up up to 8 processors for the present implementation.

Remark 4.3

Some improvement of the presented implementation can be performed. firstly we implement the forward Euler to compute the δmi+1si26_e that needs a Newton solving at each iteration with the Jacobian computed by a numerical differentiation. We also can use the sundials solver with blocking al first order the DDF. Secondly, all the value f(t˜m,ymi)si27_e are computed before and could be progressively computed in order to enhance the chance of parallelism. Thirdly, a better load balancing could be acheived with a cyclic distribution of the time slices among the processors.

Figure 2 gives the convergence of the defect on the time interval for different iteration numbers (1.5.9,13,17). Symbols represent the runs on 1 processor (x). 2 processors (o). 8 processors (Square_1). Firstly, we remark that the number of processors does not impact the convergence properties of the method. Secondly, the convergence behavior increases for the last times due to the steady state of the solution. Nevertheless, a good accuracy is obtained on the first times in few SDC iterates.

f17-02-9780444530356
Figure 2 Convergence on the time interval of the defect δ for different iteration numbers (1,5.9.13.17). Symbols represent the runs on 1 processor (x), 2 processors (o) , 8 processors (Square_1)

5 conclusions

In this work we have investigated the numerical solution of 2D Navier-stokes with ODE solvers in the perspective of the time domain decomposition which can be an issue for the grid computing.

we proposed to combine time domain decomposition and spectral deferred correction in order to have a two level algorithm that can be pipelined. The first results proved some parallel efficiency with a such approach.

Nevertheless, the arithmetical complexity of the ODE solver for the lid driven problem consider here, is too much costly. Some scheme such as the C(p, q, j) [3] schemes can decouple the system variables in order to reduce the complexity.

REFERENCES

1. Cohen SD, Hindmarsh AC. CVODE, a Stiff/Nonstiff ODE Solver in C. Computers in Physics. 1996;10(2):138–143.

2. Farhat C, Chandesris M. Time-decomposed parallel time-integrators: theory and feasibility studies for fluid, structure, and fluid-structure applications. Int. J. Numer. Meth. Engng. 2003;58(9):1397–1434.

3. Garbey M, Tromeur-Dervout D. A parallel adaptive coupling algorithm for systems of differential equations. J. Comput. Phys. 2000;161(2):401–427.

4. to appear Guibert D, Tromeur-Dervout D. Parallel Adaptive Time Domain Decomposition for Stiff Systems of ODE/DAE. Computer and Structure. 2006.

5. Hairer E, Norsett SP, Wanner G. Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems. In: Berlin: Springer; 1991. Springer Series in Computational Mathematics.

6. Lions J-L, Maday Y, Turinici G. Résolution d'EDP par un schéma en temps “pararéel”. C.R.A.S. Sér. I Math. 2000;332(7):661–668.

7. Minion L, Michael. Semi-implicit spectral deferred correction methods for ordinary differential equations. Commun. Math. Sci. 2003;1(3):471–500.

8. Petzold LR. A Description of DASSL: A Differential and Algebraic System Solver. SAND82-8637. September 1982.

* This work was funded by the thema “mathématiques” of the Région Rhône-Alpes thru the project: “Développement de méthodologies mathématiques pour le calcul scientifique sur grille”

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.135.190.113