A scalable high-order discontinuous Galerkin method for global atmospheric modeling*

Hae-Won Choi^a; Ramachandran D. Nair^a; Henry M. Tufo^a

^a Scientific Computing Division, National Center for Atmospheric Research (NCAR), 1850 Table Mesa Drive, Boulder, CO 80305, USA

Publisher Summary

The future evolution of the Community Climate System Model into an Earth system model will require a highly scalable and accurate flux-form formulation of atmospheric dynamics. Inherently conservative numerical schemes are of fundamental importance in atmospheric and climate modeling because they retain conservation properties such as mass and total energy. The computational domain is the singularity-free cubed-sphere geometry. The DG discretization uses a high-order nodal basis set of Lagrange-Legendre polynomials, and fluxes at inter-element boundaries are approximated with the Lax-Friedrichs numerical flux. Time integration follows the third-order SSP-RK scheme. To validate the proposed 3-D DG model, the baroclinic instability test suite proposed by Jablonowski and Williamson is investigated; the baroclinic waves are triggered by overlaying the steady-state initial conditions, given as quasi-realistic analytic expressions, with a zonal wind perturbation. Currently, the 3-D DG model performs successfully up to a 10-day simulation. The number of floating-point operations per time step was measured for the main DG time-stepping loop using hardware performance counters on IBM supercomputers.

A conservative 3-D discontinuous Galerkin (DG) baroclinic model has been developed in the NCAR High-Order Method Modeling Environment (HOMME) to investigate global atmospheric flows. The computational domain is a cubed-sphere free from coordinate singularities. The DG discretization uses a high-order nodal basis set of orthogonal Lagrange-Legendre polynomials, and fluxes at inter-element boundaries are approximated with the Lax-Friedrichs numerical flux. The vertical discretization follows the 1-D vertical Lagrangian coordinates approach combined with the cell-integrated semi-Lagrangian method for conservative remapping. Time integration follows the third-order SSP-RK scheme. To validate the proposed 3-D DG model, the baroclinic instability test suite proposed by Jablonowski and Williamson is investigated. Parallel performance is evaluated on IBM Blue Gene/L and IBM POWER5 p575 supercomputers.

1 INTRODUCTION

The future evolution of the Community Climate System Model (CCSM) into an Earth system model will require a highly scalable and accurate flux-form formulation of atmospheric dynamics. Flux form is required in order to conserve tracer species in the atmosphere, and accurate numerical schemes are essential to ensure high-fidelity simulations capable of capturing the convective dynamics in the atmosphere and their contribution to the global hydrological cycle. Scalable performance is necessary to exploit the massively parallel petascale systems that will dominate high-performance computing (HPC) for the foreseeable future.

The High-Order Method Modeling Environment (HOMME) [5], developed by the Scientific Computing Section at the National Center for Atmospheric Research (NCAR), is a vehicle to investigate the use of high-order element-based methods to build conservative and accurate dynamical cores. HOMME employs the spectral element (SE) method on a cubed-sphere tiled with quadrilateral elements, can be configured to solve the shallow water or the dry/moist primitive equations, and has been shown to scale efficiently to 32,768 processors of an IBM Blue Gene/L (BG/L) system [16]. Nevertheless, a major disadvantage of the SE atmospheric model is that it is not inherently conservative. For climate and atmospheric applications, conservation of integral invariants such as mass and total energy is of significant importance. To resolve this issue, we recently added DG atmospheric models to the HOMME framework.

In this paper we discuss our extension of the HOMME framework to include a 3-D DG option as a first step towards providing the atmospheric science community with a new generation of atmospheric general circulation models (AGCMs). The DG method [2], which is a hybrid technique combining the finite element and finite volume methods, is inherently conservative and shares the computational advantages of the SE method, such as scalability, high-order accuracy, and spectral convergence, and thus is an ideal candidate for climate modeling. The DG method is employed on a quadrilateral mesh of elements using a high-order nodal basis set of orthogonal Lagrange-Legendre polynomials with Gauss-Lobatto-Legendre (GLL) quadrature points. Time integration follows the strong stability-preserving Runge-Kutta (SSP-RK) scheme of Gottlieb et al. [4]. The computational domain is the singularity-free cubed-sphere geometry introduced by Sadourny [13]. Parallelism is effected through a hybrid MPI/OpenMP design and domain decomposition through the space-filling curve approach described in [3]. Our work extends earlier efforts [3,10–12] in several important ways: first, we develop a scalable conservative 3-D DG-based dynamical core based on the hydrostatic primitive equations; second, we employ the vertical Lagrangian coordinate approach, developed by Starr [15] and later generalized by Lin [8]; and finally, we apply the 1-D cell-integrated semi-Lagrangian method [9] to preserve conservative remapping.

2 CONSERVATIVE DISCONTINUOUS GALERKIN MODEL

Inherently conservative numerical schemes are of fundamental importance in atmospheric and climate modeling in order to retain conservation properties such as mass and total energy. Toward this end, the 2-D DG shallow water model in the HOMME framework [10,11] has recently been extended to a 3-D DG baroclinic model [12]. The main features of the DG baroclinic model are the vertical discretization and the prognostic equations, which are based on hyperbolic conservation laws; the prognostic variables are the pressure thickness $\delta p$, the covariant wind vector $(u_1, u_2)$, the potential temperature $\Theta$, and the moisture $q$.

2.1 Hydrostatic primitive equations on the cubed-sphere

The hydrostatic primitive equations in curvilinear coordinates employ the cubed-sphere geometry, following [10,11]. A sphere is decomposed into six identical regions by an equiangular central projection of the faces of an inscribed cube, as displayed in Figure 1. This results in a nonorthogonal curvilinear $(x^1, x^2)$ coordinate system free of singularities for each face of the cubed-sphere, such that $x^1, x^2 \in [-\pi/4, \pi/4]$. Each face of the cubed-sphere is partitioned into $N_e \times N_e$ rectangular non-overlapping elements (total number of elements $N_{elem} = 6 \times N_e^2$). The elements are further mapped onto the reference element bounded by $[-1, 1] \otimes [-1, 1]$, which has $N_v \times N_v$ (or $N_p \times N_p$) GLL grid points. Note that $N_v$ and $N_p$ denote the number of velocity and pressure points, respectively. The associated metric tensor $G_{ij}$, in terms of longitude-latitude $(\lambda, \theta)$, is defined as follows:

\[
G_{ij} = A^{T} A; \qquad
A = \begin{bmatrix}
R\cos\theta\,\partial\lambda/\partial x^1 & R\cos\theta\,\partial\lambda/\partial x^2 \\
R\,\partial\theta/\partial x^1 & R\,\partial\theta/\partial x^2
\end{bmatrix}. \tag{1}
\]

Figure 1 Cubed-sphere geometry for Nelem =6×5×5 DG elements (left). Logical orientation of cube faces in HOMME (right).

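The matrix $A$ is used for transforming the spherical velocity $(u, v)$ to the covariant $(u_1, u_2)$ and contravariant $(u^1, u^2)$ cubed-sphere velocity vectors such that:

\[
\begin{bmatrix} u \\ v \end{bmatrix} = A \begin{bmatrix} u^1 \\ u^2 \end{bmatrix}; \qquad
u_i = G_{ij}\,u^j; \qquad u^i = G^{ij}\,u_j; \qquad G^{ij} = (G_{ij})^{-1}. \tag{2}
\]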
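As an illustration of Eqs. (1) and (2), the following minimal Python sketch builds $A$ and $G_{ij}$ at a point of one cube face and converts a spherical wind $(u, v)$ to contravariant and covariant components. It assumes an equatorial face centred at $(\lambda, \theta) = (0, 0)$ with the equiangular mapping $\lambda = x^1$, $\theta = \arctan(\tan x^2 \cos x^1)$; this mapping and all function names are illustrative assumptions and are not taken from HOMME.

```python
import math

def cubed_sphere_A(x1, x2, R=1.0):
    """Matrix A of Eq. (1) on an equatorial face centred at (0, 0).

    Assumed face mapping (not from the paper):
        lambda = x1,  theta = atan(tan(x2) * cos(x1)).
    """
    X, Y = math.tan(x1), math.tan(x2)
    rho2 = 1.0 + X * X + Y * Y
    cos_theta = math.sqrt((1.0 + X * X) / rho2)
    # Analytic derivatives of the assumed mapping.
    dlam_dx1, dlam_dx2 = 1.0, 0.0
    dth_dx1 = -X * Y * math.sqrt(1.0 + X * X) / rho2
    dth_dx2 = (1.0 + Y * Y) * math.sqrt(1.0 + X * X) / rho2
    return [[R * cos_theta * dlam_dx1, R * cos_theta * dlam_dx2],
            [R * dth_dx1, R * dth_dx2]]

def metric_tensor(A):
    """G_ij = A^T A for a 2x2 matrix A."""
    (a, b), (c, d) = A
    return [[a * a + c * c, a * b + c * d],
            [a * b + c * d, b * b + d * d]]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def spherical_to_cubed(u, v, x1, x2, R=1.0):
    """Return (contravariant, covariant) components of the wind (u, v)."""
    A = cubed_sphere_A(x1, x2, R)
    contra = matvec(inverse_2x2(A), [u, v])   # [u, v]^T = A [u^1, u^2]^T
    cov = matvec(metric_tensor(A), contra)    # u_i = G_ij u^j
    return contra, cov
```

A useful sanity check on such a transformation is that the contraction $u_i u^i$ equals $u^2 + v^2$, which is also the kinetic-energy combination that appears in the energy term of the momentum equations.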

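The hydrostatic primitive equations, consisting of the momentum, mass continuity, thermodynamic, and moisture transport equations, can be expressed in conservative form in curvilinear coordinates:

\[
\frac{\partial u_1}{\partial t} + \nabla_c \cdot \mathbf{E}_1 = \sqrt{G}\,u^2 (f + \zeta) - R\,T\,\frac{\partial}{\partial x^1}(\ln p), \tag{3}
\]

\[
\frac{\partial u_2}{\partial t} + \nabla_c \cdot \mathbf{E}_2 = -\sqrt{G}\,u^1 (f + \zeta) - R\,T\,\frac{\partial}{\partial x^2}(\ln p), \tag{4}
\]

\[
\frac{\partial}{\partial t}(\Delta p) + \nabla_c \cdot (U^j \Delta p) = 0, \tag{5}
\]

\[
\frac{\partial}{\partial t}(\Theta\,\Delta p) + \nabla_c \cdot (U^j\,\Theta\,\Delta p) = 0, \tag{6}
\]

\[
\frac{\partial}{\partial t}(q\,\Delta p) + \nabla_c \cdot (U^j\,q\,\Delta p) = 0, \tag{7}
\]

where

\[
\nabla_c \equiv \left(\frac{\partial}{\partial x^1}, \frac{\partial}{\partial x^2}\right), \quad
\mathbf{E}_1 = (E, 0), \quad \mathbf{E}_2 = (0, E), \quad
E = \Phi + \tfrac{1}{2}\left(u_1 u^1 + u_2 u^2\right), \quad
U^j = (u^1, u^2), \quad
\Delta p = \sqrt{G}\,\delta p, \quad
\Theta = T\,(p_0/p)^{\kappa}, \quad \kappa = R/C_p, \tag{8}
\]

and $E$ is the energy term, $\zeta$ is the relative vorticity, $\Phi = gh$ is the geopotential height, and $f$ is the Coriolis parameter.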

2.2 Vertical discretization

The vertical discretization follows the 1-D vertical Lagrangian coordinates of Starr [15], based on the 'evolve and remap' approach developed by Lin [8]. A terrain-following Lagrangian vertical coordinate, as shown in Figure 2 (left), can be constructed by treating any reference Eulerian coordinate as a material surface. The Lagrangian surfaces are free to deform in the vertical direction during the integration and need to be remapped onto a reference coordinate at regular intervals of time. By virtue of this approach, the hydrostatic atmosphere is vertically subdivided into a finite number of pressure intervals, or pressure thicknesses. Moreover, the vertical advection terms are absent thanks to the Lagrangian framework.

Figure 2 Lagrangian vertical coordinates system (left). The 3-D grid structure for the DG baroclinic model (right).

The entire 3-D system can be treated as a vertical stack of 2-D DG shallow water models, as demonstrated in Figure 2 (right), where the vertical levels are coupled only by the discretized hydrostatic relation. Therefore, the vertical structure involves no parallel communication. Following Lin [8], at every time step $\delta p$ is predicted at model levels and used to determine the pressure at Lagrangian surfaces by summing the pressure thicknesses from top ($p_{\mathrm{top}}$) to bottom, $p = p_{\mathrm{top}} + \sum_k \delta p_k$. The geopotential height at interfaces is obtained by using the hydrostatic relation, i.e., $\Delta\Phi = -C_p\,\Theta\,\Delta\Pi$ where $\Pi = (p/p_0)^{\kappa}$, and summing the geopotential height from bottom ($\Phi_s$) to top, $\Phi = \Phi_s + \sum_k \Delta\Phi_k$. For the baroclinic model, the velocity fields $(u_1, u_2)$, the moisture $q$, and the total energy ($\Gamma_E$) are remapped onto the reference Eulerian coordinates using the 1-D conservative cell-integrated semi-Lagrangian (CISL) method of Nair and Machenhauer [9]. The temperature field $\Theta$ is retrieved from the remapped total energy $\Gamma_E$.
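The two column sums described above can be sketched in a few lines of Python. This is a minimal illustration for a single vertical column, not the HOMME implementation: the function name, the top-to-bottom array ordering, and the default constants (`p0`, `cp`, `kappa`) are assumptions chosen for the example.

```python
def column_pressure_and_geopotential(dp, theta, p_top, phi_s,
                                     p0=1.0e5, cp=1004.64, kappa=2.0 / 7.0):
    """Interface pressure and geopotential for one hydrostatic column.

    dp[k] and theta[k] are layer values ordered from top (k = 0) to bottom.
    The constants p0, cp and kappa are assumed example values.
    """
    nlev = len(dp)
    # Interface pressures: accumulate thickness downward from the model top.
    p = [p_top]
    for k in range(nlev):
        p.append(p[-1] + dp[k])
    # Pi = (p / p0)^kappa evaluated at the interfaces.
    pi = [(pk / p0) ** kappa for pk in p]
    # Interface geopotential: accumulate upward from the surface using the
    # hydrostatic relation delta(Phi) = -cp * theta * delta(Pi).
    phi = [0.0] * (nlev + 1)
    phi[nlev] = phi_s
    for k in range(nlev - 1, -1, -1):
        phi[k] = phi[k + 1] + cp * theta[k] * (pi[k + 1] - pi[k])
    return p, phi
```

Because both loops run along a single column, each column can be processed independently, which is consistent with the observation that the vertical coupling requires no parallel communication.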

2.3 DG discretization

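The flux form of the DG discretization can be formulated as

\[
\frac{\partial \mathbf{U}}{\partial t} + \nabla_c \cdot \mathbf{F}(\mathbf{U}) = \mathbf{S}(\mathbf{U}), \tag{9}
\]

where $\mathbf{U} = [u_1, u_2, \sqrt{G}\,\delta p, \Theta, q]^T$ denotes the prognostic variables, $\mathbf{F}(\mathbf{U})$ is the flux function, and $\mathbf{S}(\mathbf{U})$ is the source term. The corresponding weak Galerkin formulation of the 3-D DG model can be written as

\[
\frac{\partial}{\partial t}\int_{\Omega_k} \mathbf{U}_h\,\varphi_h\,d\Omega_k
= \int_{\Omega_k} \mathbf{F}(\mathbf{U}_h)\cdot\nabla_c\varphi_h\,d\Omega_k
- \oint_{\partial\Omega_k} \left(\mathbf{F}(\mathbf{U}_h)\cdot\mathbf{n}\right)\varphi_h\,ds
+ \int_{\Omega_k} \mathbf{S}(\mathbf{U}_h)\,\varphi_h\,d\Omega_k, \tag{10}
\]

where the jump discontinuity at an element boundary requires the solution of a Riemann problem, in which the flux term $\mathbf{F}(\mathbf{U}_h)\cdot\mathbf{n}$ can be approximated by a Lax-Friedrichs numerical flux as shown in [11]. The resulting DG discretization leads to the following ordinary differential equation (ODE):

\[
\frac{d\mathbf{U}_h}{dt} = L(\mathbf{U}_h) \quad \text{in } (0, T) \times \Omega_k. \tag{11}
\]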
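To make the role of the Lax-Friedrichs flux in the boundary term concrete, here is a minimal scalar 1-D sketch in Python. It uses the local (Rusanov) variant, in which the dissipation coefficient is the larger of the wave speeds of the two interface states; the source only states that a Lax-Friedrichs flux is used as in [11], so this particular choice, the scalar setting, and the function and argument names are assumptions made for illustration.

```python
def lax_friedrichs_flux(u_minus, u_plus, flux, max_wave_speed, normal=1.0):
    """Local Lax-Friedrichs approximation of F(U) . n at an element interface.

    u_minus : state on the interior side of the interface
    u_plus  : state on the exterior (neighbouring element) side
    flux    : callable returning the physical flux f(u)
    max_wave_speed : callable returning |f'(u)| (assumed supplied by the user)
    normal  : outward unit normal in 1-D, i.e. +1.0 or -1.0
    """
    alpha = max(max_wave_speed(u_minus), max_wave_speed(u_plus))
    central = 0.5 * (flux(u_minus) + flux(u_plus)) * normal
    dissipation = 0.5 * alpha * (u_plus - u_minus)
    return central - dissipation
```

For linear advection $f(u) = a u$ with $a > 0$ this expression reduces to the upwind flux, and for equal states on both sides it returns the physical flux, which are the two properties usually checked first for a numerical flux.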

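The ODE (11) can be solved by an explicit time integration strategy such as the third-order strong stability-preserving Runge-Kutta (SSP-RK) scheme of Gottlieb et al. [4]:

\[
\mathbf{U}_h^{(1)} = \mathbf{U}_h^{n} + \Delta t\,L(\mathbf{U}_h^{n}), \tag{12}
\]

\[
\mathbf{U}_h^{(2)} = \tfrac{3}{4}\,\mathbf{U}_h^{n} + \tfrac{1}{4}\,\mathbf{U}_h^{(1)} + \tfrac{1}{4}\,\Delta t\,L(\mathbf{U}_h^{(1)}), \tag{13}
\]

\[
\mathbf{U}_h^{n+1} = \tfrac{1}{3}\,\mathbf{U}_h^{n} + \tfrac{2}{3}\,\mathbf{U}_h^{(2)} + \tfrac{2}{3}\,\Delta t\,L(\mathbf{U}_h^{(2)}). \tag{14}
\]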
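The three stages (12)-(14) translate directly into code. The sketch below is a generic Python version operating on a list of unknowns; `rhs` stands in for the DG spatial operator $L$ and is a hypothetical callable supplied by the caller, not a HOMME routine.

```python
def ssp_rk3_step(u, dt, rhs):
    """Advance the state u by one third-order SSP-RK step, Eqs. (12)-(14).

    u   : list of floats (current state U^n)
    dt  : time step
    rhs : callable returning L(u) as a list of the same length
    """
    # Stage 1: forward Euler step.
    u1 = [a + dt * b for a, b in zip(u, rhs(u))]
    # Stage 2: convex combination of U^n and an Euler step from U^(1).
    u2 = [0.75 * a + 0.25 * (b + dt * c)
          for a, b, c in zip(u, u1, rhs(u1))]
    # Stage 3: convex combination of U^n and an Euler step from U^(2).
    return [a / 3.0 + 2.0 / 3.0 * (b + dt * c)
            for a, b, c in zip(u, u2, rhs(u2))]
```

For the linear test problem $dU/dt = \lambda U$, one step multiplies the state by $1 + z + z^2/2 + z^3/6$ with $z = \lambda \Delta t$, which is the third-order Taylor polynomial of $e^{z}$ and gives a quick way to check an implementation.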

3 NUMERICAL TEST

The baroclinic instability test proposed by Jablonowski and Williamson [6,7] is used to assess the evolution of an idealized baroclinic wave in the Northern Hemisphere. The baroclinic waves are triggered by overlaying the steady-state initial conditions with a zonal wind perturbation; the initial conditions are given as quasi-realistic analytic expressions in [6,7]. Numerical computations are performed for the conservative 3-D DG model with 9th-order polynomials (i.e., $N_v = N_p = 10$), a vertical resolution of 26 Lagrangian surfaces (i.e., $N_{lev} = 26$, where $N_{lev}$ denotes the number of vertical levels), and a total number of elements $N_{elem} = 216$. This case has 561,600 total degrees of freedom (d.o.f.). The Boyd-Vandeven filter [1] is used for spatial filtering. Figure 3 shows the triggered baroclinic waves through the surface pressure $P_s$ and the temperature field $T$ at 850 hPa from day 6 to day 10. At day 6 the surface pressure shows a few weak high- and low-pressure contours, which lead to the growth of very small-amplitude waves in the temperature field. At day 8 the baroclinic instability waves in the surface pressure are well developed and the temperature waves can clearly be noticed. At day 10 the strong baroclinic pressure waves lead to two waves in the temperature field that have almost peaked and are beginning to wrap around the trailing fronts.
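The quoted problem size can be reproduced with a short calculation. The sketch below assumes that the degrees of freedom are counted per prognostic field as (number of elements) × (GLL points per element) × (vertical levels); that interpretation, and the function name, are assumptions, although they do reproduce the number given in the text with $N_e = 6$ (so that $N_{elem} = 6 N_e^2 = 216$).

```python
def dg_degrees_of_freedom(ne, npts, nlev):
    """Grid degrees of freedom per prognostic field (assumed counting rule).

    ne   : elements along each edge of a cube face
    npts : GLL points along each edge of an element
    nlev : number of vertical levels
    """
    nelem = 6 * ne * ne              # six cube faces, ne x ne elements each
    return nelem * npts * npts * nlev

print(dg_degrees_of_freedom(ne=6, npts=10, nlev=26))  # 561600
```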

Figure 3 Evolution of the baroclinic wave from integration day 6 to day 10: Surface pressure Ps [hPa] (top) and Temperature field [K] at 850 hPa (bottom).

4 PARALLEL IMPLEMENTATION

The parallel implementation of HOMME is based on a hybrid MPI/OpenMP approach, and domain decomposition is applied through the Hilbert space-filling curve approach of Sagan [14] and Dennis et al. [3]. The approach generates the best partitions when $N_e = 2^l 3^m 5^n$, where $l$, $m$, and $n$ are integers. The first step in partitioning the computational grid is to map the 2-D surface of the cubed-sphere onto a linear array. Figure 4 illustrates the Hilbert space-filling curve and the elements for $N_{elem} = 24$. The second step is to partition the linear array into $P$ contiguous groups, where $P$ is the number of MPI tasks. The space-filling curve partitioning thus creates contiguous groups of elements and balances the load.
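The second partitioning step can be sketched as follows. The Python example assumes that the elements have already been ordered along the space-filling curve (the curve construction itself is not shown) and simply cuts that linear array into contiguous, nearly equal pieces; it also includes a check for the $N_e = 2^l 3^m 5^n$ condition. Both function names are hypothetical.

```python
def is_sfc_friendly(ne):
    """True if ne = 2^l * 3^m * 5^n for non-negative integers l, m, n."""
    if ne < 1:
        return False
    for prime in (2, 3, 5):
        while ne % prime == 0:
            ne //= prime
    return ne == 1

def partition_curve(nelem, ntasks):
    """Split curve positions 0..nelem-1 into ntasks contiguous ranges.

    Sizes differ by at most one element, so the load is balanced as
    evenly as the element count allows.
    """
    base, extra = divmod(nelem, ntasks)
    ranges, start = [], 0
    for task in range(ntasks):
        size = base + (1 if task < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges
```

For example, `partition_curve(24, 5)` assigns 5, 5, 5, 5, and 4 consecutive curve positions to the five tasks; because neighbouring positions on a Hilbert curve are also neighbours on the cubed-sphere surface, each task receives a spatially compact group of elements.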

Figure 4 A mapping of a Hilbert space-filling curve for the Nelem = 24 cubed-sphere grid.

To perform parallel computing experiments, we use the IBM Blue Gene/L (BG/L) and IBM POWER5 p575 systems at NCAR. The configuration of these systems is summarized in Table 1. A Message Passing Interface (MPI) job on the IBM BG/L machine can be run in coprocessor mode (i.e., a single MPI task runs on each compute node) or in virtual-node mode (i.e., two MPI tasks run on each compute node). On the IBM POWER5 machine, by contrast, 4 to 8 MPI tasks are run on each compute node. To determine the sustained MFLOPS per processor, the number of floating-point operations per time step was measured for the main DG time-stepping loop using hardware performance counters on the IBM supercomputers.

Table 1

Comparison of IBM Blue Gene/L and IBM POWER5 p575 systems.

Resource            IBM Blue Gene/L    IBM POWER5 p575
Clock cycle         0.7 GHz            1.9 GHz
Memory/proc         0.25 GB            2.0 GB
Total processors    2048               624
Operating system    MK Linux           AIX 5.3
Compilers           IBM BGL XL         IBM AIX XL

• The IBM Blue Gene/L system uses the libmpihpm library; its link options and code example are as follows:

add -L$(BGL_LIBRARY_PATH) -lmpihpm_f -lbgl_perfctr.rts

call trace_start()

call dg3d_advance()

call trace_stop()

• The IBM POWER5 p575 system uses the libhpm library of the HPM Toolkit; its link options and code example are as follows:

add -L$(HPM_LIBRARY_PATH) -lhpm -lpmapi -lm

#include "f_hpm.h"

call f_hpminit(taskid,'dg3d')

call f_hpmstart(5,'dg3d advance')

call dg3d_advance()

call f_hpmstop(5)

call f_hpm_terminate(taskid)

Note that all writing and printing functions are turned off during performance evaluations. Figure 5 shows that the IBM Blue Gene/L machine sustains between 253 and 266 MFLOPS per processor in coprocessor mode and between 238 and 261 MFLOPS per processor in virtual-node mode, whereas the IBM POWER5 machine sustains between 715 and 732 MFLOPS per processor with 4 tasks per node, between 706 and 731 MFLOPS per processor with 6 tasks per node, and between 532 and 561 MFLOPS per processor with 8 tasks per node. Table 2 summarizes the percentage of peak performance for the strong scaling results on the IBM Blue Gene/L and IBM POWER5 systems. IBM Blue Gene/L sustains 9.5% and 9.3% of peak performance in coprocessor and virtual-node modes, respectively. IBM POWER5 sustains 9.6% of peak performance with 4 and 6 tasks per node, whereas it sustains 7.4% of peak performance with 8 tasks per node. Note that the processors of the IBM POWER5 system are grouped at most 8 per node, so performance drops when all 8 processors of a node are occupied by tasks.
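The percentages of peak can be cross-checked against the clock rates in Table 1. The sketch below assumes a theoretical peak of 4 floating-point operations per clock cycle per processor on both machines (two fused multiply-add units); this figure is not stated in the text and is an assumption, although it reproduces the quoted percentages.

```python
def percent_of_peak(sustained_mflops, clock_ghz, flops_per_cycle=4):
    """Sustained performance as a percentage of theoretical peak.

    flops_per_cycle = 4 is an assumed value, not given in the paper.
    """
    peak_mflops = clock_ghz * 1000.0 * flops_per_cycle
    return 100.0 * sustained_mflops / peak_mflops

for label, mflops, ghz in [("POWER5, 4 tasks/node", 732, 1.9),
                           ("POWER5, 8 tasks/node", 561, 1.9),
                           ("BG/L, coprocessor", 266, 0.7),
                           ("BG/L, virtual-node", 261, 0.7)]:
    print(f"{label}: {percent_of_peak(mflops, ghz):.1f}% of peak")
```

Under this assumption the peak rates are 7.6 GFLOPS per POWER5 processor and 2.8 GFLOPS per Blue Gene/L processor.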

Figure 5 Parallel performance (i.e., strong scaling) results on IBM BG/L and IBM POWER5 p575 systems.

Table 2

Summary of strong scaling results for IBM Blue Gene/L and IBM POWER5 p575 systems.

Resource                   Sustained MFLOPS    % of Peak Performance
POWER5: 4 tasks/node       732                 9.6
POWER5: 6 tasks/node       731                 9.6
POWER5: 8 tasks/node       561                 7.4
BG/L: 1 task/node (CO)     266                 9.5
BG/L: 2 tasks/node (VN)    261                 9.3

5 CONCLUSION

A conservative 3-D DG baroclinic model has been developed in the NCAR HOMME framework. The 3-D DG model is formulated in conservative flux form. The computational domain is the singularity-free cubed-sphere geometry. The DG discretization uses a high-order nodal basis set of Lagrange-Legendre polynomials, and fluxes at inter-element boundaries are approximated with the Lax-Friedrichs numerical flux. The vertical discretization follows the 1-D vertical Lagrangian coordinates approach. Time integration follows the third-order SSP-RK scheme. To validate the proposed 3-D DG model, the baroclinic instability test suite proposed by Jablonowski and Williamson is investigated. Currently, the 3-D DG model performs successfully up to a 10-day simulation. Parallel experiments are run on IBM Blue Gene/L and IBM POWER5 p575 supercomputers. The conservative 3-D DG baroclinic model sustains 9.5% of peak performance in the coprocessor mode of IBM Blue Gene/L and 9.6% of peak performance with 4 and 6 tasks per node on IBM POWER5.

REFERENCES

1. Boyd JP. The erfc-log filter and the asymptotics of the Euler and Vandeven sequence accelerations. In: Ilin AV, Scott LR, eds. Proceedings of the Third International Conference on Spectral and High Order Methods. Houston J. Math.; 1996:267–276.

2. Cockburn B, Karniadakis GE, Shu C-W. Discontinuous Galerkin Methods: Theory, Computation, and Applications. Lecture Notes in Computational Science and Engineering, vol. 11. Springer; 2000:1–170.

3. Dennis JM, Nair RD, Tufo HM, Levy M, Voran T. Development of a scalable global discontinuous Galerkin atmospheric model. Int. J. Comput. Sci. Eng. 2006. In press.

4. Gottlieb S, Shu C-W, Tadmor E. Strong stability-preserving high-order time discretization methods. SIAM Review. 2001;43(1):89–112.

5. HOMME: High-Order Methods Modeling Environment. National Center for Atmospheric Research (NCAR); 2006. http://www.homme.ucar.edu.

6. Jablonowski C, Williamson DL. A baroclinic wave test case for dynamical cores of general circulation models: model intercomparisons. National Center for Atmospheric Research (NCAR); 2006. TN-469+STR.

7. Jablonowski C, Williamson DL. A baroclinic wave test case for atmospheric model dynamical cores. Mon. Wea. Rev. 2006. In press.

8. Lin S-J. A "Vertically-Lagrangian" finite-volume dynamical core for global models. Mon. Wea. Rev. 2004;132:2293–2307.

9. Nair RD, Machenhauer B. The mass-conservative cell-integrated semi-Lagrangian advection scheme on the sphere. Mon. Wea. Rev. 2002;130:649–667.

10. Nair RD, Thomas SJ, Loft RD. A discontinuous Galerkin transport scheme on the cubed sphere. Mon. Wea. Rev. 2005;133:814–828.

11. Nair RD, Thomas SJ, Loft RD. A discontinuous Galerkin global shallow water model. Mon. Wea. Rev. 2005;133:876–888.

12. Nair RD, Tufo HM. A scalable high-order dynamical core for climate modeling. In: Proceedings of International Conference on Mesoscale Process in Atmosphere, Ocean and Environment Systems. IMPA 2006. IITD, New Delhi, India; 14–17 February, 2006.

13. Sadourny R. Conservative finite-difference approximations of the primitive equations on quasi-uniform spherical grids. Mon. Wea. Rev. 1972;100:136–144.

14. Sagan H. Space-Filling Curves. Springer-Verlag; 1994.

15. Starr VP. A quasi-Lagrangian system of hydrodynamical equations. J. Meteorology. 1945;2(1):227–237.

16. St-Cyr A, Dennis JM, Loft R, Thomas SJ, Tufo HM, Voran T. Early experience with the 360 TF IBM Blue Gene/L Platform. Int. J. Comput. Meth. 2006. In press.


* This paper is supported by the DOE-SciDAC program under award #DE-FG02-04ER63870.
