A scalable high-order discontinuous Galerkin method for global atmospheric modeling*

Hae-Won Choi^a; Ramachandran D. Nair^a; Henry M. Tufo^a
^a Scientific Computing Division, National Center for Atmospheric Research (NCAR), 1850 Table Mesa Drive, Boulder, CO 80305, USA

Publisher Summary

The future evolution of the Community Climate System Model into an Earth system model will require a highly scalable and accurate flux-form formulation of atmospheric dynamics. Inherently conservative numerical schemes are of fundamental importance in atmospheric and climate modeling in order to maintain conservation properties such as mass and total energy. The baroclinic waves are triggered by overlaying the steady-state initial conditions with a zonal wind perturbation; the initial conditions are given as quasi-realistic analytic expressions. The computational domain is the singularity-free cubed-sphere geometry. Time integration follows the third-order SSP-RK scheme. To validate the proposed 3-D DG model, the baroclinic instability test suite proposed by Jablonowski and Williamson is investigated. Currently, the 3-D DG model performs successfully up to a 10-day simulation. The DG discretization uses a high-order nodal basis set of Lagrange-Legendre polynomials, and fluxes at inter-element boundaries are approximated with the Lax-Friedrichs numerical flux. The number of floating-point operations per time step was measured for the main DG time-stepping loop using hardware performance counters on the IBM supercomputers.

A conservative 3-D discontinuous Galerkin (DG) baroclinic model has been developed in the NCAR High-Order Method Modeling Environment (HOMME) to investigate global atmospheric flows. The computational domain is a cubed-sphere free from coordinate singularities. The DG discretization uses a high-order nodal basis set of orthogonal Lagrange-Legendre polynomials, and fluxes at inter-element boundaries are approximated with the Lax-Friedrichs numerical flux. The vertical discretization follows the 1-D vertical Lagrangian coordinates approach combined with the cell-integrated semi-Lagrangian method to preserve conservative remapping. Time integration follows the third-order SSP-RK scheme. To validate the proposed 3-D DG model, the baroclinic instability test suite proposed by Jablonowski and Williamson is investigated. Parallel performance is evaluated on IBM Blue Gene/L and IBM POWER5 p575 supercomputers.

1 INTRODUCTION

The future evolution of the Community Climate System Model (CCSM) into an Earth system model will require a highly scalable and accurate flux-form formulation of atmospheric dynamics. Flux form is required in order to conserve tracer species in the atmosphere, and accurate numerical schemes are essential to ensure high-fidelity simulations capable of capturing the convective dynamics in the atmosphere and their contribution to the global hydrological cycle. Scalable performance is necessary to exploit the massively parallel petascale systems that will dominate high-performance computing (HPC) for the foreseeable future.

The High-Order Method Modeling Environment (HOMME) [5], developed by the Scientific Computing Section at the National Center for Atmospheric Research (NCAR), is a vehicle to investigate using high-order element-based methods to build conservative and accurate dynamical cores. HOMME employs the spectral element (SE) method on a cubed-sphere tiled with quadrilateral elements, can be configured to solve the shallow water or the dry/moist primitive equations, and has been shown to scale efficiently to 32,768 processors of an IBM Blue Gene/L (BG/L) system [16]. Nevertheless, a major disadvantage of the SE atmospheric model is that it is not inherently conservative. For climate and atmospheric applications, conservation of integral invariants such as mass and total energy is of significant importance. To resolve this issue, we recently added DG atmospheric models to the HOMME framework.

In this paper we discuss our extension of the HOMME framework to include a 3-D DG option as a first step towards providing the atmospheric science community with a new generation of atmospheric general circulation models (AGCMs). The DG method [2], which is a hybrid technique combining the finite element and finite volume methods, is inherently conservative and shares the computational advantages of the SE method, such as scalability, high-order accuracy, and spectral convergence, and thus is an ideal candidate for climate modeling. The DG method is employed on a quadrilateral mesh of elements using a high-order nodal basis set of orthogonal Lagrange-Legendre polynomials with Gauss-Lobatto-Legendre (GLL) quadrature points. Time integration follows the strong stability-preserving Runge-Kutta (SSP-RK) scheme of Gottlieb et al. [4]. The computational domain is based on the singularity-free cubed-sphere geometry introduced by [13]. Parallelism is effected through a hybrid MPI/OpenMP design and domain decomposition through the space-filling curve approach described in [3]. Our work extends earlier efforts [3,10–12] in several important ways: first, we develop a scalable conservative 3-D DG-based dynamical core based on the hydrostatic primitive equations; second, we employ the vertical Lagrangian coordinate approach, developed by Starr [15] and later generalized by Lin [8]; and finally, we apply the 1-D cell-integrated semi-Lagrangian method [9] to preserve conservative remapping.
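As an illustration of the Gauss-Lobatto-Legendre quadrature points mentioned above, the following Python sketch computes GLL nodes and weights on the reference interval [−1, 1]. It is not taken from HOMME: the function names `legendre_pair` and `gll_nodes_weights`, the Chebyshev-type initial guesses, and the Newton iteration on $x P_{N}(x) - P_{N-1}(x)$ are choices made here for illustration only.

```python
import math


def legendre_pair(n, x):
    """Return (P_n(x), P_{n-1}(x)) using the three-term recurrence, n >= 1."""
    p_prev, p = 1.0, x  # P_0 and P_1
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p, p_prev


def gll_nodes_weights(npts, tol=1e-14, max_iter=100):
    """GLL nodes and weights for npts >= 2 points (polynomial degree npts - 1).

    The nodes are the roots of (1 - x^2) P_N'(x), which is proportional to
    x P_N(x) - P_{N-1}(x); its derivative is (N + 1) P_N(x).
    """
    n = npts - 1
    nodes, weights = [], []
    for i in range(npts):
        x = -math.cos(math.pi * i / n)  # initial guess, ascending order
        for _ in range(max_iter):
            p_n, p_nm1 = legendre_pair(n, x)
            dx = (x * p_n - p_nm1) / ((n + 1) * p_n)
            x -= dx
            if abs(dx) < tol:
                break
        p_n, _ = legendre_pair(n, x)
        nodes.append(x)
        weights.append(2.0 / (n * (n + 1) * p_n ** 2))
    return nodes, weights
```

Calling `gll_nodes_weights(4)` gives the four-point rule with interior nodes at ±1/√5; a tensor product of such 1-D rules gives the grid points of a quadrilateral element.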

2 CONSERVATIVE DISCONTINUOUS GALERKIN MODEL

Inherently conservative numerical schemes are of fundamental importance in atmospheric and climate modeling in order to maintain conservation properties such as mass and total energy. Toward this end, the 2-D DG shallow water model in the HOMME framework [10,11] has recently been extended to a 3-D DG baroclinic model [12]. The main features of the DG baroclinic model are the vertical discretization and the prognostic equations, which are based on hyperbolic conservation laws; the prognostic variables are the pressure thickness δp, the covariant wind vector (u₁, u₂), the potential temperature Θ, and the moisture q.

2.1 Hydrostatic primitive equations on the cubed-sphere

The hydrostatic primitive equations in curvilinear coordinates employ the cubed-sphere geometry, following [10,11]. A sphere is decomposed into six identical regions by an equiangular central projection of the faces of an inscribed cube, as displayed in Figure 1. This results in a nonorthogonal curvilinear (x¹, x²) coordinate system free of singularities for each face of the cubed-sphere, such that x¹, x² ∈ [−π/4, π/4]. Each face of the cubed-sphere is partitioned into N_e × N_e rectangular non-overlapping elements (total number of elements $N_{elem} = 6 \times N_{e}^{2}$). The elements are further mapped onto the reference element bounded by [−1, 1] ⊗ [−1, 1], which has N_v × N_v (or N_p × N_p) GLL grid points. Note that N_v and N_p denote the number of velocity and pressure points, respectively. The associated metric tensor G_ij, in terms of longitude-latitude (λ, θ), is defined as follows:

$G_{ij} = A^{T} A; \quad A = \begin{bmatrix} R \cos\theta \, \partial\lambda / \partial x^{1} & R \cos\theta \, \partial\lambda / \partial x^{2} \\ R \, \partial\theta / \partial x^{1} & R \, \partial\theta / \partial x^{2} \end{bmatrix}.$

(1)

Figure 1. Cubed-sphere geometry for N_elem = 6 × 5 × 5 DG elements (left). Logical orientation of cube faces in HOMME (right).

The matrix A is used for transforming spherical velocity (u, v) to the ‘covariant’ (u₁, u₂) and ‘contravariant’ (u¹, u²) ‘cubed-sphere’ velocity such that:

$\begin{bmatrix} u \\ v \end{bmatrix} = A \begin{bmatrix} u^{1} \\ u^{2} \end{bmatrix}; \quad u^{i} = G^{ij} u_{j}; \quad u_{i} = G_{ij} u^{j}; \quad G^{ij} = (G_{ij})^{-1}.$

(2)
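The relations in Eqs. (1) and (2) can be checked numerically. The Python sketch below builds A for a single equatorial cube face, forms G = AᵀA, and converts between contravariant, covariant, and spherical wind components. The mapping λ = x¹, θ = arctan(tan x² cos x¹) is an assumption made here for a face centered at (λ, θ) = (0, 0); the paper does not spell out the per-face mapping. The function names and the Earth radius value are likewise illustrative.

```python
import math

R_EARTH = 6.371e6  # assumed Earth radius in metres (not given in the text)


def transform_matrix(x1, x2, radius=R_EARTH):
    """Matrix A of Eq. (1) for an equatorial face.

    Assumes lambda = x1 and theta = atan(tan(x2) * cos(x1)).
    """
    t = math.tan(x2) * math.cos(x1)
    cos_theta = 1.0 / math.sqrt(1.0 + t * t)
    dlam_dx1, dlam_dx2 = 1.0, 0.0
    dth_dx1 = -math.tan(x2) * math.sin(x1) / (1.0 + t * t)
    dth_dx2 = math.cos(x1) / (math.cos(x2) ** 2 * (1.0 + t * t))
    return [[radius * cos_theta * dlam_dx1, radius * cos_theta * dlam_dx2],
            [radius * dth_dx1, radius * dth_dx2]]


def metric_tensor(a):
    """G_ij = (A^T A)_ij for a 2x2 matrix stored as nested lists."""
    return [[sum(a[k][i] * a[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matvec(m, v):
    """2x2 matrix times 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix, used for G^ij = (G_ij)^-1."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]
```

With these helpers, `matvec(a, u_contra)` gives the spherical wind (u, v), `matvec(g, u_contra)` lowers the index to the covariant components, and `matvec(inverse_2x2(g), u_cov)` raises it again. The squared wind speed u² + v² equals the contraction u_i u^i, which is the kinetic part of the energy term E.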

The hydrostatic primitive equations, consisting of the momentum, mass continuity, thermodynamic, and moisture transport equations, can be expressed in conservative form in curvilinear coordinates:

$\frac{\partial u_{1}}{\partial t} + \nabla_{c} \cdot \vec{E}_{1} = \sqrt{G} \, u^{2} (f + \zeta) - R \, T \frac{\partial}{\partial x^{1}} (\ln p),$

(3)

$\frac{\partial u_{2}}{\partial t} + \nabla_{c} \cdot \vec{E}_{2} = - \sqrt{G} \, u^{1} (f + \zeta) - R \, T \frac{\partial}{\partial x^{2}} (\ln p),$

(4)

$\frac{\partial}{\partial t} (\Delta p) + \nabla_{c} \cdot (U^{j} \Delta p) = 0,$

(5)

$\frac{\partial}{\partial t} (\Theta \Delta p) + \nabla_{c} \cdot (U^{j} \Theta \Delta p) = 0,$

(6)

$\frac{\partial}{\partial t} (q \Delta p) + \nabla_{c} \cdot (U^{j} q \Delta p) = 0,$

(7)

where

$\begin{matrix} \nabla_{c} = \left( \frac{\partial}{\partial x^{1}}, \frac{\partial}{\partial x^{2}} \right), \quad \vec{E}_{1} = (E, 0), \quad \vec{E}_{2} = (0, E), \quad E = \Phi + \frac{1}{2} (u_{1} u^{1} + u_{2} u^{2}), \\ U^{j} = (u^{1}, u^{2}), \quad \Delta p = \sqrt{G} \, \delta p, \quad \Theta = T (p_{0} / p)^{\kappa}, \quad \kappa = R / C_{p}, \end{matrix}$

(8)

where E is the energy term, ζ is the relative vorticity, Φ = gh is the geopotential height, and f is the Coriolis parameter.

2.2 Vertical discretization

The vertical discretization follows the 1-D vertical Lagrangian coordinates of Starr [15], based on an ‘evolve and remap’ approach developed by Lin [8]. A terrain-following Lagrangian vertical coordinate, as shown in Figure 2 (left), can be constructed by treating any reference Eulerian coordinate as a material surface. The Lagrangian surfaces are free to deform in the vertical direction during the integration and need to be remapped onto a reference coordinate at regular intervals of time. By virtue of this approach, the hydrostatic atmosphere is vertically subdivided into a finite number of pressure intervals or pressure thicknesses. Moreover, the vertical advection terms are absent thanks to the Lagrangian framework.
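The remap step of this ‘evolve and remap’ cycle can be illustrated with a deliberately simplified scheme. The Python sketch below performs a first-order (piecewise-constant) conservative remap of cell means from a deformed 1-D grid back onto a reference grid. This is a swapped-in, lower-order stand-in chosen for brevity; it is not the cell-integrated semi-Lagrangian method used in the model, and the function name and argument layout are hypothetical.

```python
def remap_conservative(src_edges, src_mean, dst_edges):
    """Piecewise-constant conservative remap of cell means in 1-D.

    src_edges, dst_edges: increasing lists of cell edges spanning the same
    interval (for example, interface pressures of deformed and reference
    layers). src_mean: one cell average per source cell.
    """
    dst_mean = []
    for j in range(len(dst_edges) - 1):
        lo, hi = dst_edges[j], dst_edges[j + 1]
        total = 0.0
        for i in range(len(src_edges) - 1):
            # Length of the overlap between source cell i and target cell j.
            overlap = min(hi, src_edges[i + 1]) - max(lo, src_edges[i])
            if overlap > 0.0:
                total += src_mean[i] * overlap
        dst_mean.append(total / (hi - lo))
    return dst_mean
```

Because each target average is built from integrals of the source data over overlapping intervals, the column integral (sum of mean times cell width) is unchanged by the remap, which is the property a conservative remapping must preserve.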

Figure 2. Lagrangian vertical coordinate system (left). The 3-D grid structure for the DG baroclinic model (right).

The entire 3-D system can be treated as vertically stacked 2-D shallow-water DG models, as demonstrated in Figure 2 (right), where the vertical levels are coupled only by the discretized hydrostatic relation. Therefore, the vertical structure involves no parallel communication. Following Lin [8], at every time step δp is predicted at model levels and used to determine the pressure at Lagrangian surfaces by summing the pressure thickness from top (p_x) to bottom. The geopotential height at interfaces is obtained by using the hydrostatic relation, i.e., ΔΦ = −C_p Θ ΔΠ where Π = (p/p₀)^κ, and summing the geopotential height from bottom (Φ_s) to top, $\Phi_{\ell} = \Phi_{s} + \sum_{k=1}^{\ell} \Delta \Phi_{k}$. For the baroclinic model, the velocity fields (u₁, u₂), the moisture q, and the total energy (Γ_E) are remapped onto the reference Eulerian coordinates using the 1-D conservative cell-integrated semi-Lagrangian (CISL) method of Nair and Machenhauer [9]. The temperature field Θ is retrieved from the remapped total energy Γ_E.
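The two vertical sums described above can be sketched for a single column as follows. This is a minimal Python illustration, not the HOMME implementation: the function names, the index convention (index 0 at the model top), the argument name `p_top` for the top pressure, and the constants (p₀ = 10⁵ Pa, R = 287.04 and C_p = 1004.64 J kg⁻¹ K⁻¹) are assumptions made here.

```python
def interface_pressures(p_top, dp):
    """Interface pressures from the top down: p[0] = p_top, p[k+1] = p[k] + dp[k]."""
    p = [p_top]
    for thickness in dp:
        p.append(p[-1] + thickness)
    return p


def interface_geopotential(phi_s, theta, p_int,
                           p0=1.0e5, cp=1004.64, kappa=287.04 / 1004.64):
    """Interface geopotential from the surface up, using dPhi = -cp * Theta * dPi.

    theta[k] is the layer potential temperature between interfaces k and k+1;
    p_int has len(theta) + 1 entries with index 0 at the top.
    """
    exner = [(p / p0) ** kappa for p in p_int]
    nlev = len(theta)
    phi = [0.0] * (nlev + 1)
    phi[nlev] = phi_s
    for k in range(nlev - 1, -1, -1):
        # Moving upward, Pi decreases, so dPi < 0 and the increment is positive.
        phi[k] = phi[k + 1] - cp * theta[k] * (exner[k] - exner[k + 1])
    return phi
```

Each column is processed independently, which is consistent with the statement that the vertical coupling requires no parallel communication.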

2.3 DG discretization

The flux form of DG discretization can be formulated such that

$\frac{\partial}{\partial t} U + \nabla_{c} \cdot \vec{F} (U) = S (U),$

(9)

where $U = [u_{1}, u_{2}, \sqrt{G} \, \delta p, \Theta, q]^{T}$ denotes the prognostic variables, $\vec{F} (U)$ is the flux function, and S(U) is the source term. The corresponding weak Galerkin formulation of the 3-D DG model can be written such that

$\frac{\partial}{\partial t} \int_{\Omega^{k}} U_{h} \cdot \vec{\phi}_{h} \, d\Omega^{k} = \int_{\Omega^{k}} \vec{F} (U_{h}) \cdot \nabla_{c} \vec{\phi}_{h} \, d\Omega^{k} - \oint_{\partial \Omega^{k}} (\vec{F} (U_{h}) \cdot \vec{n}) \cdot \vec{\phi}_{h} \, ds + \int_{\Omega^{k}} S (U_{h}) \cdot \vec{\phi}_{h} \, d\Omega^{k},$

(10)

where the jump discontinuity at an element boundary requires the solution of a Riemann problem, in which the flux term $\vec{F} (U_{h}) \cdot \vec{n}$ can be approximated by a Lax-Friedrichs numerical flux as shown in [11]. The resulting DG discretization leads to the following ordinary differential equation (ODE):

$\frac{d U_{h}}{d t} = L (U_{h}) \quad \text{in } (0, T) \times \Omega^{k}.$

(11)
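The Lax-Friedrichs numerical flux used at element boundaries can be sketched for a scalar 1-D conservation law. The Python function below is an illustration only: its name, its argument list, and the restriction to a scalar unknown are assumptions made here, and the form used for the cubed-sphere system is the one given in [11].

```python
def lax_friedrichs_flux(u_minus, u_plus, flux, alpha, normal=1.0):
    """Scalar 1-D Lax-Friedrichs numerical flux across an element edge.

    u_minus: trace of the solution inside the element.
    u_plus:  trace from the neighbouring element.
    flux:    callable returning the physical flux F(u).
    alpha:   upper bound on |dF/du| at the edge (maximum wave speed).
    normal:  outward unit normal of the element, +1.0 or -1.0 in 1-D.
    """
    central = 0.5 * (flux(u_minus) + flux(u_plus)) * normal
    dissipation = 0.5 * alpha * (u_plus - u_minus)
    return central - dissipation
```

For linear advection F(u) = a u with alpha = |a|, this expression reduces to the upwind flux, and evaluating it from the neighbouring element (states swapped, normal reversed) gives the same value with opposite sign, so whatever leaves one element enters the next. That single-valued edge flux is what makes the DG scheme locally conservative.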

The above ODE can be solved by an explicit time integration strategy such as the third-order strong stability-preserving Runge-Kutta (SSP-RK) scheme of Gottlieb et al. [4]:

$U_{h}^{(1)} = U_{h}^{(n)} + \Delta t \, L (U_{h}^{(n)}),$

(12)

$U_{h}^{(2)} = \frac{3}{4} U_{h}^{(n)} + \frac{1}{4} U_{h}^{(1)} + \frac{1}{4} \Delta t \, L (U_{h}^{(1)}),$

(13)

$U_{h}^{(n + 1)} = \frac{1}{3} U_{h}^{(n)} + \frac{2}{3} U_{h}^{(2)} + \frac{2}{3} \Delta t \, L (U_{h}^{(2)}).$

(14)
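The three stages of the SSP-RK scheme translate directly into code. The Python sketch below applies them to a generic right-hand side L(U), using the standard weight 1/3 on U^(n) in the final stage. The function name `ssp_rk3_step` is hypothetical, and `u` may be a float or any object supporting addition and scalar multiplication.

```python
def ssp_rk3_step(u, dt, rhs):
    """Advance u by one time step dt with the three-stage SSP-RK scheme.

    rhs is a callable returning L(u), the spatially discretized right-hand side.
    """
    u1 = u + dt * rhs(u)
    u2 = 0.75 * u + 0.25 * u1 + 0.25 * dt * rhs(u1)
    return u / 3.0 + 2.0 / 3.0 * u2 + 2.0 / 3.0 * dt * rhs(u2)
```

For the linear test problem dU/dt = −U, one step multiplies U by 1 − Δt + Δt²/2 − Δt³/6, which matches exp(−Δt) through third order. Each stage is a convex combination of forward Euler steps, which is the source of the strong stability-preserving property.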

3 NUMERICAL TEST

The baroclinic instability test proposed by Jablonowski and Williamson [6,7] is used to assess the evolution of an idealized baroclinic wave in the Northern Hemisphere. The baroclinic waves are triggered by overlaying the steady-state initial conditions with a zonal wind perturbation; the initial conditions are given as quasi-realistic analytic expressions in [6,7]. Numerical computations are performed for the conservative 3-D DG model with 9th-order polynomials (i.e., N_v = N_p = 10), a vertical resolution of 26 Lagrangian surfaces (i.e., N_lev = 26, where N_lev denotes the number of vertical levels), and a total number of elements N_elem = 216. This case has 561,600 total degrees of freedom (d.o.f.). The Boyd-Vandeven filter [1] is used for spatial filtering. Figure 3 shows the triggered baroclinic waves and the corresponding surface pressure P_s and temperature field T at 850 hPa from day 6 to day 10. At day 6 the surface pressure shows a few weak high- and low-pressure contours, which lead to the growth of very small-amplitude waves in the temperature field. At day 8 the baroclinic instability waves in surface pressure are well developed and the temperature waves can clearly be noticed. At day 10 the strong baroclinic pressure waves lead to two waves in the temperature field that have almost peaked and are beginning to wrap around the trailing fronts.
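The quoted problem size follows from the grid parameters by simple counting, as the short Python check below shows. The function name is hypothetical, the count is per prognostic variable, and N_e = 6 is inferred here from N_elem = 6 × N_e² = 216 rather than stated in the text.

```python
def total_dof(n_e, n_p, n_lev):
    """Grid points per prognostic variable: elements x GLL points x levels."""
    n_elem = 6 * n_e ** 2          # six cube faces with n_e x n_e elements each
    return n_elem * n_p ** 2 * n_lev


# N_e = 6 (so N_elem = 216), N_p = 10 GLL points per direction, N_lev = 26.
print(total_dof(6, 10, 26))
```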

Figure 3. Evolution of the baroclinic wave from integration day 6 to day 10: surface pressure P_s [hPa] (top) and temperature field [K] at 850 hPa (bottom).

4 PARALLEL IMPLEMENTATION

The parallel implementation of HOMME is based on a hybrid MPI/OpenMP approach, and domain decomposition is applied through the Hilbert space-filling curve approach of Sagan [14] and Dennis et al. [3]. The approach generates the best partitions when $N_{e} = 2^{l} 3^{m} 5^{n}$, where l, m, and n are integers. The first step in partitioning the computational grid is to map the 2-D surface of the cubed-sphere onto a linear array. Figure 4 illustrates the Hilbert space-filling curve and the elements when N_elem = 24. The second step is to partition the linear array into P contiguous groups, where P is the number of MPI tasks. The space-filling curve partitioning creates contiguous groups of elements and balances the load.

Figure 4. A mapping of a Hilbert space-filling curve for the N_elem = 24 cubed-sphere grid.
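The two partitioning steps can be sketched on a single square face. The Python code below orders the cells of a 2ᵏ × 2ᵏ grid along a Hilbert curve using the standard index-to-coordinate conversion, then cuts the resulting linear array into contiguous, nearly equal groups. It is a simplified illustration under stated assumptions: the function names are hypothetical, only power-of-two face sizes are handled, and the stitching of the curve across the six cube faces and the factors of 3 and 5 supported in [3] are not reproduced.

```python
def hilbert_d2xy(n, d):
    """Map index d along a Hilbert curve to cell (x, y) on an n x n grid.

    n must be a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                 # rotate or flip the current quadrant
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def partition_contiguous(n_items, n_parts):
    """Split range(n_items) into n_parts contiguous half-open (start, stop) ranges.

    Group sizes differ by at most one, so the load is balanced.
    """
    base, extra = divmod(n_items, n_parts)
    bounds = []
    start = 0
    for rank in range(n_parts):
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


# Example: order a 4 x 4 face along the curve and assign cells to 3 tasks.
curve = [hilbert_d2xy(4, d) for d in range(16)]
owners = partition_contiguous(len(curve), 3)
```

Because consecutive cells along the curve are always edge neighbours, each contiguous slice of the linear array corresponds to a spatially compact group of elements, which keeps the communication surface of each task small.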

To perform parallel computing experiments, we use the IBM Blue Gene/L (BG/L) and IBM POWER5 p575 systems at NCAR. The configuration of these systems is summarized in Table 1. A Message Passing Interface (MPI) job on the IBM BG/L machine can be run in coprocessor mode (i.e., a single MPI task runs on each compute node) or in virtual-node mode (i.e., two MPI tasks run on each compute node). On the IBM POWER5 machine, by contrast, 4 to 8 MPI tasks are run on each compute node. To determine the sustained MFLOPS per processor, the number of floating-point operations per time step was measured for the main DG time-stepping loop using hardware performance counters on the IBM supercomputers.

Table 1

Comparison of IBM Blue Gene/L and IBM POWER5 p575 systems.

Resource	IBM Blue Gene/L	IBM POWER5 p575
Clock cycle	0.7 GHz	1.9 GHz
Memory/proc	0.25 GB	2.0 GB
Total Processors	2048	624
Operating System	MK Linux	AIX 5.3
Compilers	IBM BGL XL	IBM AIX XL

• The IBM Blue Gene/L system uses the libmpihpm library; its link and code examples are given as follows:

add -L$(BGL_LIBRARY_PATH) -lmpihpm_f -lbgl_perfctr.rts

…

call trace_start()

call dg3d_advance()

call trace_stop()

• The IBM POWER5 p575 system uses the libhpm library in the HPM Toolkit; its link and code examples are given as follows:

add -L$(HPM_LIBRARY_PATH) -lhpm -lpmapi -lm

…

#include "f_hpm.h"

…

call f_hpminit(taskid,'dg3d')

call f_hpmstart(5,'dg3d advance')

call dg3d_advance()

call f_hpmstop(5)

call f_hpm_terminate(taskid)

…

Note that all writing and printing functions are turned off during performance evaluations. Figure 5 shows that the IBM Blue Gene/L machine sustains between 253 and 266 MFLOPS per processor in coprocessor mode and between 238 and 261 MFLOPS per processor in virtual-node mode, whereas the IBM POWER5 machine sustains between 715 and 732 MFLOPS per processor with 4 tasks per node, between 706 and 731 MFLOPS per processor with 6 tasks per node, and between 532 and 561 MFLOPS per processor with 8 tasks per node. Table 2 summarizes the percentage of peak performance for the strong scaling results on the IBM Blue Gene/L and IBM POWER5 systems. IBM Blue Gene/L sustains 9.5% and 9.3% of peak performance for coprocessor and virtual-node modes, respectively. IBM POWER5 sustains 9.6% of peak performance with 4 and 6 tasks per node, whereas it sustains 7.4% of peak performance with 8 tasks per node. Note that the processors of the IBM POWER5 system are grouped at most 8 per node, so the performance drop occurs when all 8 processors of a node are in use.

Figure 5. Parallel performance (i.e., strong scaling) results on IBM BG/L and IBM POWER5 p575 systems.

Table 2

Summary of strong scaling results for IBM Blue Gene/L and IBM POWER5 p575 systems.

Resource	Sustained MFLOPS	% of Peak Performance
POWER5: 4 tasks/node	732	9.6
POWER5: 6 tasks/node	731	9.6
POWER5: 8 tasks/node	561	7.4
BG/L: 1 task/node (CO)	266	9.5
BG/L: 2 tasks/node (VN)	261	9.3
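The percentages of peak can be reproduced from the clock rates in Table 1 and the upper sustained rates quoted in the text (266 and 261 MFLOPS on Blue Gene/L; 732, 731, and 561 MFLOPS on POWER5). The Python sketch below assumes a theoretical peak of 4 floating-point operations per cycle per processor on both machines. That figure is an assumption made here, not a number given in the text, although it is consistent with the reported percentages; the function name is hypothetical.

```python
FLOPS_PER_CYCLE = 4  # assumed peak floating-point operations per cycle per processor


def percent_of_peak(sustained_mflops, clock_ghz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Sustained rate as a percentage of the theoretical per-processor peak."""
    peak_mflops = clock_ghz * 1000.0 * flops_per_cycle
    return 100.0 * sustained_mflops / peak_mflops


# (label, sustained MFLOPS, clock in GHz)
cases = [
    ("POWER5: 4 tasks/node", 732.0, 1.9),
    ("POWER5: 6 tasks/node", 731.0, 1.9),
    ("POWER5: 8 tasks/node", 561.0, 1.9),
    ("BG/L: coprocessor", 266.0, 0.7),
    ("BG/L: virtual node", 261.0, 0.7),
]
for label, mflops, ghz in cases:
    print(f"{label}: {percent_of_peak(mflops, ghz):.1f}% of peak")
```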

5 CONCLUSION

A conservative 3-D DG baroclinic model has been developed in the NCAR HOMME framework. The 3-D DG model is formulated in conservative flux form. The computational domain is the singularity-free cubed-sphere geometry. The DG discretization uses a high-order nodal basis set of Lagrange-Legendre polynomials, and fluxes at inter-element boundaries are approximated with the Lax-Friedrichs numerical flux. The vertical discretization follows the 1-D vertical Lagrangian coordinates approach. Time integration follows the third-order SSP-RK scheme. To validate the proposed 3-D DG model, the baroclinic instability test suite proposed by Jablonowski and Williamson is investigated. Currently, the 3-D DG model performs successfully up to a 10-day simulation. Parallel experiments are run on IBM Blue Gene/L and IBM POWER5 p575 supercomputers. The conservative 3-D DG baroclinic model sustains 9.5% of peak performance in the IBM Blue Gene/L coprocessor mode and 9.6% of peak performance on the IBM POWER5 with 4 and 6 tasks per node.
