Copyright © 2011 NVIDIA Corporation and Wen-mei W. Hwu. All rights reserved.
Introduction
The State of GPU Computing in Scientific Simulation
GPU computing is revolutionizing scientific simulation by providing one to two orders of magnitude of increased computing performance per GPU at price points even students can afford. Exciting things are happening with this technology in the hands of the masses, as reflected by the applications, CUDA Gems, and the extraordinary number of papers that have appeared in the literature since CUDA was first introduced in February 2007.
Technology that provides two or more orders of magnitude of increased computational capability is disruptive and has the potential to fundamentally affect scientific research by removing time-to-discovery barriers. I cannot help getting excited by the potential as simulations that previously would have taken a year or more to complete can now be finished in days. Better scientific insight also becomes possible because researchers can work with more data and have the ability to utilize more accurate, albeit computationally expensive, approximations and numerical methods. We are now entering the era where hybrid clusters and supercomputers containing large numbers of GPUs are being built and used around the world. As a result, many researchers (and funding agencies) now have to rethink their computational models and invest in software to create scalable, high-performance applications based on this technology. The potential is there, and some lucky researchers may find themselves with a Galilean first opportunity to see, study, and model using exquisitely detailed data from projects utilizing GPU technology and these hybrid systems.
In this Section
The chapters in this section provide gems of insight both in thought and CUDA implementation to map challenging scientific simulation problems to GPU technology. Techniques to work with irregular grids, dynamic surfaces, treecodes, and far-field calculations are presented. All of these CUDA gems can be adapted and should provide food for thought in solving challenging computational problems in many areas. Innovative solutions are discussed, including just-in-time (JIT) compilation; appropriate and effective use of fast on-chip GPU memory resources across GPU technology generations; the application of texture unit arithmetic to augment GPU computational and global memory performance; and the creation of solutions that can scale across multiple GPUs in a distributed environment. General kernel optimization principles are also provided in many chapters. Some of the kernels presented require fewer than 200 lines of CUDA code, yet still provide impressive performance.
In Chapter 1: Evaluating molecular orbitals on 3-D lattices is a common problem in molecular visualization. This chapter discusses the design trade-offs in the popular VMD (Visual Molecular Dynamics) software system plus the appropriate and effective use of fast on-chip GPU memory resources across various generations of GPUs. Several kernel optimization principles are provided. To account for varying problem size and GPU performance regimes, an innovative just-in-time (JIT) kernel compilation technique is utilized.
In Chapter 2: The authors discuss the techniques they used to adapt the LINGO string similarity algorithm to run efficiently on GPUs, avoiding the memory bandwidth limits and conditional operations that restrict parallelism in the CPU implementation. These techniques, along with the discussion of minimizing CPU-GPU transfer overhead and exploiting thread-level parallelism, should benefit readers in many areas, not just those interested in large-scale chemical informatics.
In Chapter 3: This chapter discusses a GPU-accelerated dynamic quadrature grid method where the grid points move over the course of the calculation. The merits of several parallelization schemes, mixed precision arithmetic as an optimization technique, and problems arising from branching within a warp are discussed.
In Chapter 4: GPU kernels are presented that calculate electrostatic potential maps on structured grids containing a large amount of fine-grained data parallelism. Approaches to regularize the computational work are discussed along with kernel loop optimizations and implementation notes on how to best use the GPU memory subsystem. All of this is phrased in the context of the popular VMD (Visual Molecular Dynamics) and APBS (Adaptive Poisson-Boltzmann Solver) software packages.
In Chapter 5: Direct molecular dynamics (MD) requires repeated calculation of the potential energy surface obtained from electronic structure calculations. This chapter shows how the calculation can be rethought to propagate the electronic structure without diagonalization, a time-consuming step that is difficult to implement on GPUs. Other topics include the efficient use of CUBLAS and the integration of CUDA within a FORTRAN framework.
In Chapter 6: Irregular tree-based data structures pose a challenge because the GPU memory subsystem favors coalesced memory accesses. This chapter describes a number of techniques, both novel and conventional, to reduce main-memory accesses when working with an irregular tree-based data structure. All the methods run on the GPU.
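One conventional remedy of the kind the chapter surveys is to flatten the tree into parallel arrays so that node data sits contiguously in memory, which is what makes coalesced loads possible. The sketch below (a CPU illustration; the tree shape, array names, and traversal are mine, not the chapter's) shows a binary tree stored as a structure of arrays and traversed with an explicit stack rather than recursion:

```c
#include <math.h>

#define NNODES 7

/* A binary tree flattened into parallel arrays ("structure of arrays").
 * Node i's children live at indices left_[i] / right_[i] (-1 if absent),
 * so traversals read contiguous value[] entries instead of chasing
 * pointers: the layout GPU kernels need for coalesced memory accesses. */
static const int    left_[NNODES]  = { 1, 3, 5, -1, -1, -1, -1 };
static const int    right_[NNODES] = { 2, 4, 6, -1, -1, -1, -1 };
static const double value[NNODES]  = { 1, 2, 3, 4, 5, 6, 7 };

/* Iterative depth-first sum using an explicit stack, since GPU kernels
 * typically avoid recursion and deep call stacks. */
double tree_sum(int root)
{
    int stack[NNODES], top = 0;
    double sum = 0.0;
    stack[top++] = root;
    while (top > 0) {
        int n = stack[--top];
        sum += value[n];
        if (left_[n]  >= 0) stack[top++] = left_[n];
        if (right_[n] >= 0) stack[top++] = right_[n];
    }
    return sum;
}
```

On a GPU the same flat-array layout lets neighboring threads load neighboring nodes in a single memory transaction.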
In Chapter 7: The GRASSY spectral synthesis platform is described, which uses GPUs to address the computational needs of asteroseismology. In particular, this chapter demonstrates an innovative use of the interpolation performed by CUDA texture memory to augment arithmetic performance and reduce memory-access overhead. The low precision of texture-memory arithmetic is discussed and shown not to affect solution accuracy. Mesh building and rasterization are also covered.
In Chapter 8: Exploring the parameter space of a complex dynamical system is an important facet of scientific simulation. Many such problems require integrating a coupled set of ordinary differential equations (ODEs). Rather than parallelizing a single integration, the authors use CUDA to turn the GPU into a survey engine that performs many integrations at once. With this approach, scientists can examine more of the phase space of a problem and gain a better understanding of its dynamics. In the case of inspiraling black holes, GPU technology might have a significant impact on the quest for the direct measurement of gravitational waves. Robustness across GPUs in a distributed MPI environment is also discussed.
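The survey-engine idea, one independent integration per thread rather than one parallelized integration, can be sketched on the CPU. In this illustration (the toy ODE, names, and step sizes are mine, not the chapter's) a serial loop over parameter values stands in for the thread grid:

```c
#include <math.h>

/* Forward-Euler integration of dx/dt = -k * x for one ODE instance.
 * In the survey-engine setting, a CUDA kernel runs one such integration
 * per thread, one thread per point in the parameter survey. */
double integrate_decay(double x0, double k, double dt, int steps)
{
    double x = x0;
    for (int i = 0; i < steps; ++i)
        x += dt * (-k * x);            /* one explicit Euler step */
    return x;
}

/* The "survey engine": integrate nsys independent systems, each with
 * its own parameter k, exactly as a batched GPU launch would. */
void survey(const double *ks, double *out, int nsys)
{
    for (int s = 0; s < nsys; ++s)     /* each iteration = one GPU thread */
        out[s] = integrate_decay(1.0, ks[s], 1e-4, 10000);
}
```

Each output approximates x(1) = exp(-k); on the GPU the outer loop simply becomes the thread index, so thousands of parameter points are surveyed per launch.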
In Chapter 9: As this chapter shows, constructing fast N-body algorithms is far from a formidable task. Basic kernels are discussed that achieve substantial speedups (15x to 150x) in fewer than 200 lines of CUDA code. These kernels extend previous GPU Gems N-body CUDA mappings to encompass parallel far-field approximations useful in astrophysics, acoustics, molecular dynamics, particle simulation, electromagnetics, and boundary integral formulations. Other topics include structuring the data to preserve coalesced memory accesses and balancing parallelism against data reuse through the use of tiles.
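The tiling pattern mentioned above can be illustrated with a small CPU sketch (names, sizes, and the 1-D simplification are mine, not the chapter's): the all-pairs sum is processed a tile of source bodies at a time, mirroring how a CUDA kernel stages each tile in shared memory for reuse by every thread in a block:

```c
#include <math.h>

#define NBODY 8
#define TILE  4        /* tile width: stands in for a CUDA thread block */

/* Direct O(N^2) gravitational potential, processed TILE source bodies
 * at a time. On the GPU each tile of positions is loaded once into
 * shared memory and reused by all threads in the block; this serial
 * loop nest mirrors that data-reuse pattern (1-D positions for brevity). */
void potentials(const double x[NBODY], const double m[NBODY],
                double phi[NBODY])
{
    for (int i = 0; i < NBODY; ++i)
        phi[i] = 0.0;
    for (int jt = 0; jt < NBODY; jt += TILE)   /* loop over source tiles */
        for (int i = 0; i < NBODY; ++i)        /* one "thread" per body  */
            for (int j = jt; j < jt + TILE; ++j) {
                if (j == i) continue;          /* skip self-interaction  */
                double dx = x[j] - x[i];
                phi[i] -= m[j] / fabs(dx);
            }
}
```

The tile size trades parallelism against reuse: larger tiles amortize more global-memory traffic per load, smaller tiles keep more blocks in flight.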
In Chapter 10: The authors discuss the GPU-specific thought and implementation details for BigDFT, a massively parallel implementation of a full DFT (density functional theory) code for quantum chemistry that runs on hybrid clusters and supercomputers containing many GPUs. From the unconventional use of Daubechies wavelets, which are well suited for GPU-accelerated environments, the authors progress to a discussion of scalability and integration in a distributed runtime environment.