1.2 The Age of Parallel Processing
1.2.1 Central Processing Units
1.4.1 What Is the CUDA Architecture?
1.4.2 Using the CUDA Architecture
1.5.2 Computational Fluid Dynamics
2.2.1 CUDA-Enabled Graphics Processors
2.2.3 CUDA Development Toolkit
4 PARALLEL PROGRAMMING IN CUDA C
5.2.2 GPU Ripple Using Threads
5.3 Shared Memory and Synchronization
5.3.2 Dot Product Optimized (Incorrectly)
6.2.1 Ray Tracing Introduction
6.2.3 Ray Tracing with Constant Memory
6.2.4 Performance with Constant Memory
6.3 Measuring Performance with Events
6.3.1 Measuring Ray Tracer Performance
7.3.2 Computing Temperature Updates
7.3.3 Animating the Simulation
7.3.5 Using Two-Dimensional Texture Memory
8.3 GPU Ripple with Graphics Interoperability
8.3.1 The GPUAnimBitmap Structure
8.4 Heat Transfer with Graphics Interop
9.2.1 The Compute Capability of NVIDIA GPUs
9.2.2 Compiling for a Minimum Compute Capability
9.3 Atomic Operations Overview
9.4.1 CPU Histogram Computation
9.4.2 GPU Histogram Computation
10.4 Using a Single CUDA Stream
10.5 Using Multiple CUDA Streams
10.7 Using Multiple CUDA Streams Effectively
12.2.4 NVIDIA GPU Computing SDK
12.2.5 NVIDIA Performance Primitives
12.3.1 Programming Massively Parallel Processors: A Hands-On Approach
12.4.1 CUDA Data Parallel Primitives Library
A.1.2 Dot Product Redux: Atomic Locks