1 GPUs in Support of Parallel Computing
2 A quick introduction to GPUs
3 Correctness issues in GPU programming
4 The need for effective tools
Chapter 2: SnuCL: A unified OpenCL framework for heterogeneous clusters
4 Memory management in SnuCL Cluster
Chapter 3: Thread communication and synchronization on massively parallel GPUs
2 Coarse-Grained Communication and Synchronization
3 Built-In Atomic Functions on Regular Variables
4 Fine-Grained Communication and Synchronization
5 Conclusion and Future Research Direction
Chapter 4: Software-level task scheduling on GPUs
1 Introduction, Problem Statement, and Context
2 Nondeterministic behaviors caused by the hardware
4 Scheduling-enabled optimizations
5 Other scheduling work on GPUs
Chapter 5: Data placement on GPUs
3 Memory specification through MSL
Part 2: Algorithms and Applications
Chapter 6: Biological sequence analysis on GPU
2 Pairwise Sequence Comparison and Sequence-Profile Comparison
3 Design aspects of GPU solutions for biological sequence analysis
4 GPU Solutions for Pairwise Sequence Comparison
5 GPU Solutions for Sequence-Profile Comparison
Chapter 7: Graph algorithms on GPUs
1 Graph representation for GPUs
2 Graph traversal algorithms: the breadth-first search (BFS)
3 The single-source shortest path (SSSP) problem
5 Load Balancing and Memory Accesses: Issues and Management Techniques
Chapter 8: GPU alignment of two and three sequences
4 Alignment of three sequences
Chapter 9: Augmented Block Cimmino Distributed Algorithm for solving tridiagonal systems on GPU
2 ABCD Solver for tridiagonal systems
3 GPU implementation and optimization
Chapter 10: GPU computing applied to linear and mixed-integer programming
2 Operations Research in Practice
3 Exact Optimization Algorithms
Chapter 11: GPU-accelerated shortest paths computations for planar graphs
4 Computational Complexity Analysis
Chapter 12: GPU sorting algorithms
2 Generic Programming Strategies for GPU
Chapter 13: MPC: An effective floating-point compression algorithm for GPUs
Chapter 14: Adaptive sparse matrix representation for efficient matrix-vector multiplication
2 Sparse matrix-vector multiplication
3 GPU architecture and programming model
4 Optimization principles for SpMV
5 Platform (Adaptive Runtime System)
Part 3: Architecture and Performance
Chapter 15: A framework for accelerating bottlenecks in GPU execution with assist warps
5 A Case for CABA: Data Compression
8 Other Uses of the CABA Framework
Chapter 16: Accelerating GPU accelerators through neural algorithmic transformation
2 Neural transformation for GPUs
3 Instruction-set-architecture design
4 Neural accelerator: design and integration
5 Controlling quality trade-offs
3 The Need for Heterogeneous Interconnections
4 Characterization of GPGPU Performance
Chapter 18: Accurately modeling GPGPU frequency scaling with the CRISP performance model
3 GPGPU DVFS performance model
Chapter 19: Energy and power considerations of GPUs
3 Power profiling of regular and irregular programs
4 Affecting power and energy on GPUs
Chapter 20: Architecting the last-level cache for GPUs using STT-MRAM nonvolatile memory
4 Two-Part L2 Cache Architecture
5 Dynamic Write Threshold Detection Mechanism
Chapter 21: Power management of mobile GPUs
2 GPU Power Management for Mobile Games
3 GPU Power Management for GPGPU Applications
Chapter 22: Advances in GPU reliability research
3 Hardware Reliability Enhancements
4 Software Reliability Enhancements
Chapter 23: Addressing hardware reliability challenges in general-purpose GPUs
3 Modeling and Characterizing GPGPUs Reliability in the Presence of Soft Errors [25]
4 RISE: Improving the Streaming Processors’ Reliability Against Soft Errors in GPGPUs [36]