Listing 2.1 HelloWorld OpenCL Kernel and Main Function 46
Listing 2.2 Choosing a Platform and Creating a Context 49
Listing 2.3 Choosing the First Available Device and Creating a Command-Queue 51
Listing 2.4 Loading a Kernel Source File from Disk and Creating and Building a Program Object 53
Listing 2.5 Creating a Kernel 54
Listing 2.6 Creating Memory Objects 55
Listing 2.7 Setting the Kernel Arguments, Executing the Kernel, and Reading Back the Results 56
Listing 3.1 Enumerating the List of Platforms 66
Listing 3.2 Querying and Displaying Platform-Specific Information 67
Listing 3.3 Example of Querying and Displaying Platform-Specific Information 79
Listing 3.4 Using Platform, Devices, and Contexts—Simple Convolution Kernel 90
Listing 3.5 Example of Using Platform, Devices, and Contexts—Simple Convolution 91
Listing 6.1 Creating and Building a Program Object 221
Listing 6.2 Caching the Program Binary on First Run 229
Listing 6.3 Querying for and Storing the Program Binary 230
Listing 6.4 Example Program Binary for HelloWorld.cl (NVIDIA) 233
Listing 6.5 Creating a Program from Binary 235
Listing 7.1 Creating, Writing, and Reading Buffers and Sub-Buffers Example Kernel Code 262
Listing 7.2 Creating, Writing, and Reading Buffers and Sub-Buffers Example Host Code 262
Listing 8.1 Creating a 2D Image Object from a File 284
Listing 8.2 Creating a 2D Image Object for Output 285
Listing 8.3 Query for Device Image Support 291
Listing 8.4 Creating a Sampler Object 293
Listing 8.5 Gaussian Filter Kernel 295
Listing 8.6 Queue Gaussian Kernel for Execution 297
Listing 8.7 Read Image Back to Host Memory 300
Listing 8.8 Mapping Image Results to a Host Memory Pointer 307
Listing 12.1 Vector Add Example Program Using the C++ Wrapper API 379
Listing 13.1 Querying Platform and Device Profiles 384
Listing 14.1 Sequential Implementation of RGB Histogram 393
Listing 14.2 A Parallel Version of the RGB Histogram—Compute Partial Histograms 395
Listing 14.3 A Parallel Version of the RGB Histogram—Sum Partial Histograms 397
Listing 14.4 Host Code of CL API Calls to Enqueue Histogram Kernels 398
Listing 14.5 A Parallel Version of the RGB Histogram—Optimized Version 400
Listing 14.6 A Parallel Version of the RGB Histogram for Half-Float and Float Channels 403
Listing 15.1 An OpenCL Sobel Filter 408
Listing 15.2 An OpenCL Sobel Filter Producing a Grayscale Image 410
Listing 16.1 Data Structure and Interface for Dijkstra’s Algorithm 413
Listing 16.2 Pseudo Code for High-Level Loop That Executes Dijkstra’s Algorithm 414
Listing 16.3 Kernel to Initialize Buffers before Each Run of Dijkstra’s Algorithm 415
Listing 16.4 Two Kernel Phases That Compute Dijkstra’s Algorithm 416
Listing 20.1 ImageFilter2D.py 489
Listing 20.2 Creating a Context 492
Listing 20.3 Loading an Image 494
Listing 20.4 Creating and Building a Program 495
Listing 20.5 Executing the Kernel 496
Listing 20.6 Reading the Image into a Numpy Array 496
Listing 21.1 A C Function Implementing Sequential Matrix Multiplication 500
Listing 21.2 A kernel to compute the matrix product of A and B summing the result into a third matrix, C. Each work-item is responsible for a single element of the C matrix. The matrices are stored in global memory 501
Listing 21.3 The Host Program for the Matrix Multiplication Program 503
Listing 21.4 Each work-item updates a full row of C. The kernel code is shown as well as changes to the host code from the base host program in Listing 21.3. The only change required in the host code was to the dimensions of the NDRange 507
Listing 21.5 Each work-item manages the update to a full row of C, but before doing so the relevant row of the A matrix is copied into private memory from global memory 508
Listing 21.6 Each work-item manages the update to a full row of C. Private memory is used for the row of A and local memory (Bwrk) is used by all work-items in a work-group to hold a column of B. The host code is the same as before other than the addition of a new argument for the B-column local memory 510
Listing 21.7 Different Versions of the Matrix Multiplication Functions Showing the Permutations of the Loop Orderings 513
Listing 22.1 Sparse Matrix-Vector Multiplication OpenCL Kernels 530