Footnotes

Chapter 1

1 A. P. Chandrakasan, M. Potkonjak, R. Mehra, J. Rabaey, and R. W. Brodersen, “Optimizing Power Using Transformations,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 14, no. 1 (January 1995): 12–31.

2 Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 1: Basic Architecture (April 2008).

3 Technical Brief, NVIDIA GeForce GTX 200 GPU Architectural Overview, TB-04044-001_v01 (May 2008).

4 T. G. Mattson, R. van der Wijngaart, and M. Frumkin, “Programming Intel’s 80 Core Terascale Processor,” Proceedings of SC08, Austin, TX (November 2008).

5 T. G. Mattson, B. A. Sanders, and B. L. Massingill, Patterns for Parallel Programming, Design Patterns series (Addison-Wesley, 2004).

Chapter 3

1 The pattern for querying device information, using clGetDeviceInfo(), is the same as that used for platforms and in fact matches that for all OpenCL clGetXXInfo() functions. The remainder of this book will not repeat the details of how to query the size of a value returned from the clGetXXInfo() operation.

2 For simplicity, the example in Listing 3.3 omits the handling of the case when clGetDeviceInfo() returns an array of values. This is easily handled by providing a small array template and specializing the template InfoDevice; the complete implementation is provided in the source code accompanying the book.

3 The exception to this rule is OpenCL platforms, which do not have corresponding retain/release calls.

4 For simplicity, edge cases are not considered; a more realistic convolution example can be found in Chapter 11.

Chapter 4

1 Unless the double-precision extension (cl_khr_fp64) is supported by the device.

2 Unless the half-precision extension (cl_khr_fp16) is supported by the device.

3 Some fiddling with compiler flags to get the vector extensions turned on may be required, for example, -msse2 or -faltivec. You might need to play with the #ifs. The problem is that there is no portable way to declare a vector type. Getting rid of the sort of portability headaches at the top of the code example is one of the major value-adds of OpenCL.

4 Unless the half-precision extension (cl_khr_fp16) is supported.

5 ulp(x) is the gap between the two finite floating-point numbers nearest x. A detailed description of ulp(x) is given in Chapter 5 in the section “Math Functions,” subsection “Relative Error as ulps.”

6 Unless the double-precision extension (cl_khr_fp64) is supported.

7 Unless the half-precision extension (cl_khr_fp16) is supported.
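
Whether a device supports an extension such as cl_khr_fp64 or cl_khr_fp16 can be checked on the host by searching the space-separated extension string that clGetDeviceInfo() returns for CL_DEVICE_EXTENSIONS. The sketch below shows only the string search, so that it compiles without the OpenCL headers; has_extension() is a hypothetical helper name, and the extension string is assumed to have been fetched already.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a whole, space-delimited token in
   `ext_list`. A plain strstr() is not enough, because one extension
   name can be a prefix of another. */
bool has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0)
        return false;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        bool starts = (p == ext_list) || (p[-1] == ' ');
        bool ends   = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends)
            return true;
        p += len;                   /* keep searching after this match */
    }
    return false;
}
```

A host program might call has_extension(extensions, "cl_khr_fp64") before building a kernel that uses the double type.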

Chapter 5

1 The math.h header does not need to be included in the OpenCL kernel.

2 This definition of ulp was taken with consent from Jean-Michel Muller with slight clarification for the behavior of zero. Refer to ftp://ftp.inria.fr/INRIA/publication/publi-pdf/RR/RR-5504.pdf.

Chapter 7

1 While it is technically feasible to define sub-buffers of sub-buffers, the OpenCL specification does not allow this because of concerns that implementations would have to be conservative with respect to optimizations due to potential aliasing of a buffer.

Chapter 16

1 R. Pienaar, B. Fischl, V. Caviness, N. Makris, and P. E. Grant, “Methodology for Analyzing Curvature in the Developing Brain from Preterm to Adult,” International Journal of Imaging Systems and Technology 18, no. 1 (June 1, 2008): 42–68. PMID: 19936261. PMCID: PMC2779548.

2 Pawan Harish and P. J. Narayanan, “Accelerating Large Graph Algorithms on the GPU Using CUDA,” IEEE High Performance Computing (2007).

Chapter 17

1 Figure 17.1 appears in full color in the online version of this chapter.

2 The colors are shown as different shades of gray in the printed version and appear in full color in the online version of this chapter.

3 Again the colors are shown as different shades of gray in the printed version and appear in full color in the online version of this chapter.

Chapter 18

1 Jerry Tessendorf, “Simulating Ocean Water,” SIGGRAPH Course Notes (2002).

2 If you are reading the online version of this chapter, you are lucky enough to see Figure 18.1 in full color, too.

3 Jerry Tessendorf, “Simulating Ocean Water,” SIGGRAPH 1999, http://graphics.ucsd.edu/courses/rendering/2005/jdewall/tessendorf.pdf.

4 E. Brigham, The Fourier Transform and Its Applications (Prentice Hall, 1988).

5 C. Van Loan, Computational Frameworks for the Fast Fourier Transform (Society for Industrial Mathematics, 1987); E. Chu and A. George, Inside the FFT Black Box: Serial and Parallel Fast Fourier Transform Algorithms (CRC Press, 1999).

Chapter 19

1 J. Y. Bouguet, “Pyramidal Implementation of the Lucas Kanade Feature Tracker,” Intel Corporation Microprocessor Research Labs (2000).

2 Simon Baker et al., “A Database and Evaluation Methodology for Optical Flow,” International Journal of Computer Vision 92, no. 1 (March 2011): 1–31.

Chapter 22

1 Nathan Bell and Michael Garland, “Efficient Sparse Matrix-Vector Multiplication on CUDA,” NVIDIA Technical Report NVR-2008-004 (December 2008), www.nvidia.com/object/nvidia_research_pub_001.html.

2 Muthu Manikandan Baskaran and Rajesh Bordawekar, “Optimizing Sparse Matrix-Vector Multiplication on GPUs,” RC24704 (2008), http://domino.watson.ibm.com/library/CyberDig.nsf/1e4115aea78b6e7c85256b360066f0d4/1d32f6d23b99f7898525752200618339.
