Summary

We first saw how to query our GPU from PyCUDA, and used this to re-create the CUDA deviceQuery program in Python. We then learned how to transfer NumPy arrays to and from the GPU's memory with PyCUDA's gpuarray class and its to_gpu and get functions. We got a feel for gpuarray objects by using them for basic calculations on the GPU, and we did a little investigative work with IPython's prun profiler. We saw that GPU functions can appear arbitrarily slow the first time they are run from PyCUDA in a session, because PyCUDA launches NVIDIA's nvcc compiler to compile inline CUDA C code. We then saw how to use the ElementwiseKernel function to compile and launch element-wise operations that are automatically parallelized onto the GPU from Python. We did a brief review of functional programming in Python (in particular, the map and reduce functions), and finally, we covered how to perform basic reduce/scan-type computations on the GPU using the InclusiveScanKernel and ReductionKernel functions.
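As a reminder of the functional-programming review mentioned above, here is a minimal pure-Python sketch of the map and reduce patterns; these are the same element-wise and fold-style operations that ElementwiseKernel and ReductionKernel parallelize on the GPU (the sample data here is illustrative, not from the chapter):

```python
from functools import reduce

# map applies an operation independently to every element (the pattern
# ElementwiseKernel parallelizes); reduce folds the elements together
# with a binary operator (the pattern ReductionKernel parallelizes).
data = [1, 2, 3, 4, 5]

doubled = list(map(lambda x: 2 * x, data))   # element-wise: [2, 4, 6, 8, 10]
total = reduce(lambda a, b: a + b, doubled)  # reduction: 30

print(doubled)
print(total)
```

Because each map operation is independent per element and the reduce operator is associative, both patterns decompose naturally across many GPU threads.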

Now that we have the absolute basics of writing and launching kernel functions down, we should appreciate that PyCUDA's templates have covered most of the overhead of writing a kernel for us. In the next chapter, we will learn the principles of CUDA kernel execution, and how CUDA arranges the concurrent threads of a kernel into abstract grids and blocks.
