Questions

Suppose that you use nvcc to compile a single .cu file containing both host and kernel code into an EXE file, and also into a PTX file. Which file will contain the host functions, and which file will contain the GPU code?
Why do we have to destroy a context if we are using the CUDA Driver API?
At the beginning of this chapter, when we first saw how to use Ctypes, we had to typecast the floating-point value 3.14 to a Ctypes c_double object in a call to printf before it would work. Yet this chapter contains many working calls that do not typecast their arguments to Ctypes types. Why do you think printf is an exception here?
Suppose you want to add functionality to our Python CUDA Driver interface module to support CUDA streams. How would you represent a single stream object in Ctypes?
Why do we use extern "C" for functions in mandelbrot.cu?
Look at mandelbrot_driver.py again. Why do we call the cuCtxSynchronize function only after the single kernel invocation, and not after the GPU memory allocations or the host/GPU memory transfers?
