- SBLAH starts with an S, so this function uses 32-bit real floats. ZBLEH starts with a Z, which means it works with 128-bit complex floats.
- Hint: set trans = cublas._CUBLAS_OP['T']
- Hint: use the Scikit-CUDA wrapper to the dot product, skcuda.cublas.cublasSdot
- Hint: build upon the answer to the last problem.
- You can put the cuBLAS operations in a CUDA stream and use event objects with this stream to precisely measure the computation times on the GPU.
- Since the input now appears complex to cuFFT, it will compute all of the output values, just as NumPy does.
- The dark edge is caused by the zero-buffering (zero-padding) around the image: the blur averages real pixels with the zeros outside the border. This can be mitigated by mirroring the image at its edges rather than using a zero-buffer.
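
The last point, about the dark edge, can be illustrated with a small sketch. This is not the GPU convolution code from the exercise; it is a pure-Python, one-dimensional stand-in, and the names `pad`, `box_blur`, and `row` are hypothetical. It blurs a uniformly bright row of pixels twice, once with a zero-buffer and once with a mirrored edge, assuming the padding width does not exceed the row length.

```python
def pad(signal, width, mode):
    """Pad a 1-D list on both sides, either with zeros or with a mirrored edge."""
    if width > len(signal):
        raise ValueError("padding width must not exceed the signal length")
    if mode == "zero":
        left = [0.0] * width
        right = [0.0] * width
    elif mode == "mirror":
        # Reflect the samples nearest each edge, repeating the edge sample itself.
        left = signal[:width][::-1]
        right = signal[len(signal) - width:][::-1]
    else:
        raise ValueError("mode must be 'zero' or 'mirror'")
    return left + signal + right


def box_blur(signal, radius, mode):
    """Average each sample with its neighbours within `radius`, after padding."""
    padded = pad(signal, radius, mode)
    size = 2 * radius + 1
    return [sum(padded[i:i + size]) / size for i in range(len(signal))]


# A uniformly bright row: an ideal blur should leave it unchanged.
row = [1.0] * 8

zero_blur = box_blur(row, 2, "zero")      # zero-buffer around the row
mirror_blur = box_blur(row, 2, "mirror")  # mirrored edges instead

print("zero-buffer:", zero_blur)
print("mirrored:   ", mirror_blur)
```

With the zero-buffer, the samples near each end are averaged with zeros and come out darker than the interior. With mirrored edges the row stays uniformly bright. The same reasoning carries over to two-dimensional images.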