Questions

  1. Change the random vector in simple_scalar_multiply_kernel.py so that it has a length of 10,000, and modify the i index in the kernel definition so that the kernel can be used across multiple blocks in a grid. See if you can now launch this kernel over 10,000 threads by setting the block and grid parameters to something like block=(100,1,1) and grid=(100,1,1).
  2. In the previous question, we launched a kernel that makes use of 10,000 threads simultaneously; as of 2018, there is no NVIDIA GPU with more than 5,000 cores. Why does this still work and give the expected results?
  3. The naive parallel prefix algorithm has time complexity O(log n) given that we have n or more processors for a dataset of size n. Suppose that we use a naive parallel prefix algorithm on a GTX 1050 GPU with 640 cores. What does the asymptotic time complexity become in the case that n >> 640?
  4. Modify naive_prefix.py to operate on arrays of arbitrary size (possibly non-dyadic), up to a maximum size of 1,024.
  5. The __syncthreads() CUDA device function only synchronizes threads across a single block. How can we synchronize across all threads in all blocks across a grid?
  6. You can convince yourself that the second prefix sum algorithm really is more work-efficient than the naive prefix sum algorithm with this exercise. Suppose that we have a dataset of size 32. What is the exact number of "addition" operations required by the first and second algorithm in this case?
  7. In the implementation of the work-efficient parallel prefix, we use a Python function to iterate our kernels and synchronize the results. Why can't we just put a for loop inside the kernels with careful use of __syncthreads() instead?
  8. Why does it make sense to implement the naive parallel prefix within a single kernel that handles its own synchronization in CUDA C, whereas it makes more sense to implement the work-efficient parallel prefix using both kernels and Python functions, with the host handling the synchronization?
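
As a hint for question 1, it may help to check the index arithmetic on the host before touching the kernel. The following is a minimal pure-Python sketch, not the book's code. It assumes the usual one-dimensional global index formula, blockIdx.x * blockDim.x + threadIdx.x, and uses a hypothetical helper, global_indices, to simulate the value each thread would compute for block=(100,1,1) and grid=(100,1,1).

```python
def global_indices(grid_dim, block_dim):
    """Simulate the global index computed by every thread in a 1D launch.

    Hypothetical helper for illustration only. It mirrors the common CUDA
    expression blockIdx.x * blockDim.x + threadIdx.x on the host.
    """
    for block_idx in range(grid_dim):          # plays the role of blockIdx.x
        for thread_idx in range(block_dim):    # plays the role of threadIdx.x
            yield block_idx * block_dim + thread_idx


# grid=(100,1,1) and block=(100,1,1), as suggested in question 1
indices = list(global_indices(grid_dim=100, block_dim=100))

# Every element of a length-10,000 vector should be addressed exactly once.
covers_all = sorted(indices) == list(range(10000))
```

If covers_all is True, each simulated thread maps to a distinct element of the vector, which is the property the modified i index in the kernel needs to have.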