Summary

In this chapter, we looked at CUDA implementations of commonly used algorithms and patterns. We covered basic optimization techniques for matrix multiplication and convolution filtering, and then expanded our discussion of parallelization to prefix sum, N-body simulation, histogram, and sorting. To do this, we used dedicated GPU knowledge, libraries, and lower-level primitives.
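
As a compact reminder of the lower-level primitives mentioned above, here is a minimal sketch of a warp-level inclusive prefix sum built on the `__shfl_up_sync` intrinsic. It assumes a single full warp of 32 active threads and is only an illustration, not the chapter's exact implementation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Inclusive prefix sum within one 32-thread warp using shuffle intrinsics.
// Assumes all 32 lanes are active (full mask).
__device__ int warp_inclusive_scan(int val) {
    const unsigned full_mask = 0xffffffffu;
    for (int offset = 1; offset < 32; offset <<= 1) {
        int neighbor = __shfl_up_sync(full_mask, val, offset);
        if ((threadIdx.x & 31) >= offset) {
            val += neighbor;          // accumulate value from the lane `offset` below
        }
    }
    return val;
}

__global__ void scan_kernel(const int *in, int *out) {
    int idx = threadIdx.x;            // a single warp is launched
    out[idx] = warp_inclusive_scan(in[idx]);
}

int main() {
    const int n = 32;
    int h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1;   // scanning all ones yields 1..32

    int *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    scan_kernel<<<1, n>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) printf("%d ", h_out[i]);  // prints 1 2 ... 32
    printf("\n");

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```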

Many of the algorithms we have covered are implemented in CUDA libraries. For example, matrix multiplication is provided by the cuBLAS library, while convolution is provided by the cuDNN library. In addition, we covered two approaches to the radix sort implementation: using the Thrust library, or using warp-level primitives for the histogram computation.
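
To show how little code the library route requires, the following is a minimal, self-contained Thrust example that sorts a small array of integer keys on the device; for primitive key types, Thrust generally dispatches to a radix sort internally.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>

int main() {
    // Unsorted keys on the host
    thrust::host_vector<int> h_keys(8);
    int init[8] = {7, 3, 5, 1, 8, 2, 6, 4};
    for (int i = 0; i < 8; ++i) h_keys[i] = init[i];

    // Copy to the device and sort; Thrust picks the sorting algorithm
    // (typically a radix sort for integer keys)
    thrust::device_vector<int> d_keys = h_keys;
    thrust::sort(d_keys.begin(), d_keys.end());

    // Copy back and print the sorted result: 1 2 3 4 5 6 7 8
    thrust::host_vector<int> sorted = d_keys;
    for (int i = 0; i < 8; ++i) printf("%d ", sorted[i]);
    printf("\n");
    return 0;
}
```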

Now that you've seen how these patterns are implemented and that they are available in commonly used libraries, the next logical step is to learn how to use those libraries. This is what we will be doing in the next chapter.
