Summary

In this chapter, we looked at CUDA implementations of commonly used algorithms and patterns. We covered basic optimization techniques for matrix multiplication and convolution filtering, and then expanded our discussion of parallelization to prefix sum, N-body simulation, histogram, and sorting. To do this, we used dedicated GPU knowledge, libraries, and lower-level primitives.
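
As a compact reminder of the lower-level primitives mentioned above, here is a minimal sketch of a warp-level inclusive prefix sum built on the `__shfl_up_sync` intrinsic. It assumes a single full warp of 32 active threads and is only an illustration, not the chapter's exact implementation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Inclusive prefix sum within one 32-thread warp using shuffle intrinsics.
// Assumes all 32 lanes are active (full mask).
__device__ int warp_inclusive_scan(int val) {
    const unsigned full_mask = 0xffffffffu;
    for (int offset = 1; offset < 32; offset <<= 1) {
        int neighbor = __shfl_up_sync(full_mask, val, offset);
        if ((threadIdx.x & 31) >= offset) {
            val += neighbor;          // accumulate value from the lane `offset` below
        }
    }
    return val;
}

__global__ void scan_kernel(const int *in, int *out) {
    int idx = threadIdx.x;            // a single warp is launched
    out[idx] = warp_inclusive_scan(in[idx]);
}

int main() {
    const int n = 32;
    int h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1;   // scanning all ones yields 1..32

    int *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    scan_kernel<<<1, n>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) printf("%d ", h_out[i]);  // prints 1 2 ... 32
    printf("\n");

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```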

Many of the algorithms we have covered are implemented in CUDA libraries. For example, matrix multiplication is provided by the cuBLAS library, while convolution is provided by the cuDNN library. In addition, we covered two approaches to the radix sort implementation: using the Thrust library, or using warp-level primitives for the histogram computation.
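
To show how little code the library route requires, the following is a minimal, self-contained Thrust example that sorts a small array of integer keys on the device; for primitive key types, Thrust generally dispatches to a radix sort internally.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>

int main() {
    // Unsorted keys on the host
    thrust::host_vector<int> h_keys(8);
    int init[8] = {7, 3, 5, 1, 8, 2, 6, 4};
    for (int i = 0; i < 8; ++i) h_keys[i] = init[i];

    // Copy to the device and sort; Thrust picks the sorting algorithm
    // (typically a radix sort for integer keys)
    thrust::device_vector<int> d_keys = h_keys;
    thrust::sort(d_keys.begin(), d_keys.end());

    // Copy back and print the sorted result: 1 2 3 4 5 6 7 8
    thrust::host_vector<int> sorted = d_keys;
    for (int i = 0; i < 8; ++i) printf("%d ", sorted[i]);
    printf("\n");
    return 0;
}
```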

Now that you've seen how these patterns are implemented and that they are available in commonly used libraries, the next logical step is to learn how to use those libraries. This is what we will be doing in the next chapter.
