The pursuit of better performance

Our implementation already performs the operation optimally. However, we can optimize it further by reducing shared memory bank conflicts. In our implementation, the CUDA threads access the same memory banks at certain points. NVIDIA's GPU Gems 3 introduces prefix sum (scan) in Chapter 39, Parallel Prefix Sum (Scan) with CUDA (https://developer.nvidia.com/gpugems/GPUGems3/gpugems3_ch39.html), and addresses this issue in Section 39.2.3, Avoiding Bank Conflicts. You can adapt that solution to our implementation, but if you do, you should update NUM_BANKS to 32 and LOG_NUM_BANKS to 5, because current CUDA architectures have 32 shared memory banks.
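As a rough illustration of the padding idea, the following sketch adds one extra shared memory slot for every 32 elements, so that a stride-32 access pattern spreads across all banks instead of hitting a single one. It uses the simpler form of the offset macro with the updated constants; the helper padded_index and the kernel strided_read_kernel are hypothetical names introduced only for this example, and they are not part of our implementation or of the GPU Gems 3 code.

```cuda
#include <cuda_runtime.h>

// Current CUDA architectures have 32 shared memory banks
// (the GPU Gems 3 chapter assumed 16).
#define NUM_BANKS 32
#define LOG_NUM_BANKS 5

// Padding for a shared memory index: one extra slot per NUM_BANKS elements.
#define CONFLICT_FREE_OFFSET(n) ((n) >> LOG_NUM_BANKS)

// Hypothetical helper: maps a logical index to its padded shared memory index.
__host__ __device__ inline int padded_index(int n)
{
    return n + CONFLICT_FREE_OFFSET(n);
}

// Hypothetical demo kernel, assumed to be launched with one block of
// NUM_BANKS threads. `in` holds NUM_BANKS * NUM_BANKS floats and
// `out` holds NUM_BANKS floats.
__global__ void strided_read_kernel(const float *in, float *out)
{
    // Logical tile of 32 x 32 elements plus one padding slot per 32 elements.
    __shared__ float s_data[NUM_BANKS * NUM_BANKS + NUM_BANKS];

    int tid = threadIdx.x;

    // Stage the input into shared memory through the padded index.
    for (int i = tid; i < NUM_BANKS * NUM_BANKS; i += blockDim.x)
        s_data[padded_index(i)] = in[i];
    __syncthreads();

    // Stride-32 read: without padding, every thread would hit bank 0.
    // With padding, thread t reads index 33 * t, which falls in bank t.
    out[tid] = s_data[padded_index(tid * NUM_BANKS)];
}
```

Under these assumptions, the kernel would be launched as strided_read_kernel<<<1, NUM_BANKS>>>(d_in, d_out), where d_in and d_out are device buffers of the sizes noted in the comments. In the scan kernel, the same offset would be added to each shared memory index before it is used.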
