Experiment 1 – creating multiple blocks

In this section, we will make use of CUDA blocks to run the vector addition code in parallel on the GPU. This also introduces the built-in variables CUDA provides for indexing blocks. Change the call to the device_add function, as follows:

//changing from device_add<<<1,1>>> to
device_add<<<N,1>>>

This will execute the device_add function N times in parallel instead of once. Each parallel invocation of the device_add function is referred to as a block. Now, let's modify the __global__ device_add function so that each block operates on its own element, as follows:

__global__ void device_add(int *a, int *b, int *c) {
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}

By using blockIdx.x to index the array, each block handles a different element of the array. On the device, each block can execute in parallel. Let's take a look at the following screenshot:

The preceding screenshot shows the vector addition GPU code and how each of the multiple single-thread blocks indexes a different element of the arrays.
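To see the launch and the modified kernel in context, here is a minimal, self-contained sketch of the whole experiment. The value of N, the input initialization, and the printed check are assumptions for illustration and are not taken from the excerpt:

```cuda
#include <stdio.h>
#include <stdlib.h>

#define N 512  // vector length; one block per element (assumed value)

// Each block computes one element, selected by its block index.
__global__ void device_add(int *a, int *b, int *c) {
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}

int main(void) {
    int *a, *b, *c;        // host copies
    int *d_a, *d_b, *d_c;  // device copies
    int size = N * sizeof(int);

    // Allocate and initialize host memory (assumed test inputs).
    a = (int *)malloc(size);
    b = (int *)malloc(size);
    c = (int *)malloc(size);
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

    // Allocate device memory and copy the inputs to the GPU.
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);
    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    // Launch N blocks of one thread each, as described above.
    device_add<<<N, 1>>>(d_a, d_b, d_c);

    // Copy the result back and spot-check one element.
    cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);
    printf("c[10] = %d\n", c[10]);

    free(a); free(b); free(c);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```

Compile with nvcc; with the inputs above, c[i] should equal 3 * i for every element.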
