Experiment 1 – creating multiple blocks

In this section, we will make use of CUDA blocks to run the vector addition code in parallel on the GPU. This also introduces the built-in variables CUDA provides for indexing blocks. Change the call to the device_add function, as follows:

//changing from device_add<<<1,1>>> to
device_add<<<N,1>>>

This will execute the device_add function N times in parallel instead of once. Each parallel invocation of the device_add function is referred to as a block. Now, let's modify the __global__ device_add function so that each block operates on its own element, as follows:

__global__ void device_add(int *a, int *b, int *c) {
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}

By using blockIdx.x to index the array, each block handles a different element of the array. On the device, each block can execute in parallel. Let's take a look at the following screenshot:

The preceding screenshot shows the vector addition GPU code and how each of the multiple single-thread blocks indexes a different element of the arrays.
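To see the launch and the modified kernel in context, here is a minimal, self-contained sketch of the whole experiment. The value of N, the input initialization, and the printed check are assumptions for illustration and are not taken from the excerpt:

```cuda
#include <stdio.h>
#include <stdlib.h>

#define N 512  // vector length; one block per element (assumed value)

// Each block computes one element, selected by its block index.
__global__ void device_add(int *a, int *b, int *c) {
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}

int main(void) {
    int *a, *b, *c;        // host copies
    int *d_a, *d_b, *d_c;  // device copies
    int size = N * sizeof(int);

    // Allocate and initialize host memory (assumed test inputs).
    a = (int *)malloc(size);
    b = (int *)malloc(size);
    c = (int *)malloc(size);
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

    // Allocate device memory and copy the inputs to the GPU.
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);
    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    // Launch N blocks of one thread each, as described above.
    device_add<<<N, 1>>>(d_a, d_b, d_c);

    // Copy the result back and spot-check one element.
    cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);
    printf("c[10] = %d\n", c[10]);

    free(a); free(b); free(c);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```

Compile with nvcc; with the inputs above, c[i] should equal 3 * i for every element.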
