The CUDA Thrust library

We will now look at the CUDA Thrust library. This library's central feature is a high-level vector container that is similar to C++'s own vector container. While this may sound trivial, it allows us to program in CUDA C with far less reliance on pointers, mallocs, and frees. Like the C++ vector container, Thrust's vector container handles resizing and element appending automatically, and thanks to the magic of C++ destructors, freeing is also handled automatically when a Thrust vector object goes out of scope.

Thrust actually provides two vector containers: one for the host side and one for the device side. The host-side Thrust vector is more or less identical to the STL vector, the main difference being that it can interact more easily with the GPU. Let's write a little bit of code in proper CUDA C to get a feel for how this works.

Let's start with the include statements. We'll be using the headers for both the host and device side vectors, and we'll also include the C++ iostream library, which will allow us to perform basic I/O operations on the Terminal:

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <iostream>

Let's just use the standard C++ namespace (this is so that we don't have to type the std:: scope resolution operator when checking the output):

using namespace std;

We will now make our main function and set up an empty Thrust vector on the host side. Again, these are C++ templates, so we have to choose the datatype upon declaration with the < > brackets. We will set this up to be an array of integers:

int main(void)
{
thrust::host_vector<int> v;

Now, let's append some integers to the end of v by using push_back, exactly as we would with a regular STL vector:

v.push_back(1);
v.push_back(2);
v.push_back(3);
v.push_back(4);

We will now iterate through all of the values in the vector, and output each value:

for (int i = 0; i < v.size(); i++)
    cout << "v[" << i << "] == " << v[i] << endl;

The output here should be v[0] == 1 through v[3] == 4.

This may have seemed trivial so far. Let's set up a Thrust vector on the GPU and then copy the contents from v:

thrust::device_vector<int> v_gpu = v;

Yes, that's all—only one line, and we're done. All of the content of v on the host will now be copied to v_gpu on the device! (If this doesn't amaze you, please take another look at Chapter 6, Debugging and Profiling Your CUDA Code, and think about how many lines this would have taken us before.)
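For a sense of scale, here is a rough sketch of what that single line replaces if we use the raw CUDA runtime API directly (variable names here are our own, and error checking is omitted for brevity):

```cuda
// Roughly what thrust::device_vector<int> v_gpu = v; does for us,
// written against the raw CUDA runtime API.
int *v_gpu_raw;
size_t num_bytes = v.size() * sizeof(int);
cudaMalloc(&v_gpu_raw, num_bytes);               // allocate device memory
cudaMemcpy(v_gpu_raw, v.data(), num_bytes,
           cudaMemcpyHostToDevice);              // copy host -> device
// ... use v_gpu_raw in kernels here ...
cudaFree(v_gpu_raw);                             // and we must remember to free it ourselves
```

With Thrust, the allocation, the copy, and (via the destructor) the free are all handled for us.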

Let's try using push_back on our new GPU vector, and see if we can append another value to it (keep in mind that each push_back on a device vector involves memory operations on the GPU, so while this is convenient, it is not fast):

v_gpu.push_back(5);

We will now check the contents of v_gpu, like so:

for (int i = 0; i < v_gpu.size(); i++)
    cout << "v_gpu[" << i << "] == " << v_gpu[i] << endl;

This part should output v_gpu[0] == 1 through v_gpu[4] == 5.

Again, thanks to the destructors of these objects, we don't have to do any cleanup in the form of freeing any chunks of allocated memory. We can now just return from the program, and we are done:

    return 0;
}
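Assuming we have saved the preceding code to a file named thrust_test.cu (the filename is our own choice here), it can be compiled and run with NVIDIA's nvcc compiler like so:

```shell
# Thrust ships with the CUDA Toolkit, so no extra include flags
# are needed for the thrust/ headers.
nvcc thrust_test.cu -o thrust_test
./thrust_test
```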