Data directive

The OpenACC parallel model states that we have a host, which runs our sequential code (usually a CPU), and a device, which is some form of parallel hardware such as a GPU. The host and device usually (though not always) have separate memories, and the programmer can use OpenACC to move data between the two memories.

As discussed in the first chapter, the GPU and CPU architectures are fundamentally different. The GPU is a throughput-oriented architecture with a large number of computational units and high memory bandwidth. The CPU, on the other hand, is a latency-oriented architecture with a large cache hierarchy and a large main memory. Any data the GPU operates on must first be copied to GPU memory. (Even with unified memory, the data is still copied behind the scenes: the driver migrates it page by page.)

As illustrated in the following diagram, the data transfer between the two architectures (CPU and GPU) happens via an I/O bus:

When we target a GPU with OpenACC, our goal is to offload only the parallel code to it, while the sequential code continues to run on the CPU. The OpenACC standard lets the programmer manage data explicitly through the OpenACC data directive and data clauses. Data clauses specify the data transfers between the host and the device (in our case, the CPU and the GPU).

Implicit data management: We can leave data transfers to the compiler, as shown in the following example:

int *A = (int*) malloc(N * sizeof(int));

/* No data clause: the compiler decides what to move between host and GPU */
#pragma acc parallel loop
for( int i = 0; i < N; i++ )
{
    A[i] = 0;
}

In the preceding code, the compiler works out that the A vector is written on the GPU and must be copied back to the host, and it generates an implicit transfer for the developer.

Explicit data management: It is good practice to use explicit data transfers, which give us more control over what is moved and when. The following code uses the copy data clause:

int *a = (int*) malloc(N * sizeof(int));
/* copy(a[0:N]): allocate a[0:N] on the device, copy it in on entry, copy it back on exit */
#pragma acc parallel loop copy(a[0:N])
for( int i = 0; i < N; i++ )
{
    a[i] = 0;
}

In the preceding code snippet, we use the copy data clause. The following diagram explains the steps executed when the runtime reaches the copy clause:

We will go through these steps in detail later, when we apply the data clauses to the merge code.
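The steps behind the copy clause can also be spelled out by hand. The following is a minimal sketch, not what any particular compiler generates: when built with an OpenACC compiler (which defines _OPENACC), it uses the OpenACC runtime routines acc_malloc, acc_memcpy_to_device, acc_memcpy_from_device, and acc_free; otherwise it falls back to a second host buffer and memcpy, so it also builds with a plain C compiler. The DEV_* and COPY_* macros and the zero_with_manual_copy function are hypothetical names introduced only for illustration.

```c
#include <stdlib.h>
#include <string.h>

#ifdef _OPENACC
#include <openacc.h>
/* OpenACC compiler: use the standard runtime routines */
#define DEV_ALLOC(bytes)               acc_malloc(bytes)
#define DEV_FREE(ptr)                  acc_free(ptr)
#define COPY_TO_DEVICE(d, h, bytes)    acc_memcpy_to_device((d), (h), (bytes))
#define COPY_FROM_DEVICE(h, d, bytes)  acc_memcpy_from_device((h), (d), (bytes))
#else
/* Plain C compiler (hypothetical fallback): a second host buffer
   stands in for device memory */
#define DEV_ALLOC(bytes)               malloc(bytes)
#define DEV_FREE(ptr)                  free(ptr)
#define COPY_TO_DEVICE(d, h, bytes)    memcpy((d), (h), (bytes))
#define COPY_FROM_DEVICE(h, d, bytes)  memcpy((h), (d), (bytes))
#endif

/* Hypothetical helper: does by hand roughly what
   "#pragma acc parallel loop copy(a[0:n])" does around the zeroing loop.
   Returns 0 on success, -1 if the device allocation fails. */
int zero_with_manual_copy(int *a, int n)
{
    if (n <= 0)
        return 0;
    size_t bytes = (size_t)n * sizeof(int);

    /* 1. Allocate space for a[0:n] in device memory */
    int *d_a = (int*) DEV_ALLOC(bytes);
    if (d_a == NULL)
        return -1;

    /* 2. Region entry: copy the host data to the device */
    COPY_TO_DEVICE(d_a, a, bytes);

    /* 3. Run the loop on the device copy; deviceptr tells the compiler
          that d_a already holds a device address */
    #pragma acc parallel loop deviceptr(d_a)
    for (int i = 0; i < n; i++)
        d_a[i] = 0;

    /* 4. Region exit: copy the results back to the host */
    COPY_FROM_DEVICE(a, d_a, bytes);

    /* 5. Release the device memory */
    DEV_FREE(d_a);
    return 0;
}
```

A real OpenACC runtime also checks whether the data is already present on the device and keeps reference counts for it, which this sketch ignores.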

The main data clauses are listed as follows:

copy(list)
  • Allocates memory on the device
  • Copies data from the host to the device when entering the region
  • Copies data back to the host when exiting the region
  Key usage: the default for input data structures that are modified and then returned from a function

copyin(list)
  • Allocates memory on the device
  • Copies data from the host to the device when entering the region
  Key usage: vectors that are only inputs to a subroutine

copyout(list)
  • Allocates memory on the device
  • Copies data to the host when exiting the region
  Key usage: a result that doesn't overwrite the input data structure

create(list)
  • Only allocates memory on the device
  • No copy is made in either direction
  Key usage: temporary arrays
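As a small illustration of choosing clauses by how each array is used, here is a hypothetical vec_add function (the name and signature are ours, not from the text): a and b are only read, so copyin is enough, and c is only written, so copyout avoids sending its stale contents to the device. If the code is built without OpenACC support, the pragma is ignored and the loop simply runs serially on the host with the same result.

```c
/* Hypothetical example: c[i] = a[i] + b[i] for i in [0, n). */
void vec_add(const float *a, const float *b, float *c, int n)
{
    /* a, b: inputs only -> copyin  (host to device on entry)
       c:    output only -> copyout (device to host on exit) */
    #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```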

 

To maximize performance, the programmer should avoid all unnecessary data transfers, which is why explicit data management is preferred over implicit data management.
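One common way to cut transfers is to wrap several loops in a single region opened with the data directive, so each array crosses the bus at most once instead of once per loop. The sketch below uses a hypothetical sum_squared function: a and b are copied in once, c is copied out once, and tmp is created on the device and never transferred. The present clauses are optional here; they only assert that the data is already on the device when each loop starts.

```c
#include <stdlib.h>

/* Hypothetical example: c[i] = (a[i] + b[i]) * (a[i] + b[i]).
   Returns 0 on success, -1 if the temporary cannot be allocated. */
int sum_squared(const float *a, const float *b, float *c, int n)
{
    /* Host buffer whose device copy serves as the temporary; with
       create(), its host contents are never transferred */
    float *tmp = (float*) malloc((size_t)n * sizeof(float));
    if (tmp == NULL)
        return -1;

    /* One data region for both loops: no transfers between them */
    #pragma acc data copyin(a[0:n], b[0:n]) copyout(c[0:n]) create(tmp[0:n])
    {
        #pragma acc parallel loop present(a[0:n], b[0:n], tmp[0:n])
        for (int i = 0; i < n; i++)
            tmp[i] = a[i] + b[i];

        #pragma acc parallel loop present(tmp[0:n], c[0:n])
        for (int i = 0; i < n; i++)
            c[i] = tmp[i] * tmp[i];
    }

    free(tmp);
    return 0;
}
```

Without the enclosing data region, each parallel loop would manage its own transfers, and tmp would travel back to the host after the first loop and out to the device again for the second.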

Array shaping: Array shaping is how you specify the size of an array in a data clause. If you do not specify a shape, the compiler will try to infer the size. This works well in Fortran, since Fortran arrays carry their size; however, it will most likely not work in C/C++, where a pointer carries no size information. Array shaping is also the way to copy only a portion of an array: for example, if you only need half of the array on the device, copying just that half cuts out unnecessary transfers and can improve performance. The following code snippet shows a partial copy:

#pragma acc parallel loop copy(A[1:N-2])

In C/C++, the shape is written as [start:length], so this copies N-2 elements starting at A[1], that is, all of the elements of A except the first and last.
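A minimal sketch of where such a partial copy fits, using a hypothetical scale_interior function that updates only the interior of an array. The shape A[1:n-2] transfers exactly the elements the loop touches, so the two boundary elements never cross the bus.

```c
/* Hypothetical example: double every element except the first and last. */
void scale_interior(int *A, int n)
{
    if (n < 3)
        return;   /* no interior elements to update */

    /* copy(A[1:n-2]): start at A[1] and transfer n-2 elements (A[1]..A[n-2]) */
    #pragma acc parallel loop copy(A[1:n-2])
    for (int i = 1; i < n - 1; i++)
        A[i] *= 2;
}
```

For the array {1, 2, 3, 4, 5}, the result is {1, 4, 6, 8, 5}: the first and last values are neither copied nor modified.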
