Stream execution with priorities

Now we will reuse the previous multi-stream application with the callback. In that code, the streams run in order; here we will see how priorities can change that order. We will derive a class from the Operator class that handles the priority of the stream. To allow this, we change the protection level of the stream member variable from private to protected, and we make stream creation in the constructor optional, since the derived class will create the stream itself. The change is shown in the following code:

... { middle of the class Operator } ...
protected:
    cudaStream_t stream = nullptr;

public:
    Operator(bool create_stream = true) {
        if (create_stream)
            cudaStreamCreate(&stream);
        sdkCreateTimer(&p_timer);
    }
... { middle of the class Operator } ...

The derived class, Operator_with_priority, has a function that creates a CUDA stream with a given priority. The class is defined as follows:

class Operator_with_priority: public Operator {
public:
    Operator_with_priority() : Operator(false) {}

    void set_priority(int priority) {
        cudaStreamCreateWithPriority(&stream,
            cudaStreamNonBlocking, priority);
    }
};

Since each stream's operation is handled by the class, we update the ls_operator creation code in main() to use the Operator_with_priority class, as follows:

Operator_with_priority *ls_operator = new Operator_with_priority[num_operator];

With this change, the class does not create a stream until we request one. As we discussed before, we first need to obtain the range of stream priorities available on the GPU, using the following code:

// Get priority range
int priority_low, priority_high;
cudaDeviceGetStreamPriorityRange(&priority_low, &priority_high);
printf("Priority Range: low(%d), high(%d)\n", priority_low, priority_high);
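To check how the queried range behaves on a given device, the following standalone sketch may help. It is not part of the book's application: it assumes only the CUDA runtime API, and its error handling is our own addition. It queries the range, creates one stream at the highest priority, and reads the priority back with cudaStreamGetPriority(). Keep in mind that numerically lower values mean higher priority, and that both values are reported as 0 when the device does not support stream priorities.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int priority_low = 0, priority_high = 0;

    // The first argument receives the least (lowest) priority and the
    // second receives the greatest (highest) priority.
    cudaError_t err = cudaDeviceGetStreamPriorityRange(&priority_low, &priority_high);
    if (err != cudaSuccess) {
        fprintf(stderr, "priority range query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Priority Range: low(%d), high(%d)\n", priority_low, priority_high);

    if (priority_low == priority_high)
        printf("This device exposes a single priority level.\n");

    // Create a non-blocking stream with the highest available priority.
    cudaStream_t stream = nullptr;
    err = cudaStreamCreateWithPriority(&stream, cudaStreamNonBlocking, priority_high);
    if (err != cudaSuccess) {
        fprintf(stderr, "stream creation failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Read the priority back to confirm what the runtime assigned.
    int assigned = 0;
    cudaStreamGetPriority(stream, &assigned);
    printf("Assigned priority: %d\n", assigned);

    cudaStreamDestroy(stream);
    return 0;
}
```

Reading the priority back is useful because the runtime clamps a requested priority that falls outside the supported range rather than reporting an error.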

Then, let's give the operations streams with different priorities. To keep things simple, we will give the last operation the highest-priority stream and see how preemption works in CUDA streams. This can be done with the following code:

for (int i = 0; i < num_operator; i++) {
    ls_operator[i].set_index(i);

    // give the last CUDA stream the high priority
    if (i + 1 == num_operator)
        ls_operator[i].set_priority(priority_high);
    else
        ls_operator[i].set_priority(priority_low);
}
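The selection rule in this loop can also be isolated into a small host-side function, which makes it easy to check without a GPU. The following is a minimal sketch in plain C++ that can be pasted into the .cu file as is; pick_priority is a hypothetical helper name of ours, not something defined in the application.

```cpp
// Hypothetical helper (not part of the original application).
// Returns the priority for operator `index` out of `count` operators:
// the last operator gets `priority_high`, all others get `priority_low`.
// In CUDA, a numerically lower value means a higher priority.
int pick_priority(int index, int count, int priority_low, int priority_high) {
    return (index + 1 == count) ? priority_high : priority_low;
}
```

With the range shown in this section, low(0) and high(-1), and four operators, only index 3 receives -1, and indices 0 to 2 receive 0.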

After that, we will execute each operation, as we did previously:

for (int i = 0; i < num_operator; i++) {
    int offset = i * size / num_operator;
    ls_operator[i].async_operation(&h_c[offset],
                                   &h_a[offset], &h_b[offset],
                                   &d_c[offset],
                                   &d_a[offset], &d_b[offset],
                                   size / num_operator,
                                   bufsize / num_operator);
}

To get the proper output, let's synchronize the host and the GPU using the cudaDeviceSynchronize() function. Finally, we can terminate the CUDA streams. Streams created with a priority are terminated with the same cudaStreamDestroy() function as ordinary streams, so there is nothing more to do in this application: the cleanup we already wrote covers them.
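As an illustration of that cleanup, here is a minimal sketch of a wrapper that creates a prioritized stream and destroys it with cudaStreamDestroy() when it goes out of scope. ScopedPriorityStream is a hypothetical class of ours and is not the Operator class from this application, whose destructor is not shown in this section.

```cuda
#include <cuda_runtime.h>

// Hypothetical RAII wrapper (an assumption for illustration only).
class ScopedPriorityStream {
public:
    explicit ScopedPriorityStream(int priority) {
        cudaStreamCreateWithPriority(&stream_, cudaStreamNonBlocking, priority);
    }
    ~ScopedPriorityStream() {
        // A prioritized stream is destroyed like any other stream.
        if (stream_ != nullptr)
            cudaStreamDestroy(stream_);
    }
    // Copying would lead to destroying the same stream twice.
    ScopedPriorityStream(const ScopedPriorityStream&) = delete;
    ScopedPriorityStream& operator=(const ScopedPriorityStream&) = delete;

    cudaStream_t get() const { return stream_; }

private:
    cudaStream_t stream_ = nullptr;
};

int main() {
    int priority_low = 0, priority_high = 0;
    cudaDeviceGetStreamPriorityRange(&priority_low, &priority_high);
    {
        ScopedPriorityStream s(priority_high);
        // ... enqueue work on s.get() here ...
        cudaStreamSynchronize(s.get());
    }   // the stream is destroyed here
    cudaDeviceSynchronize();
    return 0;
}
```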

Now, let's compile the code and see the effect. As always, you need to provide the right GPU compute capability version to the compiler:

$ nvcc -m64 -run -gencode arch=compute_70,code=sm_70 -I/usr/local/cuda/samples/common/inc -o prioritized_cuda_stream ./prioritized_cuda_stream.cu

The following shows the output of the application:

Priority Range: low(0), high(-1)
stream 0 - elapsed 11.119 ms
stream 3 - elapsed 19.126 ms
stream 1 - elapsed 23.327 ms
stream 2 - elapsed 29.422 ms
compared a sample result...
host: 1.523750, device: 1.523750
Time= 29.730 msec, bandwidth= 27.087332 GB/s

From the output, you can see that the order of operations has changed: stream 3 now finishes before stream 1 and stream 2. The following screenshot shows the profile result of how it changed:

In this screenshot, the second CUDA stream (Stream 19 in this case) was preempted by the last CUDA stream, which has the high priority (Stream 21), so Stream 19 could finish its work only after Stream 21 had finished execution. Note that prioritization does not change the order of the data transfers.
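If you want to observe this effect outside the Operator-based application, the following standalone sketch may serve as a starting point. It is our own simplified experiment, not the book's code: busy_kernel, the grid sizes, and the cycle count are arbitrary assumptions. A large kernel is queued on a low-priority stream, a small one on a high-priority stream, and events record when each finishes. The timings depend on the GPU, and because the events belong to different non-default streams, the elapsed times are only approximate.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread spins for roughly `cycles` clock ticks.
__global__ void busy_kernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

int main() {
    int priority_low = 0, priority_high = 0;
    cudaDeviceGetStreamPriorityRange(&priority_low, &priority_high);

    cudaStream_t s_low, s_high;
    cudaStreamCreateWithPriority(&s_low, cudaStreamNonBlocking, priority_low);
    cudaStreamCreateWithPriority(&s_high, cudaStreamNonBlocking, priority_high);

    cudaEvent_t begin, end_low, end_high;
    cudaEventCreate(&begin);
    cudaEventCreate(&end_low);
    cudaEventCreate(&end_high);

    // Queue a large amount of low-priority work first.
    cudaEventRecord(begin, s_low);
    busy_kernel<<<4096, 256, 0, s_low>>>(100000);
    cudaEventRecord(end_low, s_low);

    // Then queue a small amount of high-priority work.
    busy_kernel<<<64, 256, 0, s_high>>>(100000);
    cudaEventRecord(end_high, s_high);

    cudaDeviceSynchronize();

    float ms_low = 0.0f, ms_high = 0.0f;
    cudaEventElapsedTime(&ms_low, begin, end_low);
    cudaEventElapsedTime(&ms_high, begin, end_high);
    printf("low-priority stream finished after  %.3f ms\n", ms_low);
    printf("high-priority stream finished after %.3f ms\n", ms_high);

    cudaEventDestroy(begin);
    cudaEventDestroy(end_low);
    cudaEventDestroy(end_high);
    cudaStreamDestroy(s_low);
    cudaStreamDestroy(s_high);
    return 0;
}
```

Swapping the two priorities and comparing the printed times is a simple way to see whether the scheduler favors the high-priority stream on your device.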
