How to do it...

In the following test, we evaluate and compare the execution time of a mathematical operation: the sum of two vectors of floating-point elements. To make the comparison, the same operation is implemented in two separate functions.

The first function is computed by the CPU only, while the second function is written using the PyOpenCL library to exploit the GPU card. The test is performed on vectors with a size of 10,000 elements.

Here is the code:

  1. Import the relevant libraries. Note the import of the time library, used to measure the computation times, and of numpy.linalg, the linear algebra toolset of the numpy library:
from time import time 
import pyopencl as cl   
import numpy as np    
import deviceInfoPyopencl as device_info 
import numpy.linalg as la 
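The deviceInfoPyopencl module is a helper defined in an earlier recipe; only the print_device_info function called in the final step is assumed here. A minimal sketch of what that helper could look like (the body is an assumption, not the book's code) is:

```python
# Hypothetical sketch of the deviceInfoPyopencl helper assumed above.
# It simply lists the OpenCL platforms and devices visible to PyOpenCL.
try:
    import pyopencl as cl
except ImportError:
    cl = None  # lets the sketch load even without an OpenCL runtime


def print_device_info():
    if cl is None:
        print("PyOpenCL is not available")
        return
    for platform in cl.get_platforms():
        print("Platform: {0}".format(platform.name))
        for device in platform.get_devices():
            print("  Device: {0}".format(device.name))
```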
  2. Then, we define the input vectors. Each contains 10,000 random floating-point elements:
a = np.random.rand(10000).astype(np.float32) 
b = np.random.rand(10000).astype(np.float32) 
  3. The following function computes the sum of the two vectors working on the CPU (host):
def test_cpu_vector_sum(a, b): 
    c_cpu = np.empty_like(a) 
    cpu_start_time = time() 
    for i in range(10000): 
        for j in range(10000): 
            c_cpu[i] = a[i] + b[i] 
    cpu_end_time = time() 
    print("CPU Time: {0} s".format(cpu_end_time - cpu_start_time)) 
    return c_cpu 
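Note that the nested loop above repeats the same element-wise assignment 10,000 times per element, which is what makes the CPU timing so large; a vectorized NumPy sum performs the same mathematical operation in a single call. As a point of comparison, a minimal sketch (the helper name here is ours, not part of the recipe) could be:

```python
import numpy as np
from time import time


def vectorized_cpu_sum(a, b):
    # NumPy computes the element-wise sum in one vectorized call,
    # without the explicit Python loops used above.
    start = time()
    c = a + b
    print("Vectorized CPU Time: {0} s".format(time() - start))
    return c


a = np.random.rand(10000).astype(np.float32)
b = np.random.rand(10000).astype(np.float32)
c = vectorized_cpu_sum(a, b)
```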
  4. The following function computes the sum of the two vectors working on the GPU (device):
def test_gpu_vector_sum(a, b): 
    platform = cl.get_platforms()[0] 
    device = platform.get_devices()[0] 
    context = cl.Context([device]) 
    queue = cl.CommandQueue(context, 
        properties=cl.command_queue_properties.PROFILING_ENABLE) 
  5. Within the test_gpu_vector_sum function, we prepare the memory buffers to contain the input vectors and the output vector:
    a_buffer = cl.Buffer(context,cl.mem_flags.READ_ONLY  
                | cl.mem_flags.COPY_HOST_PTR, hostbuf=a) 
    b_buffer = cl.Buffer(context,cl.mem_flags.READ_ONLY  
                | cl.mem_flags.COPY_HOST_PTR, hostbuf=b) 
    c_buffer = cl.Buffer(context,cl.mem_flags.WRITE_ONLY, b.nbytes) 
  6. Still within the test_gpu_vector_sum function, we define the kernel that will compute the sum of the two vectors on the device:
    program = cl.Program(context, """ 
    __kernel void sum(__global const float *a, 
                      __global const float *b, 
                      __global float *c){ 
        int i = get_global_id(0); 
        int j; 
        for(j = 0; j < 10000; j++){ 
            c[i] = a[i] + b[i];} 
    }""").build() 
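In the kernel, get_global_id(0) gives each work-item the index of the single element it is responsible for, so the element-wise sum is spread across 10,000 work-items (the inner j loop only repeats the same assignment, mirroring the workload of the CPU test). A plain-Python emulation of that work-item model, offered as a sketch, might look like:

```python
def emulate_kernel_sum(a, b):
    # Each iteration plays the role of one work-item: i stands in for
    # get_global_id(0), and each work-item writes exactly one element.
    c = [0.0] * len(a)
    for i in range(len(a)):
        c[i] = a[i] + b[i]
    return c
```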
  7. Then, we set the gpu_start_time variable before starting the calculation. After this, we compute the sum of the two vectors and evaluate the calculation time:
    gpu_start_time = time() 
    event = program.sum(queue, a.shape, None, 
                        a_buffer, b_buffer, c_buffer) 
    event.wait() 
    elapsed = 1e-9*(event.profile.end - event.profile.start) 
    print("GPU Kernel evaluation Time: {0} s".format(elapsed)) 
    c_gpu = np.empty_like(a) 
    cl.enqueue_copy(queue, c_gpu, c_buffer).wait() 
    gpu_end_time = time() 
    print("GPU Time: {0} s".format(gpu_end_time - gpu_start_time)) 
    return c_gpu 
  8. Finally, we perform the test, calling the two functions defined previously:
if __name__ == "__main__": 
    device_info.print_device_info() 
    cpu_result = test_cpu_vector_sum(a, b) 
    gpu_result = test_gpu_vector_sum(a, b) 
    assert (la.norm(cpu_result - gpu_result)) < 1e-5 