Events

Events are objects that exist on the GPU, whose purpose is to act as milestones or progress markers for a stream of operations. Events are generally used to measure time durations on the device side, allowing us to time operations precisely; the measurements we have made so far have relied on host-based Python profilers and standard Python library functions such as time. Additionally, events can be used to give the host a status update on the state of a stream and which operations it has already completed, as well as for explicit stream-based synchronization.
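For contrast, a host-based measurement in the style we have used so far would look something like the following sketch (the array size and the doubling operation here are only illustrative assumptions); note that the get call copies the result back to the host, which forces the GPU work to finish before the second timestamp is taken:

import time
import numpy as np
import pycuda.autoinit
from pycuda import gpuarray

# host-based timing with the standard library, as in our earlier examples
t_start = time.time()
data_gpu = gpuarray.to_gpu(np.random.randn(10**7).astype('float32'))
result = (2 * data_gpu).get()  # get() copies back to the host, forcing completion
t_end = time.time()
print('host-measured time: %f seconds' % (t_end - t_start))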

Let's start with an example that uses no explicit streams and uses events to measure just a single kernel launch. (If we don't explicitly use streams in our code, CUDA invisibly defines a default stream that all operations are placed into.)

Here, we will use the same useless multiply/divide loop kernel and header as we did at the beginning of the chapter, modifying most of what follows. We want a single kernel instance to run for a long time in this example, so we will generate a huge array of random numbers for the kernel to process, as follows:

array_len = 100*1024**2  # 100M floats (about 400 MB of float32 data)
data = np.random.randn(array_len).astype('float32')
data_gpu = gpuarray.to_gpu(data)  # copy the array to the GPU
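For reference, the header and kernel from the beginning of the chapter look roughly like the following sketch; the exact loop constants are an assumption on our part, but the idea is a kernel whose repeated multiplications and divisions by 2 leave the data unchanged while taking a measurable amount of time:

import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda import gpuarray
from pycuda.compiler import SourceModule

ker = SourceModule("""
__global__ void mult_ker(float *array, int array_len)
{
    // each thread walks across the array in strides of blockDim.x
    int thd = blockIdx.x * blockDim.x + threadIdx.x;
    int num_iters = array_len / blockDim.x;

    for (int j = 0; j < num_iters; j++)
    {
        int i = j * blockDim.x + thd;
        for (int k = 0; k < 50; k++)
        {
            array[i] *= 2.0;  // multiply and divide by 2: a useless,
            array[i] /= 2.0;  // time-consuming no-op on each element
        }
    }
}
""")
mult_ker = ker.get_function('mult_ker')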

We now construct our events using the pycuda.driver.Event constructor (where, of course, pycuda.driver has been aliased as drv by our prior import statement).

We will create two event objects here, one for the start of the kernel launch and the other for the end; we will always need two event objects to measure any single GPU operation, as we will see shortly:

start_event = drv.Event()
end_event = drv.Event()

Now, we are about ready to launch our kernel, but first we have to mark start_event's place in the stream of execution with the event's record function. We launch the kernel and then mark end_event's place in the stream of execution, again with record:

start_event.record()  # mark start_event's place in the stream, just before the launch
mult_ker(data_gpu, np.int32(array_len), block=(64,1,1), grid=(1,1,1))
end_event.record()  # mark end_event's place in the stream, just after the launch

Events have a binary value that indicates whether they have been reached yet, which is given by the query function. Let's print a status update for both events immediately after the kernel launch:

print 'Has the kernel started yet? {}'.format(start_event.query())
print 'Has the kernel ended yet? {}'.format(end_event.query())

Let's run this right now and see what happens: both queries print False.

Our goal here is to ultimately measure the time duration of our kernel's execution, yet the kernel apparently hasn't even launched yet. Kernels in PyCUDA are launched asynchronously (whether they exist in a particular stream or not), so we have to ensure that our host code is properly synchronized with the GPU.

Since end_event comes last, we can block further host code execution until the kernel completes by calling this event object's synchronize function; this will ensure that the kernel has completed before any further lines of host code are executed. Let's add a line of code to do this in the appropriate place:

end_event.synchronize()

print 'Has the kernel started yet? {}'.format(start_event.query())
print 'Has the kernel ended yet? {}'.format(end_event.query())
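As an aside, synchronizing on a particular event is more fine-grained than blocking on the whole device; if we merely wanted to stall the host until all outstanding GPU work had finished, we could synchronize the entire CUDA context instead, as in this one-line sketch:

# Coarser alternative: block the host until *all* pending GPU operations
# have completed, rather than waiting on one particular event.
drv.Context.synchronize()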

Finally, we are ready to measure the execution time of the kernel; we do this with the event object's time_till or time_since functions, which take another event object and return the time between the two events in milliseconds. Let's use start_event's time_till function on end_event:

print 'Kernel execution time in milliseconds: %f ' % start_event.time_till(end_event)

The duration between two events that have already occurred on the GPU can be measured with the time_till and time_since functions. Note that these functions always return a value in milliseconds!
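Since time_since is just the mirror image of time_till, the following sketch would print the same measurement as the line above, this time calling time_since on end_event:

# Equivalent measurement from the other direction:
print('Kernel execution time in milliseconds: %f' % end_event.time_since(start_event))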

Let's try running our program again now; this time, both queries print True, and we also get a measurement of our kernel's execution time.

(This example is also available in the simple_event_example.py file in the repository.)
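Putting all of the pieces together, a complete, self-contained version of this example would look roughly like the following sketch (the kernel body, as before, is our assumed reconstruction of the chapter's multiply/divide kernel):

import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda import gpuarray
from pycuda.compiler import SourceModule

# the useless multiply/divide kernel from the beginning of the chapter
ker = SourceModule("""
__global__ void mult_ker(float *array, int array_len)
{
    int thd = blockIdx.x * blockDim.x + threadIdx.x;
    int num_iters = array_len / blockDim.x;
    for (int j = 0; j < num_iters; j++)
    {
        int i = j * blockDim.x + thd;
        for (int k = 0; k < 50; k++)
        {
            array[i] *= 2.0;
            array[i] /= 2.0;
        }
    }
}
""")
mult_ker = ker.get_function('mult_ker')

# a huge array of random numbers for the kernel to process
array_len = 100 * 1024**2
data_gpu = gpuarray.to_gpu(np.random.randn(array_len).astype('float32'))

# one event for the start of the launch, one for the end
start_event = drv.Event()
end_event = drv.Event()

start_event.record()
mult_ker(data_gpu, np.int32(array_len), block=(64, 1, 1), grid=(1, 1, 1))
end_event.record()

# block the host until the kernel (and hence end_event) has completed
end_event.synchronize()

print('Has the kernel started yet? {}'.format(start_event.query()))
print('Has the kernel ended yet? {}'.format(end_event.query()))
print('Kernel execution time in milliseconds: %f' % start_event.time_till(end_event))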
