Chapter 20. Using OpenCL with PyOpenCL

While the focus of this book has been on using OpenCL from C and C++, bindings for other languages such as Python, Ruby, and .NET have been developed. This chapter introduces you to using OpenCL in Python by porting the ImageFilter2D example from Chapter 8 to Python. The purpose of this chapter is to introduce you to the basic differences between using OpenCL from C and using it from Python and to talk about some of Python’s advantages.

This chapter assumes that you have a working knowledge of programming in Python and are able to set up a Python development environment. If you are not familiar with the language, teaching you Python is beyond the scope of this book. However, there are many terrific resources available to learn the language. One highly recommended resource is A Quick, Painless Tutorial on the Python Language by Norman Matloff of the University of California–Davis (available at http://heather.cs.ucdavis.edu/~matloff/Python/PythonIntro.pdf). This is an incredibly succinct and easy-to-understand tutorial for learning the language quickly.

Introducing PyOpenCL

PyOpenCL is an open-source (MIT-licensed) project that provides bindings between OpenCL and Python. There are a number of features of PyOpenCL that make it an attractive library for those looking to work with Python. PyOpenCL provides complete access to the OpenCL API. It takes great advantage of language features of Python (such as dynamic typing) to provide easier-to-use access to OpenCL APIs. Cleanup of objects and error checking are done automatically for you, which means you can write much less code to interact with OpenCL. Further, because the bindings to OpenCL are natively implemented in C++, there is a relatively low overhead for using it.

As of this writing, PyOpenCL is at version 2011.1 and is considered by the authors to be “stable.” That is, code written in PyOpenCL now should continue to function with future versions. The API will provide deprecation warnings as it evolves, but it is now at a stable enough state to use in development. The latest version and installation instructions for PyOpenCL can be found at the project’s Web site at http://documen.tician.de/pyopencl/index.html.

Running the PyImageFilter2D Example

The source code for the Python port of the ImageFilter2D example from Chapter 8 is provided in the Chapter_20/PyImageFilter2D directory. The example was developed in Python v2.6 and requires an installation of the following Python packages:

Numpy: PyOpenCL uses numpy for data structures such as arrays and numeric types, which makes numpy the foundation for interacting with PyOpenCL. Installation information can be found at http://numpy.scipy.org/.

pyopencl v0.92+: PyOpenCL needs to be built against the OpenCL implementation available on your platform, which is done during the installation process. Installation instructions are at http://wiki.tiker.net/PyOpenCL.

PIL (Python Imaging Library): The Python Imaging Library provides a number of functions for loading and storing images. Installation instructions are at www.pythonware.com/products/pil/.

Once the necessary dependencies are installed, the example can be run as follows:

$ python ImageFilter2D.py input.jpg output.jpg

This example loads the input image from a file, executes a Gaussian filter kernel using OpenCL, and then outputs the resultant image to the output file. Any image formats that are supported by the PIL can be filtered using the program.

PyImageFilter2D Code

The PyImageFilter2D example was coded by taking the C source from the ImageFilter2D example in Chapter 8 and porting it to Python. The original example was 375 lines long (excluding comments), whereas the Python version has only 129 lines. Much of this reduction comes from the fact that PyOpenCL wraps error checking and uses dynamic typing for various conveniences. The full listing of the PyImageFilter2D example is provided in Listing 20.1. The remainder of this chapter walks through the stages of the program and discusses the changes that were required to move from C to PyOpenCL.

Listing 20.1 ImageFilter2D.py


import pyopencl as cl
import sys
import Image # Python Imaging Library (PIL)
import numpy

def CreateContext():
    # On failure, return a (None, None) pair so that the caller can
    # still unpack the result into context and device
    platforms = cl.get_platforms()
    if len(platforms) == 0:
        print "Failed to find any OpenCL platforms."
        return None, None

    devices = platforms[0].get_devices(cl.device_type.GPU)
    if len(devices) == 0:
        print "Could not find GPU device, trying CPU..."
        devices = platforms[0].get_devices(cl.device_type.CPU)
        if len(devices) == 0:
            print "Could not find OpenCL GPU or CPU device."
            return None, None

    # Create a context using the first device
    context = cl.Context([devices[0]])
    return context, devices[0]

def CreateProgram(context, device, fileName):
    # Read the kernel source; the with block closes the file afterward
    with open(fileName, 'r') as kernelFile:
        kernelStr = kernelFile.read()

    # Load the program source
    program = cl.Program(context, kernelStr)

    # Build the program; PyOpenCL raises an exception on build errors
    program.build(devices=[device])

    return program



def LoadImage(context, fileName):
    im = Image.open(fileName)
    # Make sure the image is RGBA formatted
    if im.mode != "RGBA":
        im = im.convert("RGBA")


    # Convert to uint8 buffer
    buffer = im.tostring()
    clImageFormat = cl.ImageFormat(cl.channel_order.RGBA,
                                   cl.channel_type.UNORM_INT8)

    clImage = cl.Image(context,
                       cl.mem_flags.READ_ONLY |
                       cl.mem_flags.COPY_HOST_PTR,
                       clImageFormat,
                       im.size,
                       None,
                       buffer
                       )

    return clImage, im.size

def SaveImage(fileName, buffer, imgSize):
    im = Image.fromstring("RGBA", imgSize, buffer.tostring())
    im.save(fileName)

def RoundUp(groupSize, globalSize):
    # Round globalSize up to the next multiple of groupSize
    r = globalSize % groupSize
    if r == 0:
        return globalSize
    else:
        return globalSize + groupSize - r


def main():

    imageObjects = [ 0, 0 ]

    # Main
    if len(sys.argv) != 3:
        print ("USAGE: " + sys.argv[0] +
               " <inputImageFile> <outputImageFile>")
        return 1


    # Create an OpenCL context on first available platform
    context, device = CreateContext()
    if context is None:
        print "Failed to create OpenCL context."
        return 1

    # Create a command-queue on the first device available
    commandQueue = cl.CommandQueue(context, device)

    # Make sure the device supports images, otherwise exit
    if not device.get_info(cl.device_info.IMAGE_SUPPORT):
        print "OpenCL device does not support images."
        return 1

    # Load input image from file and load it into
    # an OpenCL image object
    imageObjects[0], imgSize = LoadImage(context, sys.argv[1])

    # Create output image object
    clImageFormat = cl.ImageFormat(cl.channel_order.RGBA,
                                   cl.channel_type.UNORM_INT8)
    imageObjects[1] = cl.Image(context,
                               cl.mem_flags.WRITE_ONLY,
                               clImageFormat,
                               imgSize)

    # Create sampler for sampling image object
    sampler = cl.Sampler(context,
                         False, #  Non-normalized coordinates
                         cl.addressing_mode.CLAMP_TO_EDGE,
                         cl.filter_mode.NEAREST)

    # Create OpenCL program
    program = CreateProgram(context, device, "ImageFilter2D.cl")

    # Call the kernel directly
    localWorkSize = ( 16, 16 )
    globalWorkSize = ( RoundUp(localWorkSize[0], imgSize[0]),
                       RoundUp(localWorkSize[1], imgSize[1]) )

    program.gaussian_filter(commandQueue,
                            globalWorkSize,
                            localWorkSize,
                            imageObjects[0],
                            imageObjects[1],
                            sampler,
                            numpy.int32(imgSize[0]),
                            numpy.int32(imgSize[1]))

    # Read the output buffer back to the Host
    buffer = numpy.zeros(imgSize[0] * imgSize[1] * 4, numpy.uint8)
    origin = ( 0, 0, 0 )
    region = ( imgSize[0], imgSize[1], 1 )

    cl.enqueue_read_image(commandQueue, imageObjects[1],
                          origin, region, buffer).wait()

    print "Executed program successfully."

    # Save the image to disk
    SaveImage(sys.argv[2], buffer, imgSize)

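# Run only when executed as a script; main()'s return value
# becomes the process exit status
if __name__ == "__main__":
    sys.exit(main())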


Context and Command-Queue Creation

Several of the demo programs that come with PyOpenCL use a convenience function pyopencl.create_some_context() to create a context. By default, if not passed an argument, this function can provide an interactive prompt for choosing an OpenCL device on which to create the context. For example, on a Linux machine with a dual-GPU NVIDIA GTX 295, this function produces a prompt such as the following:

Choose device(s):
[0] <pyopencl.Device 'GeForce GTX 295' at 0xab7a90>
[1] <pyopencl.Device 'GeForce GTX 295' at 0xcb6630>
Choice, comma-separated [0]:

If running in non-interactive mode (or if an argument of False is passed to the function), a context will be created in an implementation-defined manner. While this convenience function is an appropriate way to create a context for many programs, in our example we create the context in a more traditional way, as shown in Listing 20.2.

Listing 20.2 Creating a Context


def CreateContext():
    # On failure, return a (None, None) pair so that the caller can
    # still unpack the result into context and device
    platforms = cl.get_platforms()
    if len(platforms) == 0:
        print "Failed to find any OpenCL platforms."
        return None, None

    devices = platforms[0].get_devices(cl.device_type.GPU)
    if len(devices) == 0:
        print "Could not find GPU device, trying CPU..."
        devices = platforms[0].get_devices(cl.device_type.CPU)
        if len(devices) == 0:
            print "Could not find OpenCL GPU or CPU device."
            return None, None

    # Create a context using the first device
    context = cl.Context([devices[0]])
    return context, devices[0]


The call to cl.get_platforms() returns a list of Python objects of type pyopencl.Platform. This object contains methods for querying information about the platform as well as retrieving a list of all of the devices available on the platform. The code in Listing 20.2 simply uses the first platform available and then queries to see if any GPU devices are available on the platform by calling platforms[0].get_devices(cl.device_type.GPU). If no GPU devices are found, then the code goes on to check whether any CPU devices are available. Finally, the first device found is used to create a context by calling cl.Context([devices[0]]). This function returns a new pyopencl.Context object from the list of devices passed into it. In this case, our list is only a single device, but in general it is possible to create the context from a list of devices.
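
The GPU-then-CPU fallback described above generalizes to any ordered list of device types. The following is a minimal sketch rather than part of the book's example: pick_device is a hypothetical helper name, and it assumes only that the platform object offers a get_devices() method that returns an empty list when no device of the requested type exists, which is the behavior Listing 20.2 relies on.

```python
def pick_device(platform, preferred_types):
    """Return the first device of the earliest preferred type, or None.

    pick_device is a hypothetical helper, not part of PyOpenCL.
    """
    for device_type in preferred_types:
        devices = platform.get_devices(device_type)
        if devices:
            # Take the first device of the first type that is present
            return devices[0]
    # No device of any requested type was found on this platform
    return None
```

With PyOpenCL installed, a call such as pick_device(cl.get_platforms()[0], [cl.device_type.GPU, cl.device_type.CPU]) should select the same device as Listing 20.2.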

Once the context is created and the list of devices has been retrieved, creating the command-queue in PyOpenCL is trivial:

commandQueue = cl.CommandQueue(context, device)

This function creates a command-queue for the context and device that are passed in. Like other PyOpenCL calls, the command-queue is returned as a new object (pyopencl.CommandQueue). The object contains methods such as get_info() and set_property(), which provide wrappers to the low-level OpenCL API functions. In general, this is the pattern that PyOpenCL uses: each C-typed OpenCL API object (e.g., cl_context) is wrapped in a Python class whose methods interface to the OpenCL API calls that are relevant to that object.
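
As a small illustration of querying wrapped objects, the sketch below walks every platform and device and reads a few properties through get_info(). list_devices is a hypothetical helper name, not part of PyOpenCL or of the book's example. The import sits inside the function only so that the snippet can be loaded on a machine without PyOpenCL, and the result depends entirely on the OpenCL implementations installed.

```python
def list_devices():
    """Return (platform name, device name, image support) for each device.

    list_devices is a hypothetical helper, not part of PyOpenCL.
    """
    # Imported lazily so the sketch loads even without PyOpenCL installed
    import pyopencl as cl

    rows = []
    for platform in cl.get_platforms():
        platform_name = platform.get_info(cl.platform_info.NAME)
        # With no argument, get_devices() returns devices of all types
        for device in platform.get_devices():
            rows.append((platform_name,
                         device.get_info(cl.device_info.NAME),
                         bool(device.get_info(cl.device_info.IMAGE_SUPPORT))))
    return rows
```

Each get_info() call takes an enumerant from the namespace that matches the object being queried, here cl.platform_info for platforms and cl.device_info for devices.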

Loading to an Image Object

The next step the program takes is to load the input image from disk and copy its contents into an OpenCL image object. The Python Imaging Library (PIL) is used to load the image from disk and convert it to an RGBA-formatted image, as shown in Listing 20.3. Once converted to RGBA, the image is converted to a Python string using the Image.tostring() method. This buffer is then loaded into a pyopencl.Image object.

Listing 20.3 Loading an Image


def LoadImage(context, fileName):
    im = Image.open(fileName)
    # Make sure the image is RGBA formatted
    if im.mode != "RGBA":
        im = im.convert("RGBA")


    # Convert to uint8 buffer
    buffer = im.tostring()
    clImageFormat = cl.ImageFormat(cl.channel_order.RGBA,
                                   cl.channel_type.UNORM_INT8)

    clImage = cl.Image(context,
                       cl.mem_flags.READ_ONLY |
                       cl.mem_flags.COPY_HOST_PTR,
                       clImageFormat,
                       im.size,
                       None,
                       buffer
                       )

    return clImage, im.size


The image format object is created using the pyopencl.ImageFormat constructor, which takes the channel order (a value from pyopencl.channel_order) and the channel type (a value from pyopencl.channel_type) as arguments. This is another design that is used throughout PyOpenCL: rather than having one large namespace of enumerants starting with CL_, the enumerants are grouped into namespaces according to the objects to which they are relevant.

The creation of the image object is very similar to the OpenCL C API with one large difference: there is not a separate API call for each image dimensionality (e.g., clCreateImage2D, clCreateImage3D). Rather, the dimensions of the image are passed in as a tuple and the implementation will choose to create the correct OpenCL image object.

Creating and Building a Program

Creating and building a program are quite easy in PyOpenCL. The simple code for loading an OpenCL kernel from a file and building it for a list of devices is shown in Listing 20.4.

Listing 20.4 Creating and Building a Program


def CreateProgram(context, device, fileName):
    # Read the kernel source; the with block closes the file afterward
    with open(fileName, 'r') as kernelFile:
        kernelStr = kernelFile.read()

    # Load the program source
    program = cl.Program(context, kernelStr)

    # Build the program; PyOpenCL raises an exception on build errors
    program.build(devices=[device])

    return program


The source code to the kernel is read into a string buffer and then the program is created using cl.Program(context, kernelStr). Building the program for the device is done by calling program.build(devices=[device]). Whereas in C one normally has to write code to check whether compile errors occurred and, if so, fetch the build log, this is not necessary in PyOpenCL. If a compile error occurs, PyOpenCL will raise an exception containing the build log.
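
If the program should report a build failure rather than stop with a traceback, the exception can be caught. The following is a minimal sketch under a few assumptions: build_program is a hypothetical variant of CreateProgram() that takes the source string directly, it catches pyopencl.Error (the base class of PyOpenCL's exceptions), and it returns None on failure. The import sits inside the function only so that the snippet can be loaded without PyOpenCL installed.

```python
import sys


def build_program(context, device, source):
    """Build OpenCL source; print the build log and return None on failure.

    build_program is a hypothetical helper, not part of PyOpenCL.
    """
    # Imported lazily so the sketch loads even without PyOpenCL installed
    import pyopencl as cl

    program = cl.Program(context, source)
    try:
        program.build(devices=[device])
    except cl.Error as err:
        # The exception text carries the compiler's build log
        sys.stderr.write("Kernel build failed:\n%s\n" % err)
        return None
    return program
```

A caller can then test the result against None, in the same way main() tests the context returned by CreateContext().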

Setting Kernel Arguments and Executing a Kernel

Perhaps the greatest advantage of PyOpenCL’s use of dynamic typing is in how kernel arguments are set and the kernel is executed. The kernels defined in the OpenCL program become methods of the program object that was created. For example, in ImageFilter2D.cl the gaussian_filter() kernel is declared with the following function signature:

__kernel void gaussian_filter(__read_only image2d_t srcImg,
                              __write_only image2d_t dstImg,
                              sampler_t sampler,
                              int width, int height)

Once the program is built, this kernel dynamically becomes a method of the pyopencl.Program that was created. Rather than having to manually set kernel arguments by index using clSetKernelArg() and execute the kernel using clEnqueueNDRangeKernel(), the method can be invoked directly as if it were a function, as shown in Listing 20.5.

Listing 20.5 Executing the Kernel


# Call the kernel directly
localWorkSize = ( 16, 16 )
globalWorkSize = ( RoundUp(localWorkSize[0], imgSize[0]),
                   RoundUp(localWorkSize[1], imgSize[1]) )

program.gaussian_filter(commandQueue,
                        globalWorkSize,
                        localWorkSize,
                        imageObjects[0],
                        imageObjects[1],
                        sampler,
                        numpy.int32(imgSize[0]),
                        numpy.int32(imgSize[1]))


The localWorkSize and globalWorkSize are computed and stored in tuples. Under the hood, calling the gaussian_filter() method sets the kernel arguments and queues the kernel for execution. The two integer arguments are wrapped in numpy.int32 so that PyOpenCL knows their exact size when passing them to the kernel’s int parameters. It is also possible to provide a list of events to wait for through the wait_for keyword argument (although this was not done in this example). This convenience not only makes the code more readable, but also makes executing kernels significantly simpler than using the low-level API.
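
To make the work-size computation concrete, the sketch below restates RoundUp() from Listing 20.1 under the name round_up and applies it to an assumed 1000 x 750 image (the image size is chosen only for illustration). Each global dimension is padded to the next multiple of the local size, so some work-items fall outside the image; passing the width and height to the kernel gives it the information needed to skip them.

```python
def round_up(group_size, global_size):
    """Round global_size up to the nearest multiple of group_size."""
    remainder = global_size % group_size
    if remainder == 0:
        return global_size
    return global_size + group_size - remainder


local_work_size = (16, 16)
img_size = (1000, 750)  # assumed image dimensions, for illustration only

# Pad each dimension so it is evenly divisible by the work-group size
global_work_size = tuple(round_up(group, size)
                         for group, size in zip(local_work_size, img_size))
print(global_work_size)  # (1008, 752)
```

Here 1000 leaves a remainder of 8 when divided by 16, so the width is padded by 8, and 750 leaves a remainder of 14, so the height is padded by 2.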

Reading the Results

Finally, after executing the kernel, the program reads back the results of the filtered image object into a host memory buffer to write it to a file. The code for this is shown in Listing 20.6.

Listing 20.6 Reading the Image into a Numpy Array


# Read the output buffer back to the Host
buffer = numpy.zeros(imgSize[0] * imgSize[1] * 4, numpy.uint8)
origin = ( 0, 0, 0 )
region = ( imgSize[0], imgSize[1], 1 )

cl.enqueue_read_image(commandQueue, imageObjects[1],
                      origin, region, buffer).wait()


A numpy array of type uint8 is initialized to the appropriate size to store the results. The image is read back to the host by calling pyopencl.enqueue_read_image(). This function returns a pyopencl.Event object. In order to ensure that the buffer is read before moving on, the code explicitly calls the wait() method on the resultant event object. Finally, this host buffer is saved to an image file using the PIL in the SaveImage() function from Listing 20.1.
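
The flat buffer read back from the image holds four bytes per pixel in row-major order, assuming tightly packed rows with no padding. The hypothetical helper below, pixel_at, is not part of the example; it only sketches the index arithmetic for locating one pixel, using a small bytearray in place of the numpy array.

```python
def pixel_at(buffer, img_size, x, y):
    """Return the (R, G, B, A) values at column x, row y of a flat RGBA buffer.

    pixel_at is a hypothetical helper that assumes tightly packed rows.
    """
    width, height = img_size
    if not (0 <= x < width and 0 <= y < height):
        raise IndexError("pixel coordinates out of range")
    offset = (y * width + x) * 4  # 4 bytes per pixel: R, G, B, A
    return tuple(buffer[offset:offset + 4])


# A 2 x 2 test image: red and green on the first row, blue and white below
img_size = (2, 2)
buffer = bytearray([255, 0, 0, 255,    0, 255, 0, 255,
                    0, 0, 255, 255,    255, 255, 255, 255])
print(pixel_at(buffer, img_size, 1, 0))  # (0, 255, 0, 255)
print(pixel_at(buffer, img_size, 0, 1))  # (0, 0, 255, 255)
```

The same arithmetic explains the size of the array allocated in Listing 20.6: width times height pixels, times 4 bytes per pixel.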
