Implementing custom kernel functions

CuPy provides three types of custom kernel: elementwise, reduction, and raw. An elementwise kernel handles the indexing of each element automatically, so we only need to write the operation for a single element. A reduction kernel applies a user-defined operation to each element and then reduces the results. A raw kernel lets us write a CUDA C/C++ kernel directly in Python code, so we can define any operation we need. This section focuses on the elementwise kernel; the CuPy documentation covers all three in detail: https://docs-cupy.chainer.org/en/stable/tutorial/kernel.html.
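
For orientation, the reduction and raw kernel types can be sketched as follows, using the patterns shown in the CuPy kernel tutorial. This is a minimal sketch, not a definitive implementation: it assumes CuPy is installed and a CUDA-capable GPU is available, and the helper names `l2norm_reference`, `build_l2norm_kernel`, `build_add_kernel`, and `demo` are hypothetical, introduced only for this illustration. The kernels are built inside functions so that the file can still be imported on a machine without a GPU.

```python
import math


def l2norm_reference(rows):
    """Pure-Python reference: L2 norm of each row (no GPU needed)."""
    return [math.sqrt(sum(v * v for v in row)) for row in rows]


def build_l2norm_kernel():
    """Build a reduction kernel; assumes CuPy and a CUDA GPU are available."""
    import cupy as cp
    return cp.ReductionKernel(
        'T x',          # input parameter (T is a generic type placeholder)
        'T y',          # output parameter
        'x * x',        # map: applied to each element before reducing
        'a + b',        # reduce: combines two mapped values
        'y = sqrt(a)',  # post-reduction map: writes the final result
        '0',            # identity value of the reduction
        'l2norm')       # kernel name


ADD_KERNEL_SOURCE = r'''
extern "C" __global__
void my_add(const float* x1, const float* x2, float* y) {
    // One thread per element; the launch in demo() covers the array exactly.
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    y[tid] = x1[tid] + x2[tid];
}
'''


def build_add_kernel():
    """Build a raw kernel from CUDA C source; assumes CuPy and a CUDA GPU."""
    import cupy as cp
    return cp.RawKernel(ADD_KERNEL_SOURCE, 'my_add')


def demo():
    """Run both kernels on small arrays (requires a GPU)."""
    import cupy as cp
    x = cp.arange(10, dtype=cp.float32).reshape(2, 5)
    norms = build_l2norm_kernel()(x, axis=1)  # one norm per row

    x1 = cp.arange(25, dtype=cp.float32).reshape(5, 5)
    x2 = cp.arange(25, dtype=cp.float32).reshape(5, 5)
    y = cp.zeros((5, 5), dtype=cp.float32)
    # Raw kernels are launched with (grid, block, args):
    # 5 blocks of 5 threads cover the 25 elements.
    build_add_kernel()((5,), (5,), (x1, x2, y))
    return norms, y
```

Calling `demo()` on a GPU machine returns the per-row norms and the elementwise sum. Unlike the other two kernel types, the raw kernel leaves the grid and block sizes, and therefore the bounds of the indexing, entirely to the caller.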

Let's look at a user-defined elementwise kernel. The following example defines a kernel that computes the squared difference of two inputs:

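>>> import cupy as cp
>>> squared_diff = cp.ElementwiseKernel(
... 'float32 x, float32 y',   # input arguments
... 'float32 z',              # output argument
... 'z = (x - y) * (x - y)',  # operation applied to each element
... 'squared_diff')           # kernel name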

We can then call the kernel like an ordinary function, with no explicit indexing. The arguments are broadcast, so a scalar such as 0.5 can be passed in place of an array:

>>> x = cp.random.uniform(0, 1, (2, 4)).astype('float32')
>>> y = cp.random.uniform(0, 1, (2, 4)).astype('float32')
>>> squared_diff(x, y)
array([[0.54103416, 0.01342529, 0.01425287, 0.67101586],
       [0.04841561, 0.09939388, 0.46790633, 0.00203693]], dtype=float32)
>>> squared_diff(x, 0.5)
array([[0.23652133, 0.22603741, 0.08065639, 0.00647551],
       [0.00029328, 0.07454127, 0.00666   , 0.18399356]], dtype=float32)
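
The kernel above is fixed to float32. CuPy's elementwise kernels also accept a one-character type placeholder such as `T`, which is resolved from the arguments at call time. The following is a minimal sketch of that variant, assuming CuPy and a CUDA-capable GPU; `build_generic_squared_diff`, `squared_diff_reference`, and the kernel name `squared_diff_generic` are illustrative names, not part of the original example.

```python
def squared_diff_reference(xs, ys):
    """Pure-Python reference for the same operation (no GPU needed)."""
    return [(x - y) * (x - y) for x, y in zip(xs, ys)]


def build_generic_squared_diff():
    """Build a type-generic elementwise kernel; assumes CuPy and a CUDA GPU."""
    import cupy as cp
    return cp.ElementwiseKernel(
        'T x, T y',               # both inputs share the placeholder type T
        'T z',                    # the output uses the same type
        'z = (x - y) * (x - y)',  # operation applied to each element
        'squared_diff_generic')   # kernel name


def demo():
    """Apply the generic kernel to float64 inputs (requires a GPU)."""
    import cupy as cp
    kernel = build_generic_squared_diff()
    x = cp.arange(4, dtype=cp.float64)
    y = cp.ones(4, dtype=cp.float64)
    return kernel(x, y)  # T resolves to float64 for this call
```

Because both inputs are declared with the same placeholder, they are expected to share one dtype in a given call.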

As you can see, CuPy provides a Pythonic interface that is easy to learn. It also implements many routines with NumPy-compatible interfaces: https://docs-cupy.chainer.org/en/stable/reference/routines.html. In other words, CuPy is worth considering whenever NumPy computations need GPU acceleration.
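
One practical consequence of this compatibility is that a single function can serve both NumPy and CuPy arrays. The sketch below uses `cupy.get_array_module` to pick the matching module for its argument and falls back to NumPy when CuPy is not installed. The `softplus` function is only an illustration, and the fallback logic is an assumption of this sketch rather than something the text prescribes.

```python
import numpy as np

try:
    import cupy as cp
except ImportError:  # CuPy not installed: run on the CPU only
    cp = None


def softplus(x):
    """Compute log(1 + exp(x)) for a NumPy or CuPy array."""
    # get_array_module returns cupy for CuPy arrays and numpy otherwise.
    xp = cp.get_array_module(x) if cp is not None else np
    # Numerically stable form: max(0, x) + log1p(exp(-|x|)).
    return xp.maximum(0, x) + xp.log1p(xp.exp(-abs(x)))
```

Passing a NumPy array runs the computation on the CPU, while passing a CuPy array runs the same code on the GPU.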

Now, we will cover PyCUDA, which provides direct kernel programming and implicit memory management wrappers.
