Chapter 11. Interoperability with Direct3D

Similarly to the discussion of sharing functions in the previous chapter, this chapter explores how to achieve interoperation between OpenCL and Direct3D 10 (known as D3D interop). D3D interop is a powerful feature that allows programs to share data between Direct3D and OpenCL. Some possible applications for D3D interop include the ability to render in D3D and postprocess with OpenCL, or to use OpenCL to compute effects for display in D3D. This chapter covers the following concepts:

• Querying the Direct3D platform for sharing capabilities

• Creating buffers from D3D memory

• Creating contexts, associating devices, and the corresponding synchronization and memory management defined by this implied environment

Direct3D/OpenCL Sharing Overview

At a high level, Direct3D interoperability operates similarly to OpenGL interop as described in the previous chapter. Buffers and textures that are allocated on a Direct3D device can be accessed in OpenCL through a few special OpenCL calls provided by the Direct3D/OpenCL sharing API. When D3D sharing is present, applications can use D3D buffer and texture objects as OpenCL memory objects.


Note

This chapter assumes a familiarity with setup and initialization of a Direct3D application as well as basic Direct3D graphics programming. This chapter will instead focus on how D3D and OpenCL interoperate.


When using Direct3D interop, the program must first initialize the Direct3D environment using the Direct3D API. The program should create a window, find an appropriate D3D10 adapter, and get a handle to an appropriate D3D10 device and swap chain. These are handled by their respective Direct3D calls. The CreateDXGIFactory() call allows you to create a factory object that will enumerate the adapters on the system by way of the EnumAdapters() function. For a capable adapter, the adapter handle is then used to get a device and swap chain handle with the D3D10CreateDeviceAndSwapChain() call. This call returns an ID3D10Device handle, which is then used in subsequent calls to interop with OpenCL. At this point the program has created working Direct3D handles, which are then used by OpenCL to facilitate sharing.

Initializing an OpenCL Context for Direct3D Interoperability

Direct3D 10 sharing is provided by the OpenCL extension cl_khr_d3d10_sharing, which can be enabled with the following pragma:

#pragma OPENCL EXTENSION cl_khr_d3d10_sharing : enable

When D3D sharing is enabled, a number of the OpenCL functions are extended to accept parameter types and values that deal with D3D10 sharing.

D3D interop properties can be used to create OpenCL contexts:

CL_CONTEXT_D3D10_DEVICE_KHR is accepted as a property name in the properties parameter of clCreateContext and clCreateContextFromType.

The following D3D-interop-specific object parameters can be queried:

CL_CONTEXT_D3D10_PREFER_SHARED_RESOURCES_KHR is accepted as a value in the param_name parameter of clGetContextInfo.

CL_MEM_D3D10_RESOURCE_KHR is accepted as a value in the param_name parameter of clGetMemObjectInfo.

CL_IMAGE_D3D10_SUBRESOURCE_KHR is accepted as a value in the param_name parameter of clGetImageInfo.

CL_COMMAND_ACQUIRE_D3D10_OBJECTS_KHR and CL_COMMAND_RELEASE_D3D10_OBJECTS_KHR are returned in the param_value parameter of clGetEventInfo when param_name is CL_EVENT_COMMAND_TYPE.

Functions that use D3D interop may return interop-specific error codes:

CL_INVALID_D3D10_DEVICE_KHR is returned by clCreateContext and clCreateContextFromType if the Direct3D 10 device specified for interoperability is not compatible with the devices against which the context is to be created.

CL_INVALID_D3D10_RESOURCE_KHR is returned by clCreateFromD3D10BufferKHR when the resource is not a Direct3D 10 buffer object, and by clCreateFromD3D10Texture2DKHR and clCreateFromD3D10Texture3DKHR when the resource is not a Direct3D 10 texture object.

CL_D3D10_RESOURCE_ALREADY_ACQUIRED_KHR is returned by clEnqueueAcquireD3D10ObjectsKHR when any of the mem_objects are currently acquired by OpenCL.

CL_D3D10_RESOURCE_NOT_ACQUIRED_KHR is returned by clEnqueueReleaseD3D10ObjectsKHR when any of the mem_objects are not currently acquired by OpenCL.
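When one of these calls fails, it helps to log the symbolic name rather than the bare number. The following is a minimal sketch of such a helper. The function name d3d10InteropErrorName is hypothetical, and the fallback #define values exist only so the sketch builds without the OpenCL headers; they are intended to match CL/cl.h and CL/cl_d3d10.h, whose definitions take precedence when those headers are included.

```cpp
#include <cassert>
#include <cstring>

// Fallback values so this sketch builds without the OpenCL headers.
// When CL/cl.h and CL/cl_d3d10.h are included, their definitions are used.
#ifndef CL_SUCCESS
#define CL_SUCCESS 0
#define CL_DEVICE_NOT_FOUND -1
#define CL_INVALID_VALUE -30
#define CL_INVALID_PLATFORM -32
#endif
#ifndef CL_INVALID_D3D10_DEVICE_KHR
#define CL_INVALID_D3D10_DEVICE_KHR -1002
#define CL_INVALID_D3D10_RESOURCE_KHR -1003
#define CL_D3D10_RESOURCE_ALREADY_ACQUIRED_KHR -1004
#define CL_D3D10_RESOURCE_NOT_ACQUIRED_KHR -1005
#endif

// Hypothetical helper (not part of the OpenCL API): returns the symbolic
// name of an error code that the D3D10 interop calls may return.
const char *d3d10InteropErrorName(int err)
{
    switch (err) {
    case CL_SUCCESS:                             return "CL_SUCCESS";
    case CL_DEVICE_NOT_FOUND:                    return "CL_DEVICE_NOT_FOUND";
    case CL_INVALID_VALUE:                       return "CL_INVALID_VALUE";
    case CL_INVALID_PLATFORM:                    return "CL_INVALID_PLATFORM";
    case CL_INVALID_D3D10_DEVICE_KHR:            return "CL_INVALID_D3D10_DEVICE_KHR";
    case CL_INVALID_D3D10_RESOURCE_KHR:          return "CL_INVALID_D3D10_RESOURCE_KHR";
    case CL_D3D10_RESOURCE_ALREADY_ACQUIRED_KHR: return "CL_D3D10_RESOURCE_ALREADY_ACQUIRED_KHR";
    case CL_D3D10_RESOURCE_NOT_ACQUIRED_KHR:     return "CL_D3D10_RESOURCE_NOT_ACQUIRED_KHR";
    default:                                     return "unrecognized error code";
    }
}
```

A call site could then report failures with something like printf("%s\n", d3d10InteropErrorName(errNum)).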

The OpenCL D3D10 interop functions are declared in the header cl_d3d10.h, which is available from the Khronos Web site; on some distributions you may need to download this file yourself. The sample code on the book's Web site for this chapter assumes that it can be found in the OpenCL include path. Additionally, as shown in that code, the extension functions may need to be obtained with the clGetExtensionFunctionAddress() call.

The ID3D10Device handle returned by D3D10CreateDeviceAndSwapChain() can be used to get an OpenCL device ID, which can later be used to create an OpenCL context.

Initializing OpenCL proceeds as usual with a few differences. The platforms can first be enumerated using the clGetPlatformIDs() function. Because we are searching for a platform that supports Direct3D sharing, clGetPlatformInfo() is called on each platform to query the extensions it supports. If cl_khr_d3d10_sharing is present in the extensions string, that platform can be selected for D3D sharing.
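A naive strstr() test on the extension string can also match a name that is merely a prefix of a longer one. The sketch below assumes the extension string has already been read with clGetPlatformInfo() and CL_PLATFORM_EXTENSIONS, and checks for a whole space-delimited token instead; hasExtension is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstring>

// Returns true if `name` appears as a complete, space-delimited token in
// `extensions` (the CL_PLATFORM_EXTENSIONS string of a platform).
bool hasExtension(const char *extensions, const char *name)
{
    const size_t len = std::strlen(name);
    if (len == 0) return false;
    const char *p = extensions;
    while ((p = std::strstr(p, name)) != nullptr) {
        // The match must start at the beginning or after a space...
        const bool startOk = (p == extensions) || (p[-1] == ' ');
        // ...and end at the end of the string or before a space.
        const bool endOk = (p[len] == '\0') || (p[len] == ' ');
        if (startOk && endOk) return true;
        p += len;  // keep searching after this partial match
    }
    return false;
}
```

With this helper, the platform loop selects the first platform for which hasExtension(extString, "cl_khr_d3d10_sharing") returns true.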

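Given a cl_platform_id that supports D3D sharing, we can query for the corresponding OpenCL device IDs on that platform using clGetDeviceIDsFromD3D10KHR():

cl_int clGetDeviceIDsFromD3D10KHR(
    cl_platform_id platform,
    cl_d3d10_device_source_khr d3d_device_source,
    void *d3d_object,
    cl_d3d10_device_set_khr d3d_device_set,
    cl_uint num_entries,
    cl_device_id *devices,
    cl_uint *num_devices)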

The OpenCL devices corresponding to a Direct3D 10 device and the OpenCL devices corresponding to a DXGI adapter may be queried. The OpenCL devices corresponding to a Direct3D 10 device will be a subset of the OpenCL devices corresponding to the DXGI adapter against which the Direct3D 10 device was created.

For example, the following code gets an OpenCL device ID (cdDevice) for the chosen OpenCL platform (cpPlatform). The constant CL_D3D10_DEVICE_KHR indicates that the D3D10 object we are sending (g_pD3DDevice) is a D3D10 device, and we choose the preferred device for that platform with the CL_PREFERRED_DEVICES_FOR_D3D10_KHR constant. This will return the preferred OpenCL device associated with the platform and D3D10 device. The code also checks for the return value and possible errors resulting from the function.

cl_uint num_devices = 0;
errNum = clGetDeviceIDsFromD3D10KHR(
    cpPlatform,
    CL_D3D10_DEVICE_KHR,                // g_pD3DDevice is an ID3D10Device
    g_pD3DDevice,
    CL_PREFERRED_DEVICES_FOR_D3D10_KHR, // preferred devices only
    1,                                  // room for one device ID
    &cdDevice,
    &num_devices);

// Adjacent string literals are concatenated, so each message is a
// single printf format string
if (errNum == CL_INVALID_PLATFORM) {
    printf("Invalid Platform: "
           "Specified platform is not valid\n");
} else if (errNum == CL_INVALID_VALUE) {
    printf("Invalid Value: "
           "d3d_device_source or d3d_device_set is not valid, "
           "or num_entries == 0 and devices != NULL, "
           "or num_devices == devices == NULL\n");
} else if (errNum == CL_DEVICE_NOT_FOUND) {
    printf("No OpenCL devices corresponding to the "
           "d3d_object were found\n");
}

The device ID returned by this function can then be used to create a context that supports D3D sharing. When creating the OpenCL context, the properties list passed to clCreateContext() or clCreateContextFromType() should include the pointer to the D3D10 device to be shared with. The following code sets up the context properties for D3D sharing and then uses them to create a context:

cl_context_properties contextProperties[] =
{
    // Share with this Direct3D 10 device
    CL_CONTEXT_D3D10_DEVICE_KHR,
    (cl_context_properties)g_pD3DDevice,
    // Platform selected earlier for D3D10 sharing
    CL_CONTEXT_PLATFORM,
    (cl_context_properties)cpPlatform,
    0   // the list is terminated by 0
};
context = clCreateContextFromType(contextProperties,
    CL_DEVICE_TYPE_GPU,
    NULL, NULL, &errNum);

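In the example code the pointer to the D3D10 device, g_pD3DDevice, is the one returned from the D3D10CreateDeviceAndSwapChain() call. The example uses clCreateContextFromType() with CL_DEVICE_TYPE_GPU; to create the context on exactly the device returned by clGetDeviceIDsFromD3D10KHR(), clCreateContext() can be called with the same properties and cdDevice instead.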
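The properties array is a flat list of key/value pairs terminated by 0, so the order of the pairs does not matter and the D3D10 device pointer travels as an integer of type cl_context_properties (intptr_t in CL/cl.h). The following sketch illustrates how such a list is read; the ContextProperty alias, the key constants, and findProperty are hypothetical stand-ins so that it builds without the OpenCL headers.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for cl_context_properties (an intptr_t in CL/cl.h).
typedef intptr_t ContextProperty;

// Hypothetical keys for illustration; real code uses CL_CONTEXT_PLATFORM
// and CL_CONTEXT_D3D10_DEVICE_KHR from the OpenCL headers.
const ContextProperty KEY_PLATFORM     = 1;
const ContextProperty KEY_D3D10_DEVICE = 2;

// Walks a zero-terminated {key, value, key, value, ..., 0} list.
// Returns true and stores the value if `key` is present.
bool findProperty(const ContextProperty *props, ContextProperty key,
                  ContextProperty *valueOut)
{
    for (const ContextProperty *p = props; p != nullptr && p[0] != 0; p += 2) {
        if (p[0] == key) {
            *valueOut = p[1];
            return true;
        }
    }
    return false;
}
```

Forgetting the terminating 0 would let a reader like this run past the end of the array, which is why the list above ends with it.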

Creating OpenCL Memory Objects from Direct3D Buffers and Textures

OpenCL buffer and image objects can be created from existing D3D buffer objects and textures using the clCreateFromD3D10*KHR() OpenCL functions. This makes D3D objects accessible in OpenCL.

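An OpenCL memory object can be created from an existing D3D buffer using the clCreateFromD3D10BufferKHR() function:

cl_mem clCreateFromD3D10BufferKHR(
    cl_context context,
    cl_mem_flags flags,
    ID3D10Buffer *resource,
    cl_int *errcode_ret)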

The size of the returned OpenCL buffer object is the same as the size of resource. This call will increment the internal Direct3D reference count on resource. The internal Direct3D reference count on resource will be decremented when the OpenCL reference count on the returned OpenCL memory object drops to zero.

Both buffers and textures can be shared with OpenCL. Our first example will begin with processing of a texture in OpenCL for display in D3D10, and we will see an example of processing a buffer of vertex data later in this chapter.

In D3D10, a texture can be created as follows:

int g_WindowWidth = 256;
int g_WindowHeight = 256;
...
D3D10_TEXTURE2D_DESC desc;
ZeroMemory( &desc, sizeof(D3D10_TEXTURE2D_DESC) );
desc.Width = g_WindowWidth;
desc.Height = g_WindowHeight;
desc.MipLevels = 1;
desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
desc.SampleDesc.Count = 1;
desc.Usage = D3D10_USAGE_DEFAULT;
desc.BindFlags = D3D10_BIND_SHADER_RESOURCE;
if (FAILED(g_pD3DDevice->CreateTexture2D(
    &desc, NULL, &g_pTexture2D)))
    return E_FAIL;

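The format of the texture data to be shared is fixed when the texture is created; the preceding code sets it to DXGI_FORMAT_R8G8B8A8_UNORM. After this texture is created, an OpenCL image object may be created from it using clCreateFromD3D10Texture2DKHR():

cl_mem clCreateFromD3D10Texture2DKHR(
    cl_context context,
    cl_mem_flags flags,
    ID3D10Texture2D *resource,
    UINT subresource,
    cl_int *errcode_ret)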

The width and height of the returned OpenCL 2D image object are determined by the width and height of subresource subresource of resource. The channel type and order of the returned OpenCL image object are determined by the format of resource, as shown in Table 11.1.

This call will increment the internal Direct3D reference count on resource. The internal Direct3D reference count on resource will be decremented when the OpenCL reference count on the returned OpenCL memory object drops to zero.

Now, to create an OpenCL image object from the newly created D3D texture object, g_pTexture2D, clCreateFromD3D10Texture2DKHR() can be called as follows:

g_clTexture2D = clCreateFromD3D10Texture2DKHR(
    context,
    CL_MEM_READ_WRITE,
    g_pTexture2D,
    0,
    &errNum);

The flags parameter specifies how kernels may access the image and accepts the values CL_MEM_READ_ONLY, CL_MEM_WRITE_ONLY, or CL_MEM_READ_WRITE. Here the image is created to be both readable and writable from a kernel. The OpenCL object g_clTexture2D can now be used by OpenCL kernels to access the D3D texture object. In our simple case, the texture has a single mip level and a single array slice, so it has only one subresource, which is selected by passing 0 as the subresource index.

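To create an OpenCL 3D image object from a Direct3D 10 3D texture, use clCreateFromD3D10Texture3DKHR():

cl_mem clCreateFromD3D10Texture3DKHR(
    cl_context context,
    cl_mem_flags flags,
    ID3D10Texture3D *resource,
    UINT subresource,
    cl_int *errcode_ret)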

The width, height, and depth of the returned OpenCL 3D image object are determined by the width, height, and depth of subresource subresource of resource. The channel type and order of the returned OpenCL 3D image object are determined by the format of resource as shown in Table 11.1.

Table 11.1 Direct3D Texture Format Mappings to OpenCL Image Formats

This call will increment the internal Direct3D reference count on resource. The internal Direct3D reference count on resource will be decremented when the OpenCL reference count on the returned OpenCL memory object drops to zero.

Note that the OpenCL kernel built-ins used to read from or write to an image (read_imagef()/read_imagei()/read_imageui() and write_imagef()/write_imagei()/write_imageui()) must match the channel data type of the OpenCL image. The channel type and order of the OpenCL 2D or 3D image object being shared depend on the format of the Direct3D 10 resource passed to clCreateFromD3D10Texture2DKHR() or clCreateFromD3D10Texture3DKHR(). Following the previous example, the DXGI_FORMAT_R8G8B8A8_UNORM format creates an OpenCL image with a CL_RGBA channel order and a CL_UNORM_INT8 channel data type. The specification lists the mappings from DXGI formats to OpenCL image formats (channel order and channel data type), shown in Table 11.1.
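To make the pairing between formats and kernel built-ins concrete, here is a small sketch that holds an excerpt of the mappings as strings and picks the matching read_image*/write_image* suffix for a channel data type. The table is deliberately partial, and the type and helper names are hypothetical; consult Table 11.1 or the specification for the full list.

```cpp
#include <cassert>
#include <cstring>

// Partial excerpt of the DXGI-to-OpenCL image format mappings, stored as
// strings so the sketch builds without the DXGI or OpenCL headers.
struct FormatMapping {
    const char *dxgiFormat;
    const char *channelOrder;
    const char *channelDataType;
};

static const FormatMapping kFormatMappings[] = {
    { "DXGI_FORMAT_R32G32B32A32_FLOAT", "CL_RGBA", "CL_FLOAT" },
    { "DXGI_FORMAT_R16G16B16A16_FLOAT", "CL_RGBA", "CL_HALF_FLOAT" },
    { "DXGI_FORMAT_R8G8B8A8_UNORM",     "CL_RGBA", "CL_UNORM_INT8" },
    { "DXGI_FORMAT_R8G8B8A8_SNORM",     "CL_RGBA", "CL_SNORM_INT8" },
    { "DXGI_FORMAT_R8G8B8A8_UINT",      "CL_RGBA", "CL_UNSIGNED_INT8" },
    { "DXGI_FORMAT_R8G8B8A8_SINT",      "CL_RGBA", "CL_SIGNED_INT8" },
    { "DXGI_FORMAT_R32_FLOAT",          "CL_R",    "CL_FLOAT" },
};

// Returns the mapping for a DXGI format name, or nullptr if it is not
// in this excerpt.
const FormatMapping *lookupFormat(const char *dxgiFormat)
{
    for (const FormatMapping &m : kFormatMappings)
        if (std::strcmp(m.dxgiFormat, dxgiFormat) == 0)
            return &m;
    return nullptr;
}

// Suffix of the kernel built-ins that match a channel data type:
// unnormalized unsigned/signed integer types need "ui"/"i", while
// normalized, half, and float types use "f" (read_imagef/write_imagef).
const char *imageBuiltinSuffix(const char *channelDataType)
{
    if (std::strncmp(channelDataType, "CL_UNSIGNED_INT", 15) == 0) return "ui";
    if (std::strncmp(channelDataType, "CL_SIGNED_INT", 13) == 0) return "i";
    return "f";
}
```

For the texture in this chapter, lookupFormat("DXGI_FORMAT_R8G8B8A8_UNORM") yields CL_UNORM_INT8, whose suffix is "f", which is why the kernel later in the chapter writes with write_imagef().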

Acquiring and Releasing Direct3D Objects in OpenCL

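Direct3D objects must be acquired before being processed in OpenCL and released before they are used by Direct3D. D3D10 objects are acquired with the following function:

cl_int clEnqueueAcquireD3D10ObjectsKHR(
    cl_command_queue command_queue,
    cl_uint num_objects,
    const cl_mem *mem_objects,
    cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list,
    cl_event *event)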

This acquires OpenCL memory objects that have been created from Direct3D 10 resources.

The Direct3D 10 objects are acquired by the OpenCL context associated with command_queue and can therefore be used by all command-queues associated with the OpenCL context.

OpenCL memory objects created from Direct3D 10 resources must be acquired before they can be used by any OpenCL commands queued to a command-queue. If an OpenCL memory object created from a Direct3D 10 resource is used while it is not currently acquired by OpenCL, the call attempting to use that OpenCL memory object will return CL_D3D10_RESOURCE_NOT_ACQUIRED_KHR.

clEnqueueAcquireD3D10ObjectsKHR() provides the synchronization guarantee that any Direct3D 10 calls made before clEnqueueAcquireD3D10ObjectsKHR() is called will complete executing before event reports completion and before the execution of any subsequent OpenCL work issued in command_queue begins.

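The corresponding release function is

cl_int clEnqueueReleaseD3D10ObjectsKHR(
    cl_command_queue command_queue,
    cl_uint num_objects,
    const cl_mem *mem_objects,
    cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list,
    cl_event *event)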

This releases OpenCL memory objects that have been created from Direct3D 10 resources.

The Direct3D 10 objects are released by the OpenCL context associated with command_queue.

OpenCL memory objects created from Direct3D 10 resources that have been acquired by OpenCL must be released by OpenCL before they may be accessed by Direct3D 10. Accessing a Direct3D 10 resource while its corresponding OpenCL memory object is acquired is in error and will result in undefined behavior, including but not limited to possible OpenCL errors, data corruption, and program termination.

clEnqueueReleaseD3D10ObjectsKHR() provides the synchronization guarantee that any calls to Direct3D 10 made after the call to clEnqueueReleaseD3D10ObjectsKHR() will not start executing until after all events in event_wait_list are complete and all work already submitted to command_queue completes execution.

Note that in contrast to the OpenGL acquire function, which does not provide synchronization guarantees, the D3D10 acquire function does. Also, when acquiring and releasing textures, it is most efficient to acquire and release all shared textures and resources at the same time, in a single call. Additionally, it is best to run all of the OpenCL kernels before switching back to Direct3D processing. Following this pattern, the acquire and release calls mark the boundaries between OpenCL and Direct3D processing.
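The ownership rules above can be modeled as a simple state machine. The sketch below is a toy model, not the OpenCL API. The SimResult codes stand in for CL_D3D10_RESOURCE_ALREADY_ACQUIRED_KHR and CL_D3D10_RESOURCE_NOT_ACQUIRED_KHR, and acquiring all shared objects in one batch mirrors the advice to acquire and release everything at the same time. The all-or-nothing behavior is a choice of this model, not a statement about real implementations.

```cpp
#include <cassert>
#include <cstddef>

// Toy model: a shared object is owned either by Direct3D or by OpenCL.
enum SimResult { SIM_OK, SIM_ALREADY_ACQUIRED, SIM_NOT_ACQUIRED };

struct SharedObject {
    bool acquiredByCL = false;  // starts out owned by Direct3D
};

// Models clEnqueueAcquireD3D10ObjectsKHR(): fails if any object is
// already acquired, otherwise hands all of them to OpenCL.
SimResult acquireAll(SharedObject *objs, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (objs[i].acquiredByCL) return SIM_ALREADY_ACQUIRED;
    for (size_t i = 0; i < n; ++i) objs[i].acquiredByCL = true;
    return SIM_OK;
}

// Models clEnqueueReleaseD3D10ObjectsKHR(): fails if any object is not
// currently acquired, otherwise hands all of them back to Direct3D.
SimResult releaseAll(SharedObject *objs, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (!objs[i].acquiredByCL) return SIM_NOT_ACQUIRED;
    for (size_t i = 0; i < n; ++i) objs[i].acquiredByCL = false;
    return SIM_OK;
}

// Models an OpenCL command that uses a shared object: only legal while
// the object is acquired by OpenCL.
SimResult useInKernel(const SharedObject &obj)
{
    return obj.acquiredByCL ? SIM_OK : SIM_NOT_ACQUIRED;
}
```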

Processing a Direct3D Texture in OpenCL

So far we have described how to obtain an OpenCL image from a D3D texture. In this section we will discuss how to process the texture’s data in OpenCL and display the result in Direct3D. In the following example code we will use an OpenCL kernel to alter a texture’s contents in each frame. We begin by showing a fragment of code for the rendering loop of a program:

void Render()
{
    // Clear the back buffer
    // to values red, green, blue, alpha
    float ClearColor[4] = { 0.0f, 0.125f, 0.1f, 1.0f };
    g_pD3DDevice->ClearRenderTargetView(
        g_pRenderTargetView, ClearColor);

    computeTexture();
    // Render the quadrilateral
    D3D10_TECHNIQUE_DESC techDesc;
    g_pTechnique->GetDesc( &techDesc );
    for( UINT p = 0; p < techDesc.Passes; ++p )
    {
        g_pTechnique->GetPassByIndex( p )->Apply( 0 );
        g_pD3DDevice->Draw( 4, 0 );
    }


    // Present the information rendered to the
    // back buffer to the front buffer (the screen)
    g_pSwapChain->Present( 0, 0 );
}

The code simply clears the window to a predefined color, then calls OpenCL to update the texture contents in the computeTexture() function. Finally, the texture is displayed on the screen. The computeTexture() function used in the preceding code launches an OpenCL kernel to modify the contents of the texture as shown in the next code fragment. The function acquires the D3D object, launches the kernel to modify the texture, and then releases the D3D object. The g_clTexture2D OpenCL image object that was created from the D3D object is passed to the kernel as a parameter. Additionally, a simple animation is created by the host maintaining a counter, seq, that is incremented each time this function is called and passed as a parameter to the kernel. Here is the full code for the computeTexture() function:

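// Use OpenCL to compute the colors on the texture background
cl_int computeTexture()
{
    cl_int errNum;

    // Advance the animation counter; it cycles through [0, 2*width)
    static cl_int seq = 0;
    seq = (seq + 1) % (g_WindowWidth * 2);

    // errNum is nonzero if any of these calls failed
    errNum  = clSetKernelArg(tex_kernel, 0, sizeof(cl_mem),
        &g_clTexture2D);
    errNum |= clSetKernelArg(tex_kernel, 1, sizeof(cl_int),
        &g_WindowWidth);
    errNum |= clSetKernelArg(tex_kernel, 2, sizeof(cl_int),
        &g_WindowHeight);
    errNum |= clSetKernelArg(tex_kernel, 3, sizeof(cl_int),
        &seq);
    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Error setting kernel arguments." << std::endl;
        return errNum;
    }

    size_t tex_globalWorkSize[2] = {
        (size_t)g_WindowWidth,
        (size_t)g_WindowHeight };
    size_t tex_localWorkSize[2] = { 32, 4 };

    // Hand the texture to OpenCL before any kernel touches it
    errNum = clEnqueueAcquireD3D10ObjectsKHR(commandQueue, 1,
        &g_clTexture2D, 0, NULL, NULL );
    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Error acquiring D3D10 texture." << std::endl;
        return errNum;
    }

    errNum = clEnqueueNDRangeKernel(commandQueue, tex_kernel, 2,
        NULL,
        tex_globalWorkSize, tex_localWorkSize,
        0, NULL, NULL);
    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Error queuing kernel for execution." <<
            std::endl;
    }

    // Release even if the kernel failed to enqueue, so that Direct3D
    // can use the texture again
    cl_int relErr = clEnqueueReleaseD3D10ObjectsKHR(commandQueue, 1,
        &g_clTexture2D, 0, NULL, NULL );
    clFinish(commandQueue);
    return (errNum != CL_SUCCESS) ? errNum : relErr;
}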
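The local work size {32, 4} works here because the 256-by-256 global size divides evenly by it; OpenCL 1.x rejects an NDRange whose global size is not a multiple of the local size with CL_INVALID_WORK_GROUP_SIZE. If the window size were configurable, a check such as the following sketch could run before enqueuing. The helper names are hypothetical, and padding the global size with roundUp() would also require a bounds check in the kernel, which init_texture_kernel does not have.

```cpp
#include <cassert>
#include <cstddef>

// True if every global dimension is a nonzero multiple of the local one,
// as OpenCL 1.x requires when local_work_size is given explicitly.
bool divisible2D(const size_t global[2], const size_t local[2])
{
    return local[0] != 0 && local[1] != 0 &&
           global[0] % local[0] == 0 &&
           global[1] % local[1] == 0;
}

// Rounds n up to the next multiple of m (m > 0).
size_t roundUp(size_t n, size_t m)
{
    return ((n + m - 1) / m) * m;
}
```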

As in the previous chapter on OpenGL interop, we will again use an OpenCL kernel to computationally generate the contents of a D3D texture object. The texture was declared with the format DXGI_FORMAT_R8G8B8A8_UNORM, which corresponds to an OpenCL image with channel order CL_RGBA and channel data type CL_UNORM_INT8. This image can be written to using the write_imagef() function in a kernel:

__kernel void init_texture_kernel(__write_only image2d_t im,
    int w, int h, int seq )
{
    // Pixel coordinate handled by this work-item
    int2 coord = (int2)((int)get_global_id(0), (int)get_global_id(1));
    // Red and green ramp across the image; blue animates with seq
    float4 color = (float4)(
        (float)coord.x/(float)w,
        (float)coord.y/(float)h,
        (float)abs(seq-w)/(float)w,
        1.0f);
    write_imagef( im, coord, color );
}

Here, seq is a sequence number variable that is circularly incremented in each frame on the host and sent to the kernel. In the kernel, the seq variable is used to generate texture color values. As seq is incremented, the colors change to animate the texture.
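Because the kernel is a pure function of the pixel coordinate and seq, its output can be reproduced on the host, for example to check a read-back of the texture. The sketch below mirrors that arithmetic; expectedColor and nextSeq are hypothetical names, and the values are the floats passed to write_imagef() before conversion to 8-bit UNORM.

```cpp
#include <cassert>
#include <cstdlib>

struct Color { float r, g, b, a; };

// Host-side mirror of init_texture_kernel for one pixel. Since seq runs
// over [0, 2*w), |seq - w| / w sweeps from 1 down to 0 and back up,
// giving a triangle wave in the blue channel.
Color expectedColor(int x, int y, int w, int h, int seq)
{
    Color c;
    c.r = (float)x / (float)w;
    c.g = (float)y / (float)h;
    c.b = (float)std::abs(seq - w) / (float)w;
    c.a = 1.0f;
    return c;
}

// The same counter update computeTexture() performs each frame.
int nextSeq(int seq, int w)
{
    return (seq + 1) % (w * 2);
}
```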

In the full source code example included in the book reference material for this chapter, a rendering technique, g_pTechnique, is used. It is a basic processing pipeline, involving a simple vertex shader that passes vertex and texture coordinates to a pixel shader:

//
// Vertex Shader
//
PS_INPUT VS( VS_INPUT input )
{
    PS_INPUT output = (PS_INPUT)0;
    output.Pos = input.Pos;
    output.Tex = input.Tex;

    return output;
}
technique10 Render
{
    pass P0
    {
        SetVertexShader( CompileShader( vs_4_0, VS() ) );
        SetGeometryShader( NULL );
        SetPixelShader( CompileShader( ps_4_0, PS() ) );
    }
}

This technique is loaded using the usual D3D10 calls. The pixel shader then performs the texture lookup on the texture that has been modified by the OpenCL kernel and displays it:

SamplerState samLinear
{
    Filter = MIN_MAG_MIP_LINEAR;
    AddressU = Wrap;
    AddressV = Wrap;
};

float4 PS( PS_INPUT input) : SV_Target
{
    return txDiffuse.Sample( samLinear, input.Tex );
}

In this pixel shader, txDiffuse is the effect's texture variable, which the host binds to g_pTexture2D, and samLinear is a linear sampler used to read it. For each iteration of the rendering loop, OpenCL updates the texture contents in computeTexture() and D3D10 displays the updated texture.

Processing D3D Vertex Data in OpenCL

As mentioned previously, buffers can also be shared from Direct3D. We will now consider the case where a D3D buffer holding vertex data is used to draw a sine wave on screen. We can begin by defining a simple structure for the vertex buffer in Direct3D:

struct SimpleSineVertex
{
    D3DXVECTOR4 Pos;
};

A D3D10 buffer can be created for this structure, in this case holding 256 elements:

D3D10_BUFFER_DESC bd;
bd.Usage = D3D10_USAGE_DEFAULT;
bd.ByteWidth = sizeof( SimpleSineVertex ) * 256;
bd.BindFlags = D3D10_BIND_VERTEX_BUFFER;
bd.CPUAccessFlags = 0;
bd.MiscFlags = 0;
HRESULT hr = g_pD3DDevice->CreateBuffer( &bd, NULL,
    &g_pSineVertexBuffer );
if (FAILED(hr))
    return hr;

Because we will use OpenCL to set the data in the buffer, we pass NULL as the second parameter, pInitialData, to allocate space only.

Once the D3D buffer g_pSineVertexBuffer is created, an OpenCL buffer, g_clBuffer, can be created from g_pSineVertexBuffer using the clCreateFromD3D10BufferKHR() function:

g_clBuffer = clCreateFromD3D10BufferKHR( context,
    CL_MEM_READ_WRITE, g_pSineVertexBuffer, &errNum );
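Since the returned OpenCL buffer has the same size as the D3D resource, and the kernel that generates the vertices views it as __global float4 *, each SimpleSineVertex must occupy exactly one float4 (16 bytes). A compile-time check like the one below makes that assumption explicit. Vec4 and SimpleSineVertexModel are hypothetical stand-ins for D3DXVECTOR4 and SimpleSineVertex so that the sketch builds without the D3DX headers.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for D3DXVECTOR4: four 32-bit floats.
struct Vec4 { float x, y, z, w; };
struct SimpleSineVertexModel { Vec4 Pos; };

// vbo[gid] in the kernel lands on vertex gid only if the vertex stride
// equals sizeof(float4).
static_assert(sizeof(SimpleSineVertexModel) == 4 * sizeof(float),
              "each vertex must be exactly one float4");

constexpr size_t kNumVertices = 256;
// Matches bd.ByteWidth, and therefore the size of the OpenCL buffer.
constexpr size_t kByteWidth = sizeof(SimpleSineVertexModel) * kNumVertices;
```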

As in the previous example, g_clBuffer can be sent as a kernel parameter to an OpenCL kernel that generates data. As in the texture example, the D3D object is acquired with clEnqueueAcquireD3D10ObjectsKHR() before the kernel launch and released with clEnqueueReleaseD3D10ObjectsKHR() after the kernel completes. In the sample code, the vertex positions for a sine wave are generated in a kernel:

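__kernel void init_vbo_kernel(__global float4 *vbo,
    int w, int h, int seq)
{
    int gid = get_global_id(0);
    float4 linepts;
    float f = 1.0f;   // frequency
    float a = 0.4f;   // amplitude
    float b = 0.0f;   // vertical offset

    // x runs from -1 toward 1 as gid goes from 0 to w-1
    linepts.x = gid/(w/2.0f)-1.0f;
    // M_PI_F is the built-in single-precision pi; float constants
    // avoid double-precision literals
    linepts.y = b + a*sin(2.0f*M_PI_F*((float)gid/(float)w*f +
         (float)seq/(float)w));
    linepts.z = 0.5f;
    linepts.w = 0.0f;

    vbo[gid] = linepts;
}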

Similarly to the texturing example, the variable seq is used as a counter to animate the sine wave on the screen.
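The vertex positions can likewise be computed on the host, which is convenient for testing. This sketch mirrors init_vbo_kernel using the same constants (f = 1, a = 0.4, b = 0) and returns only x and y; sineVertex and Pos2 are hypothetical names, and 2π is written as a float constant.

```cpp
#include <cassert>
#include <cmath>

struct Pos2 { float x, y; };

// Host-side mirror of init_vbo_kernel for vertex gid.
Pos2 sineVertex(int gid, int w, int seq)
{
    const float f = 1.0f;   // frequency
    const float a = 0.4f;   // amplitude
    const float b = 0.0f;   // vertical offset
    const float twoPi = 6.28318530718f;
    Pos2 p;
    p.x = gid / (w / 2.0f) - 1.0f;
    p.y = b + a * std::sin(twoPi * ((float)gid / (float)w * f +
                                    (float)seq / (float)w));
    return p;
}
```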

When rendering, we set the layout and the buffer and specify a line strip. Then, computeBuffer() calls the preceding kernel to update the buffer. A simple rendering pipeline, set up as pass 1 in the technique, is activated, and the 256 data points are drawn:

// Set the input layout
g_pD3DDevice->IASetInputLayout( g_pSineVertexLayout );
// Set vertex buffer
UINT stride = sizeof( SimpleSineVertex );
UINT offset = 0;
g_pD3DDevice->IASetVertexBuffers( 0, 1, &g_pSineVertexBuffer,
    &stride, &offset );
// Set primitive topology
g_pD3DDevice->IASetPrimitiveTopology(
    D3D10_PRIMITIVE_TOPOLOGY_LINESTRIP );
computeBuffer();
g_pTechnique->GetPassByIndex( 1 )->Apply( 0 );
g_pD3DDevice->Draw( 256, 0 );

When run, the program will apply the kernel to generate the texture contents, then run the D3D pipeline to sample the texture and display it on the screen. The vertex buffer is then also drawn, resulting in a sine wave on screen. The resulting program is shown in Figure 11.1.

Figure 11.1 A program demonstrating OpenCL/D3D interop. The positions of the vertices in the sine wave and the texture color values are programmatically set by kernels in OpenCL and displayed using Direct3D.

