Chapter 13. OpenCL Embedded Profile

The OpenCL specification defines two profiles: a profile for desktop devices (the full profile) and a profile for hand-held and embedded devices (the embedded profile). Hand-held and embedded devices have significant area and power constraints that require a relaxation in the requirements defined by the full profile. The embedded profile targets a strict subset of the OpenCL 1.1 specification required for the full profile. An embedded profile that is a strict subset of the full profile has the following benefits:

• It provides a single specification for both profiles as opposed to having separate specifications.

• OpenCL programs written for the embedded profile should also run on devices that implement the full profile.

• It allows the OpenCL working group to consider requirements of both desktop and hand-held devices in defining requirements for future revisions of OpenCL.

In this chapter, we describe the embedded profile. We discuss core features that are optional for the embedded profile and the relaxation in device and floating-point precision requirements.

OpenCL Profile Overview

A profile is associated with the platform and with each of its devices. The platform implements the OpenCL platform and runtime APIs (described in Chapters 4 and 5 of the OpenCL 1.1 specification). The platform supports one or more devices, and each device supports a specific profile. Listing 13.1 shows how to query the profile supported by the platform and by each device on that platform.

Listing 13.1 Querying Platform and Device Profiles


#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>      // <OpenCL/opencl.h> on Mac OS X

void
query_profile(cl_platform_id platform)
{
    char          platform_profile[100];
    char          device_profile[100];
    cl_uint       num_devices;
    cl_device_id *devices;
    cl_uint       i;

    // query the platform profile.
    clGetPlatformInfo(platform,
                      CL_PLATFORM_PROFILE,
                      sizeof(platform_profile),
                      platform_profile,
                      NULL);
    printf("Platform profile is %s\n", platform_profile);

    // get the number of devices supported by platform.
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL,
                       0, NULL, &num_devices) != CL_SUCCESS)
        return;

    devices = malloc(num_devices * sizeof(cl_device_id));
    if (devices == NULL)
        return;

    // get all devices; num_entries is a device count, not a byte size.
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL,
                   num_devices, devices, NULL);

    // query device profile for each device supported by platform.
    for (i=0; i<num_devices; i++)
    {
        clGetDeviceInfo(devices[i],
                        CL_DEVICE_PROFILE,
                        sizeof(device_profile),
                        device_profile,
                        NULL);

        printf("Device profile for device index %u is %s\n",
                                          i, device_profile);
    }

    free(devices);
}


The clGetPlatformInfo and clGetDeviceInfo APIs are described in detail in Chapter 3.
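
Both queries return the profile as a string, either FULL_PROFILE or EMBEDDED_PROFILE. As a minimal sketch, an application could classify the returned string as follows. The profile_kind type and the profile_from_string function are hypothetical names introduced here for illustration; they are not part of the OpenCL API.

```c
#include <string.h>

// Hypothetical helper type, not part of the OpenCL API.
typedef enum {
    PROFILE_UNKNOWN  = 0,
    PROFILE_FULL     = 1,
    PROFILE_EMBEDDED = 2
} profile_kind;

// Classify the string returned for CL_PLATFORM_PROFILE or
// CL_DEVICE_PROFILE.
profile_kind
profile_from_string(const char *profile)
{
    if (profile == NULL)
        return PROFILE_UNKNOWN;
    if (strcmp(profile, "FULL_PROFILE") == 0)
        return PROFILE_FULL;
    if (strcmp(profile, "EMBEDDED_PROFILE") == 0)
        return PROFILE_EMBEDDED;
    return PROFILE_UNKNOWN;
}
```

The device_profile buffer filled in by Listing 13.1 could be passed directly to this function.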

The embedded profile is a strict subset of the full profile. The embedded profile has several restrictions not present in the full profile. These restrictions are discussed throughout the rest of this chapter.

64-Bit Integers

In the embedded profile, 64-bit integers are optional. This means that the scalar data types long and ulong and the vector data types longn and ulongn in an OpenCL program may not be supported by a device that implements the embedded profile. If an embedded profile implementation supports 64-bit integers, the cles_khr_int64 extension string will be in the list of extension strings supported by the device. If this extension string is not in that list, using 64-bit integer data types in an OpenCL C program will result in a build failure when building the program executable for that device.

The following code shows how to query whether a device supports the cles_khr_int64 extension string. Note that this extension string is not reported by devices that implement the full profile.

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <CL/cl.h>      // <OpenCL/opencl.h> on Mac OS X

bool
query_extension(const char *extension_name, cl_device_id device)
{
    size_t    size;
    char      *extensions;
    char      delims[] = " "; // space-separated list of names
    char      *result = NULL;
    cl_int    err;
    bool      extension_found;

    // first call returns the size of the extension string in bytes.
    err = clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS,
                                         0, NULL, &size);
    if (err)
        return false;

    extensions = malloc(size);
    if (extensions == NULL)
        return false;

    // second call fills in the extension string itself.
    err = clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS,
                                  size, extensions, NULL);
    if (err)
    {
        free(extensions);
        return false;
    }

    extension_found = false;
    result = strtok( extensions, delims );
    while (result != NULL)
    {
        // extension_name is, for example, "cles_khr_int64"
        if (strcmp(result, extension_name) == 0)
        {
            extension_found = true;
            break;
        }
        result = strtok(NULL, delims);
    }

    free(extensions);
    return extension_found;
}
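
The listing above uses strtok, which modifies the buffer it scans. As an alternative sketch, the following function searches an extension string that has already been retrieved, without modifying it, and accepts only whole space-separated names. The name extension_in_list is hypothetical and is not part of the OpenCL API.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: returns true if name appears as a complete
// space-separated token in ext_list. ext_list is not modified.
bool
extension_in_list(const char *ext_list, const char *name)
{
    size_t      name_len;
    const char *p;

    if (ext_list == NULL || name == NULL || name[0] == '\0')
        return false;

    name_len = strlen(name);
    p = ext_list;
    while ((p = strstr(p, name)) != NULL)
    {
        // a match counts only if it is bounded by spaces or by
        // the start or end of the string.
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        bool ends_token   = (p[name_len] == ' ') ||
                            (p[name_len] == '\0');
        if (starts_token && ends_token)
            return true;
        p++;    // keep searching after a partial match
    }
    return false;
}
```

The boundary check matters because some extension names are prefixes of others; a plain strstr test alone would report a match for a partial name.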

Images

Image support is optional for both profiles. To find out if a device supports images, query the CL_DEVICE_IMAGE_SUPPORT property using the clGetDeviceInfo API. If the embedded profile device supports images, then the following additional restrictions apply:

• Support for 3D images is optional. For a full profile device that supports images, reading from a 3D image in an OpenCL C program is required but writing to a 3D image in an OpenCL C program is optional. An embedded profile device may not support 3D images at all (reads and writes). To find out if the device supports 3D images (i.e., reading a 3D image in an OpenCL C program), query the CL_DEVICE_IMAGE3D_MAX_WIDTH property using the clGetDeviceInfo API. This will have a value of zero if the device does not support 3D images and a non-zero value otherwise.

OpenCL C programs that use the image3d_t type will fail to build the program executable for an embedded profile device that does not support 3D images.

• Bilinear filtering for half-float and float images is not supported. Any 2D or 3D image with an image channel data type of CL_HALF_FLOAT or CL_FLOAT must be read with a sampler whose filter mode is CL_FILTER_NEAREST. Otherwise the results returned by read_imagef and read_imageh are undefined.

• The precision required when converting a normalized integer channel data type value to a single-precision floating-point value differs between the embedded and full profiles. The precision of conversions from CL_UNORM_INT8, CL_UNORM_INT16, CL_UNORM_INT_101010, CL_SNORM_INT8, and CL_SNORM_INT16 to float is <= 1.5 ulp for the full profile and <= 2.0 ulp for the embedded profile. Conversions of specific values, such as 0 → 0.0f, 255 → 1.0f, -127 and -128 → -1.0f, and 127 → 1.0f, are guaranteed to be the same for both profiles.
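
The conversions discussed in the last item can be illustrated with host-side reference arithmetic for the 8-bit formats. This is a sketch only: the function names are hypothetical, and a device may return any value within the ulp bounds given above except for the specific values that are guaranteed.

```c
#include <stdint.h>

// CL_UNORM_INT8: 0..255 maps to 0.0f..1.0f.
float
unorm_int8_to_float(uint8_t c)
{
    return (float)c / 255.0f;
}

// CL_SNORM_INT8: -127..127 maps to -1.0f..1.0f, and -128 is
// clamped so that it also maps to -1.0f.
float
snorm_int8_to_float(int8_t c)
{
    float f = (float)c / 127.0f;
    return (f < -1.0f) ? -1.0f : f;
}
```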

The required list of image formats (for reading and writing) that must be supported by an embedded profile device is given in Table 13.1.

Table 13.1 Required Image Formats for Embedded Profile


Built-In Atomic Functions

The full profile supports built-in functions that perform atomic operations on 32-bit integers to global and local memory. These built-in functions are optional for the embedded profile. Check for the cl_khr_global_int32_base_atomics, cl_khr_global_int32_extended_atomics, cl_khr_local_int32_base_atomics, and cl_khr_local_int32_extended_atomics extensions in the list of extension strings reported by a device to see which functions, if any, are supported by the embedded profile device.

Mandated Minimum Single-Precision Floating-Point Capabilities

The mandated minimum single-precision floating-point capability for the full profile is CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN. For the embedded profile, the mandated minimum capability is CL_FP_ROUND_TO_NEAREST or CL_FP_ROUND_TO_ZERO. Support for positive or negative infinity and NaN is not required.
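
As a sketch of how these minimums could be checked, the functions below test a capability bit-field of the kind returned for CL_DEVICE_SINGLE_FP_CONFIG by clGetDeviceInfo. The function names are hypothetical. The bit values are the ones defined in the OpenCL 1.1 cl.h header and are repeated here only so that the sketch compiles without the OpenCL headers.

```c
#include <stdbool.h>

// Values as defined for cl_device_fp_config in cl.h; defined here
// only when the OpenCL headers have not been included.
#ifndef CL_FP_INF_NAN
#define CL_FP_INF_NAN          (1 << 1)
#define CL_FP_ROUND_TO_NEAREST (1 << 2)
#define CL_FP_ROUND_TO_ZERO    (1 << 3)
#endif

// Full profile: round-to-nearest and INF/NaN must both be set.
bool
meets_full_profile_minimum(unsigned long long config)
{
    unsigned long long required =
        CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN;
    return (config & required) == required;
}

// Embedded profile: either rounding mode is sufficient, and
// INF/NaN support is not required.
bool
meets_embedded_profile_minimum(unsigned long long config)
{
    unsigned long long either =
        CL_FP_ROUND_TO_NEAREST | CL_FP_ROUND_TO_ZERO;
    return (config & either) != 0;
}
```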

If CL_FP_INF_NAN is not set, and one of the operands or the correctly rounded result of addition, subtraction, multiplication, or division is INF or NaN, the value of the result is implementation-defined. Likewise, single-precision comparison operators (<, >, <=, >=, ==, !=) return implementation-defined values when one or more operands is a NaN.

Conversions between different types (implicit and explicit) for the embedded profile are correctly rounded as described for the full profile, including those that consume or produce an INF or NaN.

Denormalized numbers for the half data type, which may be generated when converting a float to a half (for example, using vstore_half), or when converting from a half to a float (for example, using vload_half), may be flushed to zero by an embedded profile device. A full profile device, however, cannot flush these denorm values to zero.
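
To make the flush-to-zero behavior concrete, the following sketch works on the raw 16-bit pattern of a half value (1 sign bit, 5 exponent bits, 10 mantissa bits). The function names are hypothetical, and keeping the sign bit when flushing is an assumption of this sketch rather than something stated above.

```c
#include <stdbool.h>
#include <stdint.h>

// A half value is denormalized when its exponent field is zero
// and its mantissa field is non-zero.
bool
half_is_denormal(uint16_t bits)
{
    return (bits & 0x7C00u) == 0 && (bits & 0x03FFu) != 0;
}

// Model of flush-to-zero: a denormalized value becomes zero.
// Assumption: the sign bit is kept, giving +0 or -0.
uint16_t
flush_half_denormal(uint16_t bits)
{
    if (half_is_denormal(bits))
        return (uint16_t)(bits & 0x8000u);
    return bits;
}
```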

The built-in math functions behave as described for the full profile, including edge case behavior (described in Section 7.5.1 of the OpenCL 1.1 specification). Table 13.2 describes the built-in math functions that differ in the minimum required accuracy between the full and embedded profiles.

Table 13.2 Accuracy of Math Functions for Embedded Profile versus Full Profile


Relaxing the requirement to adhere to IEEE 754 for basic floating-point operations is highly undesirable, but it gives embedded and hand-held devices, which have much stricter hardware area budgets, the flexibility they need.

Table 13.3 describes the differences in the mandated minimum maximum values for device properties (described in Table 4.3 of the OpenCL 1.1 specification).

Table 13.3 Device Properties: Minimum Maximum Values for Full Profile versus Embedded Profile


The minimum maximum values for device properties related to images described in Table 13.3 apply only if the device supports images.

Determining the Profile Supported by a Device in an OpenCL C Program

The embedded profile is a strict subset of the full profile, so an OpenCL C program written for the embedded profile will work on any device that supports the full profile. In some cases, however, an application may want separate code paths depending on which profile is supported by the device executing a kernel.

The __EMBEDDED_PROFILE__ macro is added to the OpenCL C language to determine whether a kernel is executing on an embedded profile or a full profile device. It is the integer constant 1 for devices that implement the embedded profile and is undefined otherwise.
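
As a hedged sketch of how the macro might be used, the host-side string below holds OpenCL C source that selects a 32-bit result type on embedded profile devices, where 64-bit integers are optional, and a 64-bit type otherwise. The kernel name square_wide, the type name wide_t, and the variable name are all hypothetical, and the kernel is an illustration only.

```c
// OpenCL C source held as a host-side string, in the form that
// would be passed to clCreateProgramWithSource. All names in the
// kernel are illustrative.
const char *square_wide_source =
    "#ifdef __EMBEDDED_PROFILE__\n"
    "typedef uint  wide_t;  /* 64-bit integers are optional */\n"
    "#else\n"
    "typedef ulong wide_t;  /* full profile: use 64 bits */\n"
    "#endif\n"
    "\n"
    "__kernel void\n"
    "square_wide(__global const uint *in, __global wide_t *out)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = (wide_t)in[i] * (wide_t)in[i];\n"
    "}\n";
```

Because the element type of out differs between the two paths, the host would have to size the output buffer according to the profile reported by the device. On the embedded path the product can wrap around at 32 bits.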
