The OpenCL specification defines two profiles: a profile for desktop devices (the full profile) and a profile for hand-held and embedded devices (the embedded profile). Hand-held and embedded devices have significant area and power constraints that require a relaxation in the requirements defined by the full profile. The embedded profile targets a strict subset of the OpenCL 1.1 specification required for the full profile. An embedded profile that is a strict subset of the full profile has the following benefits:
• It provides a single specification for both profiles as opposed to having separate specifications.
• OpenCL programs written for the embedded profile should also run on devices that implement the full profile.
• It allows the OpenCL working group to consider requirements of both desktop and hand-held devices in defining requirements for future revisions of OpenCL.
In this chapter, we describe the embedded profile. We discuss core features that are optional for the embedded profile and the relaxation in device and floating-point precision requirements.
A profile is associated with the platform and with each of its devices. The platform implements the OpenCL platform and runtime APIs (described in Chapters 4 and 5 of the OpenCL 1.1 specification). The platform supports one or more devices, and each device supports a specific profile. Listing 13.1 describes how to query the profiles supported by the platform and each device supported by that platform.
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

void
query_profile(cl_platform_id platform)
{
    char          platform_profile[100];
    char          device_profile[100];
    cl_uint       num_devices;
    cl_device_id *devices;
    cl_uint       i;

    // query the platform profile.
    clGetPlatformInfo(platform,
                      CL_PLATFORM_PROFILE,
                      sizeof(platform_profile),
                      platform_profile,
                      NULL);
    printf("Platform profile is %s\n", platform_profile);

    // get all devices supported by platform.
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL,
                       0, NULL, &num_devices) != CL_SUCCESS)
        return;
    devices = malloc(num_devices * sizeof(cl_device_id));
    if (devices == NULL)
        return;
    // num_entries is a device count, not a size in bytes.
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL,
                   num_devices, devices, NULL);

    // query device profile for each device supported by platform.
    for (i = 0; i < num_devices; i++)
    {
        clGetDeviceInfo(devices[i],
                        CL_DEVICE_PROFILE,
                        sizeof(device_profile),
                        device_profile,
                        NULL);
        printf("Device profile for device index %u is %s\n",
               i, device_profile);
    }
    free(devices);
}
The clGetPlatformInfo and clGetDeviceInfo APIs are described in detail in Chapter 3.
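According to the OpenCL 1.1 specification, both CL_PLATFORM_PROFILE and CL_DEVICE_PROFILE return one of two strings: FULL_PROFILE or EMBEDDED_PROFILE. The following is a minimal sketch of how an application might interpret the returned string; the helper name is_embedded_profile is hypothetical and is not part of the OpenCL API.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical helper (not an OpenCL API): returns true if the profile
// string obtained from clGetPlatformInfo(CL_PLATFORM_PROFILE) or
// clGetDeviceInfo(CL_DEVICE_PROFILE) names the embedded profile.
bool
is_embedded_profile(const char *profile)
{
    return profile != NULL && strcmp(profile, "EMBEDDED_PROFILE") == 0;
}
```

An application could call this helper on the device_profile buffer filled in by clGetDeviceInfo to decide which code path to take for that device.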
The embedded profile is a strict subset of the full profile. The embedded profile has several restrictions not present in the full profile. These restrictions are discussed throughout the rest of this chapter.
In the embedded profile, 64-bit integers are optional. This means that the long and ulong scalar and longn and ulongn vector data types in an OpenCL program may not be supported by a device that implements the embedded profile. If an embedded profile implementation supports 64-bit integers, then the cles_khr_int64 extension string will be in the list of extension strings supported by the device. If this extension string is not in the list of extension strings supported by the device, using 64-bit integer data types in an OpenCL C program will result in a build failure when building the program executable for that device.
The following code shows how to query whether a device supports the cles_khr_int64 extension string. Note that this extension string is not reported by devices that implement the full profile.
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <CL/cl.h>

bool
query_extension(const char *extension_name, cl_device_id device)
{
    size_t  size;
    char   *extensions;
    char    delims[] = " "; // space-separated list of names
    char   *result = NULL;
    cl_int  err;
    bool    extension_found = false;

    // first call returns the size of the extension string.
    err = clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS,
                          0, NULL, &size);
    if (err != CL_SUCCESS)
        return false;
    extensions = malloc(size);
    if (extensions == NULL)
        return false;
    // second call fills in the extension string.
    err = clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS,
                          size, extensions, NULL);
    if (err != CL_SUCCESS)
    {
        free(extensions);
        return false;
    }
    result = strtok(extensions, delims);
    while (result != NULL)
    {
        // extension_name is, for example, "cles_khr_int64"
        if (strcmp(result, extension_name) == 0)
        {
            extension_found = true;
            break;
        }
        result = strtok(NULL, delims);
    }
    free(extensions);
    return extension_found;
}
Image support is optional for both profiles. To find out if a device supports images, query the CL_DEVICE_IMAGE_SUPPORT property using the clGetDeviceInfo API. If the embedded profile device supports images, then the following additional restrictions apply:
• Support for 3D images is optional. For a full profile device that supports images, reading from a 3D image in an OpenCL C program is required but writing to a 3D image in an OpenCL C program is optional. An embedded profile device may not support 3D images at all (reads and writes). To find out if the device supports 3D images (i.e., reading a 3D image in an OpenCL C program), query the CL_DEVICE_IMAGE3D_MAX_WIDTH property using the clGetDeviceInfo API. This will have a value of zero if the device does not support 3D images and a non-zero value otherwise.
OpenCL C programs that use the image3d_t type will fail to build the program executable for an embedded profile device that does not support 3D images.
• Bilinear filtering for half-float and float images is not supported. Any 2D and 3D images with an image channel data type of CL_HALF_FLOAT or CL_FLOAT must use a sampler of CL_FILTER_NEAREST. Otherwise the results returned by read_imagef and read_imageh are undefined.
• The precision of conversions from a normalized integer channel data type value to a single-precision floating-point value differs between the embedded and full profiles. The precision of conversions from CL_UNORM_INT8, CL_UNORM_INT16, CL_UNORM_INT_101010, CL_SNORM_INT8, and CL_SNORM_INT16 to float is <= 1.5 ulp for the full profile and <= 2.0 ulp for the embedded profile. Conversions of specific values, such as 0 → 0.0f, 255 → 1.0f, -127 and -128 → -1.0f, and 127 → 1.0f, are guaranteed to be the same for both profiles.
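The guaranteed special-case conversions in the last restriction can be checked against a host-side reference. The sketch below assumes the normalized-integer conversion rules of the OpenCL 1.1 specification for 8-bit channels (divide by 255 for CL_UNORM_INT8; divide by 127 and clamp to -1.0f for CL_SNORM_INT8). The function names are hypothetical, and the code models only the ideal result, not the ulp error a particular device may introduce.

```c
#include <stdint.h>

// Reference conversion for a CL_UNORM_INT8 channel value: maps 0..255
// onto 0.0f..1.0f.
float
unorm_int8_to_float(uint8_t c)
{
    return (float)c / 255.0f;
}

// Reference conversion for a CL_SNORM_INT8 channel value: maps -127..127
// onto -1.0f..1.0f and clamps -128 to -1.0f.
float
snorm_int8_to_float(int8_t c)
{
    float f = (float)c / 127.0f;
    return (f < -1.0f) ? -1.0f : f;
}
```

With these definitions, 0 and 255 convert to 0.0f and 1.0f for the unsigned format, and -128, -127, and 127 convert to -1.0f, -1.0f, and 1.0f for the signed format, matching the values listed above.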
The required list of image formats (for reading and writing) that must be supported by an embedded profile device is given in Table 13.1.
The full profile supports built-in functions that perform atomic operations on 32-bit integers to global and local memory. These built-in functions are optional for the embedded profile. Check for the cl_khr_global_int32_base_atomics, cl_khr_global_int32_extended_atomics, cl_khr_local_int32_base_atomics, and cl_khr_local_int32_extended_atomics extensions in the list of extension strings reported by a device to see which functions, if any, are supported by the embedded profile device.
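One way to summarize which of the four atomics extensions a device reports is to scan its extension string once and build a bitmask. The following is a sketch only: the flag names, has_token, and atomics_support_mask are hypothetical helpers, and the input is assumed to be the space-separated string returned for CL_DEVICE_EXTENSIONS.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical flag values for this sketch only (not OpenCL constants).
enum {
    ATOMICS_GLOBAL_BASE     = 1 << 0,
    ATOMICS_GLOBAL_EXTENDED = 1 << 1,
    ATOMICS_LOCAL_BASE      = 1 << 2,
    ATOMICS_LOCAL_EXTENDED  = 1 << 3
};

// Returns nonzero if name appears as a whole space-delimited token in list.
// Unlike strtok, this does not modify list.
static int
has_token(const char *list, const char *name)
{
    size_t      len = strlen(name);
    const char *p = list;

    if (len == 0)
        return 0;
    while ((p = strstr(p, name)) != NULL)
    {
        int starts = (p == list) || (p[-1] == ' ');
        int ends   = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends)
            return 1;
        p++;
    }
    return 0;
}

// Builds a bitmask of the 32-bit atomics extensions found in the
// extension string reported by a device.
unsigned
atomics_support_mask(const char *extensions)
{
    unsigned mask = 0;

    if (extensions == NULL)
        return 0;
    if (has_token(extensions, "cl_khr_global_int32_base_atomics"))
        mask |= ATOMICS_GLOBAL_BASE;
    if (has_token(extensions, "cl_khr_global_int32_extended_atomics"))
        mask |= ATOMICS_GLOBAL_EXTENDED;
    if (has_token(extensions, "cl_khr_local_int32_base_atomics"))
        mask |= ATOMICS_LOCAL_BASE;
    if (has_token(extensions, "cl_khr_local_int32_extended_atomics"))
        mask |= ATOMICS_LOCAL_EXTENDED;
    return mask;
}
```

Matching whole tokens rather than substrings avoids a false positive when one extension name happens to be contained in a longer one.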
The mandated minimum single-precision floating-point capability for the full profile is CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN. For the embedded profile, the mandated minimum capability is CL_FP_ROUND_TO_NEAREST or CL_FP_ROUND_TO_ZERO. Support for positive or negative infinity and NaN is not required.
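The two minimum requirements can be expressed as a check on the bitfield a device reports for CL_DEVICE_SINGLE_FP_CONFIG. The sketch below is an illustration under stated assumptions: the FPCFG_ names are hypothetical stand-ins that mirror the bit values of the corresponding CL_FP_ flags in CL/cl.h, redefined so the example compiles without the OpenCL headers, and meets_min_single_fp_config is not an OpenCL function.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical names; the bit positions mirror CL_FP_INF_NAN,
// CL_FP_ROUND_TO_NEAREST, and CL_FP_ROUND_TO_ZERO in CL/cl.h.
#define FPCFG_INF_NAN          (1u << 1)
#define FPCFG_ROUND_TO_NEAREST (1u << 2)
#define FPCFG_ROUND_TO_ZERO    (1u << 3)

// Returns true if config satisfies the mandated minimum single-precision
// capability for the given profile.
bool
meets_min_single_fp_config(uint64_t config, bool embedded)
{
    if (embedded)
    {
        // Embedded profile: at least one of the two rounding modes.
        return (config & (FPCFG_ROUND_TO_NEAREST | FPCFG_ROUND_TO_ZERO)) != 0;
    }
    // Full profile: round-to-nearest and INF/NaN support are both required.
    uint64_t required = FPCFG_ROUND_TO_NEAREST | FPCFG_INF_NAN;
    return (config & required) == required;
}
```

In real host code, config would come from clGetDeviceInfo with CL_DEVICE_SINGLE_FP_CONFIG, and the CL_FP_ constants from the OpenCL headers would be used directly.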
If CL_FP_INF_NAN is not set, and one of the operands or the correctly rounded result of addition, subtraction, multiplication, or division is INF or NaN, the value of the result is implementation-defined. Likewise, single-precision comparison operators (<, >, <=, >=, ==, !=) return implementation-defined values when one or more operands is a NaN.
Conversions between different types (implicit and explicit) for the embedded profile are correctly rounded as described for the full profile, including those that consume or produce an INF or NaN.
Denormalized numbers for the half data type, which may be generated when converting a float to a half (for example, using vstore_half), or when converting from a half to a float (for example, using vload_half), may be flushed to zero by an embedded profile device. A full profile device, however, cannot flush these denorm values to zero.
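To see which float values are affected, recall that the smallest normalized half value is 2^-14. Any nonzero float with a smaller magnitude maps to a half denorm (or rounds to zero), so it is a candidate for flushing on an embedded profile device. The helper below is a hypothetical host-side sketch of that range test; it does not model rounding at the range boundaries.

```c
#include <stdbool.h>

// Hypothetical helper: returns true if x is nonzero and its magnitude is
// below the smallest normalized half value (2^-14). Such values cannot be
// stored as a normalized half. NaN inputs return false.
bool
in_half_denorm_range(float x)
{
    float mag = (x < 0.0f) ? -x : x;
    return mag > 0.0f && mag < 0x1p-14f;
}
```

A kernel that must produce identical results on both profiles could avoid storing such values with vstore_half, or could scale its data so that meaningful values stay at or above 2^-14 in magnitude.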
The built-in math functions behave as described for the full profile, including edge case behavior (described in Section 7.5.1 of the OpenCL 1.1 specification). Table 13.2 describes the built-in math functions that differ in the minimum required accuracy between the full and embedded profiles.
This relaxation of IEEE 754 conformance for basic floating-point operations, though extremely undesirable, gives flexibility to embedded and hand-held devices, which have much stricter hardware area budgets.
Table 13.3 describes the differences in the mandated minimum maximum values for device properties (described in Table 4.3 of the OpenCL 1.1 specification).
The minimum maximum values for device properties related to images described in Table 13.3 apply only if the device supports images.
The embedded profile is a strict subset of the full profile. An OpenCL C program written for the embedded profile will work on any device that supports the full profile. In some cases, however, an application may want separate code paths depending on the profile supported by the device that executes a kernel.
The __EMBEDDED_PROFILE__ macro is added to the OpenCL C language to determine whether a kernel is executing on an embedded profile or a full profile device. It is the integer constant 1 for devices that implement the embedded profile and is undefined otherwise.
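As an illustration, a kernel can use this macro to select a data type that is available on the target device. The sketch below embeds a hypothetical OpenCL C kernel as a host-side string, as it might be passed to clCreateProgramWithSource. The kernel name widen, the typedef acc_t, and the variable widen_kernel_src are all assumptions made for this example. For simplicity it treats every embedded profile device as lacking 64-bit integers, although a device that reports cles_khr_int64 would support them.

```c
// Hypothetical OpenCL C kernel source held in a host-side string.
// On an embedded profile device the accumulator type falls back to a
// 32-bit integer; otherwise a 64-bit integer is used.
const char *const widen_kernel_src =
    "#ifdef __EMBEDDED_PROFILE__\n"
    "typedef uint  acc_t;  /* 64-bit integers are optional here */\n"
    "#else\n"
    "typedef ulong acc_t;  /* full profile: 64-bit integers available */\n"
    "#endif\n"
    "\n"
    "__kernel void widen(__global const uint *in, __global acc_t *out)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = (acc_t)in[i];\n"
    "}\n";
```

Because the element size of the output buffer differs between the two paths, the host would also need to query CL_DEVICE_PROFILE and size the buffer to match.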