Low/mixed precision operations

Mixed precision is a technique that exploits low-precision computation while still obtaining a high-accuracy result. It computes the core operations in low precision and produces the output with high-precision operations. Low-precision computation has the benefits of reduced memory bandwidth and higher arithmetic throughput compared with high-precision computation. If low precision is sufficient to reach an application's target accuracy, this trade-off can improve performance. The NVIDIA Developer Blog introduces this programming model: https://devblogs.nvidia.com/mixed-precision-programming-cuda-8.
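The basic idea can be illustrated with a minimal kernel sketch (the kernel name and data layout below are illustrative assumptions, not the book's sample code): the operands are stored and multiplied in FP16 to save bandwidth, while the accumulation is done in FP32 so the final result keeps single-precision accuracy.

#include <cuda_fp16.h>

// Each thread multiplies two FP16 inputs and accumulates the product into an
// FP32 output: low precision for storage and the core multiply, high precision
// for the accumulation.
__global__ void fp16_mul_fp32_acc(const half *a, const half *b,
                                  float *out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        float prod = __half2float(a[idx]) * __half2float(b[idx]);
        out[idx] += prod;   // accumulate in FP32
    }
}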

In this context, CUDA extends its support to data types narrower than 32 bits, such as 8-bit and 16-bit integers (INT8/INT16) and 16-bit floating point (FP16). For these low-precision data types, a GPU can use single instruction, multiple data (SIMD) operations through specific APIs. In this section, we will look at these two kinds of instructions for low-precision operations with mixed precision in mind.

To benefit from this, you first need to confirm that your GPU supports low-precision operations and the corresponding data types. Low-precision computing is available only on specific GPUs, and the supported precision varies by chipset. Specifically, GP102 (Tesla P40 and Titan X), GP104 (Tesla P4), and GP106 support INT8, while GP100 (Tesla P100) and GV100 (Tesla V100) support FP16 (half-precision) operations. The Tesla GV100 is also compatible with INT8 operations without performance degradation.

CUDA has some special intrinsic functions that enable SIMD operations for low-precision data types.
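As a rough sketch of the two instruction families (the kernel names and data layout here are illustrative assumptions, not the book's sample code), __hfma2 performs two fused multiply-adds on a packed half2 pair, and __dp4a performs a four-way INT8 dot product with a 32-bit accumulator. The half2 intrinsics require compute capability 5.3 or higher, and __dp4a requires 6.1 or higher.

#include <cuda_fp16.h>

// FP16x2: one __hfma2 call performs two fused multiply-adds per instruction,
// since a half2 packs two FP16 values into 32 bits.
__global__ void half2_fma(const half2 *a, const half2 *b, half2 *c, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        c[idx] = __hfma2(a[idx], b[idx], c[idx]);   // c = a * b + c, two lanes
}

// INT8x4: __dp4a computes a four-way 8-bit dot product and accumulates into a
// 32-bit integer, which is the typical mixed-precision INT8 pattern.
__global__ void int8_dot(const int *a, const int *b, int *acc, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        acc[idx] = __dp4a(a[idx], b[idx], acc[idx]); // four INT8 MACs, INT32 accumulate
}

In both cases, the low-precision values are packed into 32-bit registers, so each thread processes two FP16 elements or four INT8 elements per instruction while the accumulation stays in a wider type.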
