Image processing with NPP

First, we will cover how the NPP library can ease an image-processing task. Before doing this, we should install the FreeImage library so that we can easily load and save JPEG-compressed image files. There are three ways to prepare the library:

Installation from the Ubuntu archive:

$ sudo apt-get install libfreeimage-dev

Build from the source code and install:

$ wget http://downloads.sourceforge.net/freeimage/FreeImage3180.zip
$ unzip FreeImage3180.zip
$ cd FreeImage && make -j && sudo make install

Use the library that was already installed with the CUDA Toolkit. The NPP sample 7_CUDALibraries/freeImageInteropNPP in the CUDA sample code uses the FreeImage library, and for this sample the FreeImage header files and library files are installed at 7_CUDALibraries/common/FreeImage in the CUDA sample directory. You may use these if you prefer not to install other binaries on your machine.

Now, let's implement the NPP-based image-processing application. The fully implemented code is 05_npp/imageFilter.cpp. This file begins with the header files:

#include <iostream>
#include <iomanip>
#include <cassert>
#include <cstdlib>      // exit()
#include <cstring>
#include <cuda_runtime.h>
#include <npp.h>
#include <FreeImage.h>
#include <helper_timer.h>

The application uses an ImageInfo_t structure to manage image information and data conveniently:

struct ImageInfo_t
{
    /* image information */
    FIBITMAP* dib; // FreeImage bitmap
    int nHeight;   // image height size
    int nWidth;    // image width size
    int nPitch;    // image pitch size
    int nBPP;      // bits per pixel (e.g., 24 for BGR color)
    int nChannel;  // number of channels
    BYTE* pData;   // bytes from freeimage library
   
    /* CUDA */
    Npp8u *pDataCUDA; // CUDA global memory for nppi processing
    int nPitchCUDA;   // image pitch size on CUDA device
};
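The structure keeps separate host and device pitches because rows may be padded differently in the two buffers. The following is a minimal sketch of the host-side arithmetic, assuming FreeImage's documented rule that each scanline is padded to the next 32-bit boundary. FreeImagePitch() and PixelOffset() are hypothetical helper names, and no particular alignment is assumed for the pitch that nppiMalloc_8u_C3() returns:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: FreeImage pads each scanline to the next 32-bit boundary,
// so the pitch (bytes per row) can exceed width * bytesPerPixel.
std::size_t FreeImagePitch(std::size_t width, std::size_t bpp) {
    return ((width * bpp + 31) / 32) * 4;
}

// Hypothetical helper: byte offset of channel c of pixel (x, y)
// in an interleaved, pitched 8-bit buffer.
std::size_t PixelOffset(std::size_t x, std::size_t y, std::size_t c,
                        std::size_t pitch, std::size_t nChannel) {
    return y * pitch + x * nChannel + c;
}
```

For a 101-pixel-wide 24-bit image, the meaningful row data is 303 bytes, but the pitch is 304 bytes, so rows must always be stepped by the pitch rather than by width * nChannel.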

Next, write the LoadImage() function to load a JPEG image. The FreeImage library supports many other image formats as well, so you can try other images if you want. The function fills the source ImageInfo_t structure with the loaded image data and is implemented as follows:

void LoadImage(const char *szInputFile, ImageInfo_t &srcImage) {
    FIBITMAP *pSrcImageBitmap = FreeImage_Load(FIF_JPEG, szInputFile, JPEG_DEFAULT);
    if (!pSrcImageBitmap) {
        std::cout << "Couldn't load " << szInputFile << std::endl;
        FreeImage_DeInitialise();
        exit(1);
    }

    srcImage.dib = pSrcImageBitmap;
    srcImage.nWidth = FreeImage_GetWidth(pSrcImageBitmap);
    srcImage.nHeight = FreeImage_GetHeight(pSrcImageBitmap);
    srcImage.nPitch = FreeImage_GetPitch(pSrcImageBitmap);
    srcImage.nBPP = FreeImage_GetBPP(pSrcImageBitmap);
    srcImage.pData = FreeImage_GetBits(pSrcImageBitmap);
    assert(srcImage.nBPP == 24); // expect a 24-bit BGR color image
    srcImage.nChannel = 3;
}

Then, write some NPPI helper functions that build the NPPI image size and the NPPI ROI from the image structure, as follows:

NppiSize GetImageSize(ImageInfo_t imageInfo)
{
    NppiSize imageSize;
   
    imageSize.width = imageInfo.nWidth;
    imageSize.height = imageInfo.nHeight;
   
    return imageSize;
}
   
NppiRect GetROI(ImageInfo_t imageInfo)
{
    NppiRect imageROI;
   
    imageROI.x = 0;    imageROI.y = 0;
    imageROI.width = imageInfo.nWidth;
    imageROI.height = imageInfo.nHeight;
   
    return imageROI;
}
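GetROI() always returns the full image, but an ROI can also cover a sub-rectangle. The sketch below shows a bounds check one might run before handing an ROI to an NPPI function. ImageSize and ImageRect are hypothetical stand-ins that reuse the field names of NppiSize and NppiRect so the example compiles without npp.h; whether a particular NPP function clips out-of-range ROIs on its own is not assumed here:

```cpp
#include <cassert>

// Hypothetical stand-ins with the same fields as NppiSize and NppiRect.
struct ImageSize { int width; int height; };
struct ImageRect { int x; int y; int width; int height; };

// True if the ROI is non-empty and lies entirely inside an image of the given size.
bool RoiInsideImage(const ImageRect &roi, const ImageSize &image) {
    return roi.x >= 0 && roi.y >= 0 && roi.width > 0 && roi.height > 0 &&
           roi.x + roi.width <= image.width &&
           roi.y + roi.height <= image.height;
}
```

A full-image ROI such as the one GetROI() builds always passes this check; a shifted ROI of the same size does not.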

Then, let's implement the NPPI-based image-resizing function as follows. In this function, we use nppiResize_8u_C3R(), which was discussed at the beginning. NPP APIs follow naming conventions that make their operation explicit. Depending on their functional category, names start with nppi for image processing and npps for signal processing. For instance, the image-processing function nppiResize_8u_C3R() begins with the nppi prefix, and it resizes 8-bit unsigned (unsigned char) data with three channels into the given ROI (see the NPP documentation for more detail on this convention):

NppStatus ResizeGPU(ImageInfo_t &dstImage, ImageInfo_t &srcImage,
                    NppiSize &dstSize, NppiRect &dstROI,
                    NppiSize &srcSize, NppiRect &srcROI, float scale)
{
    // the output size was already scaled when dstImage was prepared,
    // so `scale` is not needed here
    dstSize.width = dstROI.width = dstImage.nWidth;
    dstSize.height = dstROI.height = dstImage.nHeight;

    // resize the 8-bit, 3-channel pitched device image with Lanczos interpolation
    return nppiResize_8u_C3R(srcImage.pDataCUDA, srcImage.nPitchCUDA,
                             srcSize, srcROI,
                             dstImage.pDataCUDA, dstImage.nPitchCUDA,
                             dstSize, dstROI,
                             NPPI_INTER_LANCZOS);
}
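The naming convention described above can be illustrated with a small parser that splits a function name at its underscores. ParseNppName() is a hypothetical helper for illustration only. It handles just the simple pattern of prefix, operation, data type, and an optional channel/ROI part (for example, 8u for 8-bit unsigned data and C3R for three channels with an ROI), and it does not interpret the additional suffixes some NPP functions carry:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct NppNameParts {
    std::string domain;     // "nppi" (image) or "npps" (signal)
    std::string operation;  // e.g. "Resize"
    std::string dataType;   // e.g. "8u" = 8-bit unsigned
    std::string layout;     // e.g. "C3R" = 3 channels, ROI-based (may be empty)
};

// Hypothetical helper: splits a name such as "nppiResize_8u_C3R" at underscores.
bool ParseNppName(const std::string &name, NppNameParts &out) {
    if (name.size() < 5 ||
        (name.compare(0, 4, "nppi") != 0 && name.compare(0, 4, "npps") != 0))
        return false;
    std::vector<std::string> parts;
    std::stringstream ss(name);
    std::string token;
    while (std::getline(ss, token, '_')) parts.push_back(token);
    if (parts.size() < 2) return false;
    out.domain = parts[0].substr(0, 4);
    out.operation = parts[0].substr(4);
    out.dataType = parts[1];
    out.layout = parts.size() > 2 ? parts[2] : "";
    return true;
}
```

For example, ParseNppName("nppiResize_8u_C3R", p) yields the domain nppi, the operation Resize, the data type 8u, and the layout C3R.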

To compare the performance with the CPU, we use FreeImage's FreeImage_Rescale() function, as follows:

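void ResizeCPU(const char* szInputFile, ImageInfo_t &dstImage) {
    FIBITMAP *dib = FreeImage_Load(FIF_JPEG, szInputFile, JPEG_DEFAULT); // reload the source
    FIBITMAP *pResized = FreeImage_Rescale(dib, dstImage.nWidth, dstImage.nHeight, FILTER_LANCZOS3);
    FreeImage_Unload(pResized); // only the rescaling cost matters here
    FreeImage_Unload(dib);
}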
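Both paths use Lanczos interpolation: NPPI_INTER_LANCZOS on the GPU and FILTER_LANCZOS3 on the CPU. As a rough illustration of what such a filter computes, the sketch below evaluates the Lanczos kernel L(x) = sinc(x) sinc(x/a) for |x| < a (and 0 elsewhere), together with the normalized weights for one output sample. The support a = 3 is an assumption taken from the FILTER_LANCZOS3 name; the exact kernel, support, and downscaling behavior inside NPP or FreeImage are not reproduced here, and Lanczos() and LanczosTaps() are hypothetical helpers:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

const double kPi = 3.14159265358979323846;

// Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1.
double Sinc(double x) {
    if (std::fabs(x) < 1e-12) return 1.0;
    return std::sin(kPi * x) / (kPi * x);
}

// Lanczos kernel with support a (a = 3 for "Lanczos3").
double Lanczos(double x, int a = 3) {
    return std::fabs(x) < a ? Sinc(x) * Sinc(x / a) : 0.0;
}

// Weights for the 2a source samples at offsets -a+1 .. a from floor(position),
// where t in [0, 1) is the fractional part; normalized so they sum to 1.
std::vector<double> LanczosTaps(double t, int a = 3) {
    std::vector<double> w;
    double sum = 0.0;
    for (int i = -a + 1; i <= a; ++i) {
        w.push_back(Lanczos(i - t, a));
        sum += w.back();
    }
    for (double &v : w) v /= sum;
    return w;
}
```

Some of the weights are negative, which is what gives Lanczos its sharpness compared with bilinear interpolation.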

Now, let's implement the main() function. First, we initialize the FreeImage library and load an image:

FreeImage_Initialise();
ImageInfo_t srcImage, dstImage;
LoadImage(szInputFile, srcImage);

Then, we prepare the GPU memory for the input image, as follows. In this procedure, we allocate pitched global memory with the NPPI function nppiMalloc_8u_C3() and transfer the loaded image into it using cudaMemcpy2D():

// copy loaded image to the device memory
srcImage.pDataCUDA = 
             nppiMalloc_8u_C3(srcImage.nWidth, srcImage.nHeight, 
                              &srcImage.nPitchCUDA);
cudaMemcpy2D(srcImage.pDataCUDA, srcImage.nPitchCUDA, 
             srcImage.pData, srcImage.nPitch, 
             srcImage.nWidth * srcImage.nChannel * sizeof(Npp8u), 
             srcImage.nHeight,
             cudaMemcpyHostToDevice);
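The width argument of cudaMemcpy2D() is the number of meaningful bytes per row (nWidth * nChannel * sizeof(Npp8u)), not a pitch, because the host and device rows can be padded differently. The following host-only sketch models that row-by-row copy; Copy2D() is a hypothetical function written for illustration and is not part of the CUDA runtime:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-only model of a pitched 2D copy: copy `height` rows of
// `widthBytes` bytes each, advancing source and destination by their own pitches.
// Padding bytes at the end of each destination row are left untouched.
void Copy2D(unsigned char *dst, std::size_t dpitch,
            const unsigned char *src, std::size_t spitch,
            std::size_t widthBytes, std::size_t height) {
    assert(widthBytes <= dpitch && widthBytes <= spitch);
    for (std::size_t row = 0; row < height; ++row)
        std::memcpy(dst + row * dpitch, src + row * spitch, widthBytes);
}
```

For a 2 x 2 RGB image, each row holds 6 meaningful bytes; with a host pitch of 8 and a destination pitch of 16, only those 6 bytes per row are copied.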

After that, we allocate the output memory space using the resized image size, as follows:

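// start from a copy of the source description, then scale it
std::memcpy(&dstImage, &srcImage, sizeof(ImageInfo_t));
dstImage.nWidth *= scaleRatio;
dstImage.nHeight *= scaleRatio;
dstImage.pDataCUDA = 
                nppiMalloc_8u_C3(dstImage.nWidth, dstImage.nHeight, 
                                 &dstImage.nPitchCUDA);
// NPPI size and ROI descriptors used by the resize calls
NppiSize srcImageSize = GetImageSize(srcImage), dstImageSize = GetImageSize(dstImage);
NppiRect srcROI = GetROI(srcImage), dstROI = GetROI(dstImage);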
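Note that dstImage.nWidth *= scaleRatio multiplies an int by a floating-point factor and truncates the result toward zero when it is stored back, so an odd dimension loses half a pixel. The sketch below contrasts that behavior with a rounding alternative, assuming scaleRatio is a float. TruncatedDim() and RoundedDim() are hypothetical helpers, and rounding to the nearest pixel with a minimum of one pixel is one possible design choice, not what the code above does:

```cpp
#include <cassert>
#include <cmath>

// Mirrors `dstImage.nWidth *= scaleRatio;`: the product is computed in float
// and truncated toward zero when stored back into the int.
int TruncatedDim(int dim, float scale) {
    int result = dim;
    result *= scale;
    return result;
}

// Hypothetical alternative: round to the nearest pixel and never return 0.
int RoundedDim(int dim, float scale) {
    long r = std::lround(dim * static_cast<double>(scale));
    return r < 1 ? 1 : static_cast<int>(r);
}
```

With a scale of 0.5, a 1,001-pixel dimension becomes 500 pixels under truncation and 501 under rounding; even sizes such as 640 give 320 either way.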

Then, we call the ResizeGPU() and ResizeCPU() functions that we have already implemented. For each operation, we measure the execution time; on the GPU side, we use CUDA events (cudaEvent) for this:

ResizeGPU(dstImage, srcImage, dstImageSize, dstROI, srcImageSize, srcROI, scaleRatio);
ResizeCPU(szInputFile, dstImage);
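The timing code itself is not shown above. For the CPU path, a wall-clock timer around the call is sufficient; a minimal sketch using std::chrono follows, where ElapsedMs() is a hypothetical helper name. Because NPP functions launch GPU work, a host-side timer around ResizeGPU() would only be meaningful if that work were synchronized (for example, with cudaDeviceSynchronize()) inside the timed region; CUDA events, as the text uses, are the usual alternative on the GPU side:

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: runs `fn` once and returns the elapsed wall-clock time
// in milliseconds. Suitable for synchronous CPU work such as ResizeCPU().
template <typename Fn>
double ElapsedMs(Fn &&fn) {
    auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Usage would look like `double cpuMs = ElapsedMs([&] { ResizeCPU(szInputFile, dstImage); });`. steady_clock is used because, unlike system_clock, it cannot jump backward during the measurement.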

For verification, we save the result to a file. To do this, we create a FreeImage bitmap, copy the resized image from device memory into its pixel buffer, and then save it as the output image, as follows:

// Save resized image as file from the device
FIBITMAP *pDstImageBitmap = 
                FreeImage_Allocate(dstImage.nWidth, dstImage.nHeight, 
                                   dstImage.nBPP);
  
dstImage.nPitch = FreeImage_GetPitch(pDstImageBitmap);
dstImage.pData = FreeImage_GetBits(pDstImageBitmap);
  
cudaMemcpy2D(dstImage.pData, dstImage.nPitch, 
             dstImage.pDataCUDA, dstImage.nPitchCUDA, 
             dstImage.nWidth * dstImage.nChannel * sizeof(Npp8u),
             dstImage.nHeight, cudaMemcpyDeviceToHost);
  
FreeImage_Save(FIF_JPEG, pDstImageBitmap, szOutputFile, JPEG_DEFAULT);

Finally, we release the related resources:

nppiFree(srcImage.pDataCUDA);
nppiFree(dstImage.pDataCUDA);

FreeImage_Unload(pDstImageBitmap);
FreeImage_Unload(srcImage.dib);
FreeImage_DeInitialise();

Compile the code using nvcc, linking the NPP and FreeImage libraries:

$ nvcc -run -m64 -std=c++11 -I/usr/local/cuda/samples/common/inc -gencode arch=compute_70,code=sm_70 -lnppc -lnppif -lnppisu -lnppig -lnppicom -lnpps -lfreeimage -o imageFilter ./imageFilter.cpp

As a result, when the scale factor is 0.5f, the output image file is much smaller than the input:

$ ls -alh *.jpg
-rw-rw-r-- 1 ubuntu ubuntu 91K Nov 13 22:31 flower.jpg
-rw-rw-r-- 1 ubuntu ubuntu 23K Nov 17 02:46 output.jpg
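As a rough consistency check, halving both dimensions leaves a quarter of the pixels. JPEG file size is not strictly proportional to pixel count, so this is only an approximation:

```latex
N_{\text{out}} = s^2 N_{\text{in}} = 0.5^2 \, N_{\text{in}} = 0.25 \, N_{\text{in}},
\qquad
91\,\text{KB} \times 0.25 \approx 22.75\,\text{KB} \approx 23\,\text{KB}
```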

On a V100, the measured GPU elapsed time is 0.04576 ms; the time will vary depending on the GPU:

Rescale flower.jpg in 0.5 ratio.
CPU: 23.857 ms
GPU: 0.04576 ms
Done (generated output.jpg)
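Taking the two reported times at face value, the speedup of this run is roughly:

```latex
\text{speedup} = \frac{t_{\text{CPU}}}{t_{\text{GPU}}}
= \frac{23.857\ \text{ms}}{0.04576\ \text{ms}} \approx 521
```

The section does not state whether the GPU figure includes host-device transfers, so this ratio describes only the resize step as measured here.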

For more detail on using NPP for image processing, see the following document: http://on-demand.gputechconf.com/gtc/2014/presentations/HANDS-ON-LAB-S4793-image-processing-using-npp.pdf.
