Image processing with NPP

First, we will cover how the NPP library can simplify an image-processing task. Before doing this, we need to install the FreeImage library so that we can easily load and save JPEG-compressed image files. There are three ways to prepare the library:

  1. Install from the Ubuntu archive:
$ sudo apt-get install libfreeimage-dev

  2. Build from the source code and install:
$ wget http://downloads.sourceforge.net/freeimage/FreeImage3180.zip
$ unzip FreeImage3180.zip
$ cd FreeImage && make -j && sudo make install

  3. Use the copy of the library that is already installed with the CUDA Toolkit samples. The NPP sample 7_CUDALibraries/freeImageInteropNPP uses the FreeImage library, and for this sample the FreeImage header files and library files are installed at 7_CUDALibraries/common/FreeImage in the CUDA sample directory. You may use these if you prefer not to install additional binaries on your machine.

Now, let's implement the NPP-based image-processing application. The complete code is in 05_npp/imageFilter.cpp, which begins with the following header files:

#include <iostream>
#include <iomanip>
#include <cassert>
#include <cstring>
#include <cstdlib>
#include <cuda_runtime.h>
#include <npp.h>
#include <FreeImage.h>
#include <helper_timer.h>

This application uses an ImageInfo_t structure to manage the image information and pixel data:

struct ImageInfo_t
{
    /* image information */
    FIBITMAP* dib;     // FreeImage bitmap
    int nHeight;       // image height in pixels
    int nWidth;        // image width in pixels
    int nPitch;        // row pitch in bytes on the host
    int nBPP;          // bits per pixel (e.g., 24 for BGR color)
    int nChannel;      // number of channels
    BYTE* pData;       // pixel data owned by the FreeImage bitmap

    /* CUDA */
    Npp8u *pDataCUDA;  // CUDA global memory for NPPI processing
    int nPitchCUDA;    // row pitch in bytes on the CUDA device
};

Next, write the LoadImage() function to load a JPEG image. The FreeImage library supports many other image formats as well, so you can try other images if you like. The function fills the source image structure with the loaded bitmap's information and pixel data, and is implemented as follows:

void LoadImage(const char *szInputFile, ImageInfo_t &srcImage) {
    FIBITMAP *pSrcImageBitmap = FreeImage_Load(FIF_JPEG, szInputFile, JPEG_DEFAULT);
    if (!pSrcImageBitmap) {
        std::cout << "Couldn't load " << szInputFile << std::endl;
        FreeImage_DeInitialise();
        std::exit(1);
    }

    srcImage.dib = pSrcImageBitmap;
    srcImage.nWidth = FreeImage_GetWidth(pSrcImageBitmap);
    srcImage.nHeight = FreeImage_GetHeight(pSrcImageBitmap);
    srcImage.nPitch = FreeImage_GetPitch(pSrcImageBitmap);
    srcImage.nBPP = FreeImage_GetBPP(pSrcImageBitmap);
    srcImage.pData = FreeImage_GetBits(pSrcImageBitmap);
    assert(srcImage.nBPP == 24); // expect a 24-bit BGR color image
    srcImage.nChannel = 3;
}
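Because FreeImage pads each scanline, nPitch can be larger than nWidth * nChannel. The FreeImage documentation describes the value returned by FreeImage_GetPitch() as the row width in bytes rounded up to the next 32-bit boundary. The minimal sketch below reproduces that rule so you can predict the padding for a given width; FreeImagePitch() and PackedRowBytes() are hypothetical helpers, not FreeImage APIs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a FreeImage API): reproduces the documented
// FreeImage_GetPitch() rule, i.e. the row size in bytes rounded up to a
// multiple of 4 bytes (a 32-bit boundary).
std::size_t FreeImagePitch(std::size_t width, std::size_t bitsPerPixel)
{
    assert(bitsPerPixel > 0);
    std::size_t rowBits = width * bitsPerPixel;
    // round up to whole 32-bit words, then convert words to bytes
    return ((rowBits + 31) / 32) * 4;
}

// Hypothetical helper: bytes per row that actually carry pixel data (no padding).
std::size_t PackedRowBytes(std::size_t width, std::size_t channels)
{
    return width * channels;
}
```

For example, a 3-pixel-wide 24-bit row holds 9 bytes of pixel data but has a 12-byte pitch. This is why the cudaMemcpy2D() calls in this application pass nWidth * nChannel as the row width and give the two pitches separately.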

Then, write two NPPI helper functions that build the NPPI image size (NppiSize) and the ROI (NppiRect) from the image structure:

NppiSize GetImageSize(const ImageInfo_t &imageInfo)
{
    NppiSize imageSize;

    imageSize.width = imageInfo.nWidth;
    imageSize.height = imageInfo.nHeight;

    return imageSize;
}

NppiRect GetROI(const ImageInfo_t &imageInfo)
{
    NppiRect imageROI;

    // the ROI covers the whole image
    imageROI.x = 0;
    imageROI.y = 0;
    imageROI.width = imageInfo.nWidth;
    imageROI.height = imageInfo.nHeight;

    return imageROI;
}

Next, let's implement the NPPI-based image-resizing function. It uses nppiResize_8u_C3R(), which was discussed at the beginning. NPP API names follow a naming convention that states explicitly what each function does. The prefix gives the functional category: nppi for image processing and npps for signal processing. For instance, nppiResize_8u_C3R() begins with the nppi prefix; 8u denotes 8-bit unsigned (unsigned char) data, C3 denotes three channels, and the trailing R indicates that the function operates on a region of interest (ROI). So this function resizes three-channel unsigned char data within the given ROI (the NPP documentation describes the convention in more detail):

int ResizeGPU(ImageInfo_t &dstImage, ImageInfo_t &srcImage,
              NppiSize &dstSize, NppiRect &dstROI,
              NppiSize &srcSize, NppiRect &srcROI, float scale)
{
    (void)scale; // unused: dstImage already holds the scaled width and height

    // update output image size
    dstSize.width = dstROI.width = dstImage.nWidth;
    dstSize.height = dstROI.height = dstImage.nHeight;

    // resize three-channel 8-bit data with Lanczos interpolation
    NppStatus status = nppiResize_8u_C3R(srcImage.pDataCUDA, srcImage.nPitchCUDA,
                                         srcSize, srcROI,
                                         dstImage.pDataCUDA, dstImage.nPitchCUDA,
                                         dstSize, dstROI,
                                         NPPI_INTER_LANCZOS);
    return static_cast<int>(status); // 0 (NPP_SUCCESS) on success
}
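To make the naming convention concrete, here is a small, hypothetical parser that splits a name such as nppiResize_8u_C3R into its parts. It accepts only nppi names of the simple form discussed above (operation, data type, channel count, and an optional trailing R for ROI) and is not part of NPP; real NPP names can carry further modifiers, such as I for in-place operation or Sfs for scaled integer results, which this sketch deliberately rejects rather than decodes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical decoded view of a simple NPP image-function name such as
// "nppiResize_8u_C3R"; illustrative only, not part of NPP.
struct NppNameParts {
    std::string operation;  // e.g. "Resize"
    std::string dataType;   // e.g. "8u" (8-bit unsigned)
    int channels = 0;       // e.g. 3 for "C3"
    bool roi = false;       // trailing 'R': operates on a region of interest
};

// Parses names of the form nppi<Operation>_<type>_C<n>[R].
// Returns false for anything outside that simple pattern.
bool ParseNppName(const std::string &name, NppNameParts &out)
{
    if (name.compare(0, 4, "nppi") != 0) return false;  // image-processing prefix

    const std::size_t first = name.find('_');
    if (first == std::string::npos || first <= 4) return false;
    const std::size_t second = name.find('_', first + 1);
    if (second == std::string::npos || second == first + 1) return false;

    out.operation = name.substr(4, first - 4);
    out.dataType = name.substr(first + 1, second - first - 1);

    const std::string flavor = name.substr(second + 1);  // e.g. "C3R"
    if (flavor.size() < 2 || flavor[0] != 'C') return false;

    // read the channel count that follows 'C'
    std::size_t pos = 1;
    int channels = 0;
    while (pos < flavor.size() && flavor[pos] >= '0' && flavor[pos] <= '9') {
        channels = channels * 10 + (flavor[pos] - '0');
        ++pos;
    }
    if (channels == 0) return false;

    // only an optional trailing 'R' is understood; other modifiers are rejected
    const std::string rest = flavor.substr(pos);
    if (!rest.empty() && rest != "R") return false;

    out.channels = channels;
    out.roi = (rest == "R");
    return true;
}
```

For example, ParseNppName("nppiMalloc_8u_C3", parts) succeeds with three channels and no ROI flag, matching the allocation function used later, while a signal-processing name such as nppsAdd_32f is rejected because the sketch only handles the nppi prefix.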

To compare performance against the CPU, we use FreeImage's FreeImage_Rescale() function with a Lanczos filter:

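void ResizeCPU(const char* szInputFile, ImageInfo_t &dstImage) {
    FIBITMAP *pSrc = FreeImage_Load(FIF_JPEG, szInputFile, JPEG_DEFAULT); // CPU path loads its own copy
    FIBITMAP *pDst = FreeImage_Rescale(pSrc, dstImage.nWidth, dstImage.nHeight, FILTER_LANCZOS3);
    FreeImage_Unload(pDst); // result is used only for the performance comparison
    FreeImage_Unload(pSrc);
}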

Now, let's implement the main() function. First, we initialize the FreeImage library and load an image:

FreeImage_Initialise();
ImageInfo_t srcImage, dstImage;
LoadImage(szInputFile, srcImage);

Then, we allocate GPU global memory for the input image and transfer the loaded image into it. nppiMalloc_8u_C3() allocates a pitched buffer for three-channel 8-bit data and returns its pitch in bytes, and cudaMemcpy2D() copies the image row by row, honoring the different host and device pitches:

// copy loaded image to the device memory
srcImage.pDataCUDA =
nppiMalloc_8u_C3(srcImage.nWidth, srcImage.nHeight,
&srcImage.nPitchCUDA);
cudaMemcpy2D(srcImage.pDataCUDA, srcImage.nPitchCUDA,
srcImage.pData, srcImage.nPitch,
srcImage.nWidth * srcImage.nChannel * sizeof(Npp8u),
srcImage.nHeight,
cudaMemcpyHostToDevice);
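cudaMemcpy2D() copies height rows of width bytes each, advancing the source and destination pointers by their own pitches, so the padding bytes at the end of each destination row are not written. The host-only sketch below mimics that copy pattern with std::memcpy to show the pitch arithmetic; Copy2D() is a hypothetical helper, not a CUDA API, and it performs no device transfers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Host-only illustration of the cudaMemcpy2D() copy pattern (hypothetical
// helper, no CUDA involved): copy `height` rows of `widthBytes` bytes, where
// consecutive rows start `dpitch` bytes apart in dst and `spitch` bytes apart in src.
void Copy2D(void *dst, std::size_t dpitch,
            const void *src, std::size_t spitch,
            std::size_t widthBytes, std::size_t height)
{
    // as with cudaMemcpy2D, the row width must not exceed either pitch
    assert(widthBytes <= dpitch && widthBytes <= spitch);
    auto *d = static_cast<unsigned char *>(dst);
    const auto *s = static_cast<const unsigned char *>(src);
    for (std::size_t row = 0; row < height; ++row) {
        // only widthBytes are copied; the padding after each row is left as-is
        std::memcpy(d + row * dpitch, s + row * spitch, widthBytes);
    }
}
```

With srcImage.nPitch as the source pitch and srcImage.nPitchCUDA as the destination pitch, this is the same row-by-row pattern that the upload performs into device memory.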

After that, we initialize the output image information and allocate device memory sized for the resized image:

std::memcpy(&dstImage, &srcImage, sizeof(ImageInfo_t));
dstImage.nWidth *= scaleRatio;
dstImage.nHeight *= scaleRatio;
dstImage.pDataCUDA =
nppiMalloc_8u_C3(dstImage.nWidth, dstImage.nHeight,
&dstImage.nPitchCUDA);
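Assuming scaleRatio is a float such as 0.5f, `dstImage.nWidth *= scaleRatio` converts the product back to int by truncation toward zero, so an odd dimension at a 0.5 ratio drops its last half pixel. If you prefer rounding, a small helper makes the choice explicit. ScaledDim() is a hypothetical name, and rounding with std::lround is an option of this sketch rather than what the code above does.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: scales one image dimension by `ratio`.
// roundToNearest=false truncates toward zero, like `int *= float`;
// roundToNearest=true rounds to the nearest pixel instead.
int ScaledDim(int dim, float ratio, bool roundToNearest)
{
    assert(dim > 0 && ratio > 0.0f);
    float scaled = static_cast<float>(dim) * ratio;
    int result = roundToNearest ? static_cast<int>(std::lround(scaled))
                                : static_cast<int>(scaled);
    // never produce an empty image for very small ratios
    return result > 0 ? result : 1;
}
```

For a 667-pixel-tall image at 0.5f, truncation gives 333 rows and rounding gives 334; either way, the same value must be used for nppiMalloc_8u_C3(), the NPP destination size, and the FreeImage output bitmap.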

Then, we call the ResizeGPU() and ResizeCPU() functions implemented above and measure how long each takes; the GPU operation is timed with CUDA events (cudaEvent):

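NppiSize srcImageSize = GetImageSize(srcImage), dstImageSize = GetImageSize(dstImage);
NppiRect srcROI = GetROI(srcImage), dstROI = GetROI(dstImage);
ResizeGPU(dstImage, srcImage, dstImageSize, dstROI, srcImageSize, srcROI, scaleRatio);
ResizeCPU(szInputFile, dstImage);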
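The excerpt does not show the timing code itself. One portable way to time the CPU path is to wrap a callable with std::chrono::steady_clock and return milliseconds, as in the sketch below; TimeMs() is a hypothetical helper offered as an alternative to the timers in helper_timer.h, not the book's implementation. For the GPU path, CUDA events are the usual tool: record one with cudaEventRecord() before and after the NPP call, wait with cudaEventSynchronize(), and read the interval with cudaEventElapsedTime(). A host-side timer around an asynchronous GPU call may capture only the launch overhead.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: runs `fn` once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename Fn>
double TimeMs(Fn &&fn)
{
    auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    auto stop = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    assert(ms >= 0.0); // steady_clock never runs backwards
    return ms;
}
```

For example, `double cpuMs = TimeMs([&] { ResizeCPU(szInputFile, dstImage); });` would time one CPU resize.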

To verify the result, we save it to a file. We allocate a FreeImage bitmap with the output dimensions, copy the resized image from device memory into the bitmap's pixel buffer, and then save it as a JPEG file:

// Save resized image as file from the device
FIBITMAP *pDstImageBitmap =
FreeImage_Allocate(dstImage.nWidth, dstImage.nHeight,
dstImage.nBPP);

dstImage.nPitch = FreeImage_GetPitch(pDstImageBitmap);
dstImage.pData = FreeImage_GetBits(pDstImageBitmap);

cudaMemcpy2D(dstImage.pData, dstImage.nPitch,
dstImage.pDataCUDA, dstImage.nPitchCUDA,
dstImage.nWidth * dstImage.nChannel * sizeof(Npp8u),
dstImage.nHeight, cudaMemcpyDeviceToHost);

FreeImage_Save(FIF_JPEG, pDstImageBitmap, szOutputFile, JPEG_DEFAULT);

Finally, we release the related resources, including the two FreeImage bitmaps:

FreeImage_Unload(pDstImageBitmap);
FreeImage_Unload(srcImage.dib);
nppiFree(srcImage.pDataCUDA);
nppiFree(dstImage.pDataCUDA);

FreeImage_DeInitialise();

Compile the code using nvcc, linking against the NPP and FreeImage libraries:

$ nvcc -run -m64 -std=c++11 -I/usr/local/cuda/samples/common/inc -gencode arch=compute_70,code=sm_70 -lnppc -lnppif -lnppisu -lnppig -lnppicom -lnpps -lfreeimage -o imageFilter ./imageFilter.cpp

With a scale factor of 0.5f, the output file is much smaller than the input:

$ ls -alh *.jpg
-rw-rw-r-- 1 ubuntu ubuntu 91K Nov 13 22:31 flower.jpg
-rw-rw-r-- 1 ubuntu ubuntu 23K Nov 17 02:46 output.jpg
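Scaling both dimensions by s = 0.5 reduces the pixel count by s², which roughly matches the change in file size here. JPEG file size does not scale exactly with pixel count, so treat this only as a sanity check:

```latex
\frac{N_{\text{out}}}{N_{\text{in}}} = \frac{(sW)(sH)}{WH} = s^2 = 0.5^2 = 0.25,
\qquad
\frac{23\ \text{K}}{91\ \text{K}} \approx 0.25
```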

On a V100, the measured GPU resize time is 0.04576 ms; the figure will vary with the GPU model:

Rescale flower.jpg in 0.5 ratio.
CPU: 23.857 ms
GPU: 0.04576 ms
Done (generated output.jpg)
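From the two measured times, the GPU resize is roughly 520 times faster than the CPU path on this V100. This assumes, as the timing description suggests, that the GPU figure covers the NPP resize alone; if so, the host-to-device and device-to-host copies are not included, and an end-to-end comparison would be less favorable:

```latex
\text{speedup} = \frac{t_{\text{CPU}}}{t_{\text{GPU}}}
= \frac{23.857\ \text{ms}}{0.04576\ \text{ms}} \approx 521
```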

For more detail on using NPP for image processing, see the GTC 2014 hands-on lab document: http://on-demand.gputechconf.com/gtc/2014/presentations/HANDS-ON-LAB-S4793-image-processing-using-npp.pdf.
