Copyright © 2011 NVIDIA Corporation and Wen-mei W. Hwu. All rights reserved.
Introduction
The State of GPU Computing in Video and Image Processing
GPUs have played a role in video and image processing for a long time. Initially, they were used only to display the processed results. Application developers quickly adopted GPU computing, however, and GPUs are becoming the main processing devices in today's video- and image-processing applications. The ever-increasing amount of video and image data demands ever more computational power, while at the same time offering more potential for parallel computation. The GPU, with its many-core architecture, is a natural match for these challenges and delivers the computational performance needed to drive the image- and video-processing applications of today and the future.
GPU computing not only significantly speeds up existing video- and image-processing workflows but also allows for more creativity, because the additional computational power can be used to transform the workflows themselves. For example, with GPU computing, filters and operators can be applied in real time to full HD video, making low-resolution preview windows obsolete. At present, sophisticated video- and image-processing applications are used only off-line because of their long runtimes. When those applications take advantage of GPU computing, I expect even more breakthroughs. Moving these applications into the real-time domain creates the opportunity for more user interaction, enabling new and more intelligent interactive tools for video and image processing.
In This Section
Chapter 34, written by Temizel et al., discusses different implementation choices for video-processing algorithms, including concurrent I/O operations, effective memory layout and access, and kernel granularity, and gives general guidelines for achieving the best performance.
Chapter 35, written by Šťáva and Beneš, describes an implementation of the two-pass union-find algorithm for finding connected components. The main concept is to first compute local solutions in shared memory and then merge these local results hierarchically to compute the final solution.
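The local-then-merge idea can be illustrated with a sequential C++ sketch. It assumes 4-connectivity, a row-major image of 0/1 values, and square tiles of a hypothetical edge length `tile` standing in for CUDA thread blocks. A first sweep unites neighboring foreground pixels only within each tile (the local solutions), a second sweep unites pixel pairs that straddle tile borders, and a final step writes each pixel's union-find root as its label. Unlike the chapter's approach, this sketch merges all tile borders in a single sweep rather than hierarchically, and it does not model shared memory. The names `UnionFind` and `label_components` are illustrative only.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Union-find over pixel indices; path halving keeps the trees shallow.
struct UnionFind {
    std::vector<int> parent;
    explicit UnionFind(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);  // every pixel starts as its own root
    }
    int find(int a) {
        while (parent[a] != a) {
            parent[a] = parent[parent[a]];  // path halving
            a = parent[a];
        }
        return a;
    }
    void unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return;
        if (a < b) parent[b] = a; else parent[a] = b;  // smaller index becomes the root
    }
};

// Labels 4-connected foreground (nonzero) pixels of a row-major w x h image.
// Returns the root pixel index for each foreground pixel and -1 for background.
// `tile` is a hypothetical tile edge length standing in for a thread block.
std::vector<int> label_components(const std::vector<int>& img, int w, int h, int tile) {
    assert(static_cast<int>(img.size()) == w * h && tile > 0);
    UnionFind uf(w * h);
    auto fg = [&](int y, int x) { return img[y * w + x] != 0; };
    // Visits each right and down neighbor pair; `border` selects whether pairs
    // inside a tile (false) or pairs crossing a tile boundary (true) are united.
    auto sweep = [&](bool border) {
        for (int y = 0; y < h; ++y) {
            for (int x = 0; x < w; ++x) {
                if (!fg(y, x)) continue;
                if (x + 1 < w && ((x + 1) % tile == 0) == border && fg(y, x + 1))
                    uf.unite(y * w + x, y * w + x + 1);
                if (y + 1 < h && ((y + 1) % tile == 0) == border && fg(y + 1, x))
                    uf.unite(y * w + x, (y + 1) * w + x);
            }
        }
    };
    sweep(false);  // local solutions: merges that stay inside one tile
    sweep(true);   // merge step: join components across tile borders
    std::vector<int> labels(img.size(), -1);
    for (int i = 0; i < w * h; ++i)
        if (img[i] != 0) labels[i] = uf.find(i);  // final labeling pass
    return labels;
}
```

Because the smaller root always wins a union, each label is the smallest pixel index in its component, so the result does not depend on the tile size; only the order in which equivalences are discovered changes.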
Chapter 36, written by Fung and Stam, describes implementations of various image demosaicing algorithms. The key to high performance is storing the different color channels in separate shared-memory arrays to avoid bank conflicts, and having each thread compute four output pixels to avoid divergence.
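The one-thread-per-quad idea can be sketched on the CPU in C++. The sketch assumes an RGGB Bayer pattern, even image dimensions, and plain bilinear interpolation, a basic demosaicing method swapped in for illustration and not necessarily one of the chapter's algorithms. Each iteration of the inner loop plays the role of one thread: it handles one 2×2 quad and writes all four output pixels with the same straight-line code, so no worker branches on pixel parity. The output is kept in separate R, G, and B arrays; shared memory and bank conflicts are not modeled. `PlanarRGB` and `demosaic_rggb` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Planar (structure-of-arrays) RGB output: one array per color channel.
struct PlanarRGB {
    std::vector<float> r, g, b;
};

// Mirrors an index into [0, n): -1 -> 1 and n -> n - 2. Mirroring preserves
// parity, so a mirrored neighbor has the same Bayer color as the one it replaces.
static int mirror(int i, int n) {
    if (i < 0) return -i;
    if (i >= n) return 2 * (n - 1) - i;
    return i;
}

// Bilinear demosaicing of an assumed RGGB mosaic stored row-major (w x h).
// Each inner-loop iteration handles one 2x2 quad and writes four output pixels.
PlanarRGB demosaic_rggb(const std::vector<float>& raw, int w, int h) {
    assert(w >= 2 && h >= 2 && w % 2 == 0 && h % 2 == 0);
    assert(raw.size() == static_cast<std::size_t>(w) * h);
    PlanarRGB out{std::vector<float>(raw.size()), std::vector<float>(raw.size()),
                  std::vector<float>(raw.size())};
    auto at = [&](int y, int x) {
        return raw[static_cast<std::size_t>(mirror(y, h)) * w + mirror(x, w)];
    };
    auto cross = [&](int y, int x) {  // mean of the 4 edge neighbors
        return 0.25f * (at(y - 1, x) + at(y + 1, x) + at(y, x - 1) + at(y, x + 1));
    };
    auto diag = [&](int y, int x) {  // mean of the 4 diagonal neighbors
        return 0.25f * (at(y - 1, x - 1) + at(y - 1, x + 1) +
                        at(y + 1, x - 1) + at(y + 1, x + 1));
    };
    auto horiz = [&](int y, int x) { return 0.5f * (at(y, x - 1) + at(y, x + 1)); };
    auto vert = [&](int y, int x) { return 0.5f * (at(y - 1, x) + at(y + 1, x)); };
    auto put = [&](int y, int x, float r, float g, float b) {
        std::size_t i = static_cast<std::size_t>(y) * w + x;
        out.r[i] = r;
        out.g[i] = g;
        out.b[i] = b;
    };
    for (int y = 0; y < h; y += 2) {
        for (int x = 0; x < w; x += 2) {
            // Top-left: red site; green from edge neighbors, blue from diagonals.
            put(y, x, at(y, x), cross(y, x), diag(y, x));
            // Top-right: green on a red row; red left/right, blue up/down.
            put(y, x + 1, horiz(y, x + 1), at(y, x + 1), vert(y, x + 1));
            // Bottom-left: green on a blue row; red up/down, blue left/right.
            put(y + 1, x, vert(y + 1, x), at(y + 1, x), horiz(y + 1, x));
            // Bottom-right: blue site; green from edge neighbors, red from diagonals.
            put(y + 1, x + 1, diag(y + 1, x + 1), cross(y + 1, x + 1), at(y + 1, x + 1));
        }
    }
    return out;
}
```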