Preface

I think it is useful to provide a little background as to why and how this book came into being. This will perhaps provide some insight into the way the material is structured, and why it is presented in the way that it is.

Background

Firstly, a little bit of history. I have an extensive background in image processing, particularly in the areas of image analysis, machine vision and robot vision, all strongly application-orientated areas. With over 25 years of applying image processing techniques to a wide range of problems, I have gained considerable experience in algorithm development. This is not only at the image processing application level but also at the image processing operation level. My approach to an application has usually been more pragmatic than theoretical – I have focussed on developing image processing algorithms that solved the problem at hand. Often this involved assembling sequences of existing image processing operations, but occasionally it required developing new algorithms and techniques to solve particular aspects of the problem. Through work on machine vision and robotics applications, I have become aware of some of the limitations of software-based solutions, particularly in terms of speed and algorithm efficiency.

This led naturally to considering FPGAs as an implementation platform for embedded imaging applications. Many image processing operations are inherently parallel and FPGAs provide programmable hardware, also inherently parallel. Therefore, it should be as simple as mapping one onto the other, right? Well, when I started implementing image processing algorithms on FPGAs, I had lots of ideas, but very little knowledge. I very soon found that there were a lot of tricks that were needed to create an efficient design. Consequently, my students and I learned many of these the hard way, through trial and error.

With my basic training as an electronics engineer, I was readily able to adapt to the hardware mindset. I have since discovered through observing my students, both at the undergraduate and postgraduate level, that this is perhaps the biggest hurdle to an efficient implementation. Image processing is traditionally thought of as a software domain task, whereas FPGA-based design is firmly in the hardware domain. To bridge the gap, it is necessary to think of algorithms not on their own but more in terms of their underlying computational architecture.

Implementing an image processing algorithm (or indeed any algorithm) on an FPGA, therefore, consists of determining the underlying architecture of an algorithm, mapping that architecture onto the resources available within an FPGA and finally mapping the algorithm onto the hardware architecture. Unfortunately, there is very little material available to help those new to the area to get started. Even this insight into the process is not actually stated anywhere, although it is implicitly followed (whether consciously or not) by most people working in this area.

Available Literature

While there are many research papers published in conference proceedings and journals, there are only a few that focus specifically on how to map image processing algorithms onto FPGAs. The research papers found in the literature can be classified into several broad groups.

The first focuses on the FPGA architecture itself. Most of these provide an analysis of a range of techniques relating to the structure and granularity of logic blocks, the routing networks and embedded memories. As well as the FPGA structure, a wide range of topics is covered, including underlying technology, power issues, the effects of process variability and dynamic reconfigurability. Many of these papers are purely proposals or relate to prototype FPGAs rather than commercially available chips. Although such papers are interesting in their own right and represent perfectly legitimate research topics, very few of these papers are directly useful from an applications point of view. While they provide insights into some of the features which might be available in the next generation of devices, most of the topics within this group are at too low a level.

A second group of papers investigates the topic of reconfigurable computing. Here the focus is on how an FPGA can be used to accelerate some computationally intensive task or range of tasks. While image processing is one such task considered, most of the research relates more to high performance (and high power) computing rather than low power embedded systems. Topics within this group include hardware and software partitioning, hardware and software co-design, dynamic reconfigurability, communication between an FPGA and CPU, comparisons between the performance of FPGAs, GPUs and CPUs, and the design of operating systems and specific platforms for both reconfigurable computing applications and research. Important principles and techniques can be gleaned from many of these papers, even though this may not be their primary focus.

The next group of papers is closely related to the previous group and considers tools for programming FPGAs and applications. The focus here is more on improving the productivity of the development process. A wide range of hardware description languages have been proposed, with many modelled after software languages such as C, Java and even Prolog. Many of these are developed as research tools, with very few making it out of the laboratory to commercial availability. There has also been considerable research on compilation techniques for mapping standard software languages to hardware. Such compilers attempt to exploit techniques such as loop unrolling, strip mining and pipelining to produce parallel hardware. Again, many of these papers describe important principles and techniques that can result in more efficient hardware designs. However, current compilers are still relatively immature in the level and kinds of parallelism that they can automatically exploit. They are also limited in that they can only perform relatively simple transformations to the algorithm provided; they cannot redesign the underlying algorithm.

The final group of papers focuses on a range of applications, including image processing and the implementation of both image processing operations and systems. Unfortunately, as a result of page limits and space constraints, many of these papers give the results of the implementation of various systems, but present relatively few design details. Often the final product is described, without describing many of the reasons or decisions that led to that design. Many of these designs cannot be recreated without acquiring the specific platform and tools that were used, or inferring a lot of the missing details. While some of these details may appear obvious in hindsight, without this knowledge many were far from obvious just from reading the papers. The better papers in this group tended to have a tighter focus, considering the implementation of a single image processing operation.

So while there may be a reasonable amount of material available, it is quite diffuse. In many cases, it is necessary to know exactly what you are looking for, or simply be lucky enough to find it.

Shortly after beginning in this area, my research students and I wrote down a list of topics and techniques that we would have liked to have known when we started. As we progressed, our list grew. Our intention from the start was to compile this material into a book to help others who, like us, were having to learn things the hard way by themselves. Essentially, this book reflects our distilled experiences in this field, combined with techniques (both FPGA design and image processing) that have been gleaned from the literature.

Intended Audience

This book is written primarily for those who are familiar with the basics of image processing and want to consider implementing image processing using FPGAs. It accomplishes this by presenting the techniques and approaches that we wished we knew when we were starting in this area. Perhaps the biggest hurdle is switching from a software mindset to a hardware way of thinking. Very often, when programming software, we do so without great consideration of the underlying architecture. Perhaps this is because the architecture of most software processors is sufficiently similar that any differences are really only a second order effect, regardless of how significant they may appear to a computer engineer. A good compiler is able to map the algorithm in the programming language onto the architecture relatively efficiently, so we can get away without thinking too much about such things. When programming hardware though, architecture is everything. It is not simply a matter of porting the software onto hardware. The underlying hardware architecture needs to be designed as well. In particular, programming hardware usually requires transforming the algorithm into an appropriate parallel architecture, often with significant changes to the algorithm itself. This is not something that the current generation of compilers is able to do because it requires significant design rather than just decomposition of the dataflow. This book addresses this issue by providing not only algorithms for image processing operations, but also underlying architectures that can be used to implement them efficiently.

This book would also be useful to those who are familiar with programming and applying FPGAs to other problems and are considering image processing applications. While many of the techniques are relevant and applicable to a wide range of application areas, most of the focus and examples are taken from image processing applications. Sufficient detail is given to make many of the algorithms and their implementation clear. However, I would argue that learning image processing is more than just collecting a set of algorithms, and there are any number of excellent image processing texts that provide these. Imaging is a practical discipline that can be learned most effectively by doing, and a software environment provides significantly greater flexibility and interactivity than learning image processing via FPGAs.

That said, it is in the domain of embedded image processing where FPGAs come into their own. An efficient, low power design requires that the techniques of both the hardware engineer and the software engineer be integrated tightly within the final solution.

Outline of the Contents

This book aims to provide a comprehensive overview of algorithms and techniques for implementing image processing algorithms on FPGAs, particularly for low and intermediate level vision. However, as with design in any field, there is more than one way of achieving a particular task. Much of the emphasis has been placed on stream-based approaches to implementing image processing, as these can efficiently exploit parallelism when they can be used. This emphasis reflects my background and experience in the area, and is not intended to be the last word on the topic.

A broad overview of image processing is presented in Chapter 1, with a brief historical context. Many of the basic image processing terms are defined and the different stages of an image processing algorithm are identified and illustrated with an example algorithm. The problem of real-time embedded image processing is introduced, and the limitations of conventional serial processors for tackling this problem are identified. High speed image processing must exploit the parallelism inherent in the processing of images. A brief history of parallel image processing systems is reviewed to provide the context of using FPGAs for image processing.

FPGAs combine the advantages of both hardware and software systems, by providing reprogrammable (hence flexible) hardware. Chapter 2 provides an introduction to FPGA technology. While some of this will be more detailed than is necessary to implement algorithms, a basic knowledge of the building blocks and underlying architecture is important to developing resource efficient solutions. The key features of currently available FPGAs are reviewed in the context of implementing image processing algorithms.

FPGA-based design is hardware design, and this hardware needs to be represented using some form of hardware description language. Some of the main languages are reviewed in Chapter 3, with particular emphasis on the design flow for implementing algorithms. Traditional hardware description languages such as VHDL and Verilog are quite low level in that all of the control has to be explicitly programmed. The last 15 years have seen considerable research into more algorithmic approaches to programming hardware, based primarily on C. An overview of some of this research is presented, finishing with a brief description of a number of commercial offerings.

The process of designing and implementing an image processing application on an FPGA is described in detail in Chapter 4. Particular emphasis is given to the differences between designing for an FPGA-based implementation and a standard software implementation. The critical initial step is to clearly define the image processing problem that is being tackled. This must be in sufficient detail to provide a specification that may be used to evaluate the solution. The procedure for developing the image processing algorithm is described in detail, outlining the common stages within many image processing algorithms. The resulting algorithm must then be used to define the system and computational architectures. The mapping from an algorithm is more than simply porting the algorithm to a hardware description language. It is necessary to transform the algorithm to make efficient use of the resources available on the FPGA. The final stage is to implement the algorithm by mapping it onto the computational architecture.

There are three types of constraints on the mapping process: limited processing time, limited access to data and limited system resources. Chapter 5 describes several techniques for overcoming or alleviating these constraints. Possible FPGA implementations of several data structures commonly found in computer vision algorithms are described. These help to bridge the gap between a software and hardware implementation. Number representation and number systems are described within the context of image processing. A range of efficient hardware computational techniques is discussed. Some of these techniques could be considered the hardware equivalent of software libraries for efficiently implementing common functions.

The next section of this book describes the implementation of many common image processing operations. Some of the design decisions and alternative ways of mapping the operations onto FPGAs are considered. While reasonably comprehensive, particularly for low level image-to-image transformations, it is impossible to cover every possible design. The examples discussed are intended to provide the foundation for many other related operations.

Chapter 6 considers point operations, where the output depends only on the corresponding input pixel in the input image(s). Both direct computation and lookup table approaches are described. With multiple input images, techniques such as image averaging and background subtraction are discussed in detail. The final section in this chapter extends the earlier discussion to the processing of colour images. Particular topics given emphasis are colour space conversion, colour segmentation and colour balancing.
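The two approaches to point operations can be contrasted in a short Python model. This is a minimal sketch, assuming 8-bit greyscale pixels stored as nested lists; the gamma-correction function and the names `gamma_direct`, `build_lut` and `apply_lut` are hypothetical illustrations rather than designs taken from the book. In hardware, such a table would typically be held in block RAM, so each output pixel costs one memory read regardless of how complex the function is.

```python
def gamma_direct(pixel, gamma=0.5):
    """Direct computation: evaluate an (illustrative) gamma function for each pixel."""
    return round(255 * (pixel / 255) ** gamma)


def build_lut(func, bits=8):
    """Lookup table: precompute func once for every possible input value."""
    return [func(value) for value in range(1 << bits)]


def apply_lut(image, lut):
    """Point operation: each output pixel is a single table read indexed by the input pixel."""
    return [[lut[pixel] for pixel in row] for row in image]


lut = build_lut(gamma_direct)
print(apply_lut([[0, 64, 255]], lut))
```

Both paths give identical results; the trade-off is arithmetic logic per pixel against memory for the table.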

The implementation of histograms and histogram-based processing is discussed in Chapter 7. Techniques of accumulating a histogram and then extracting data from the histogram are described in some detail. Particular tasks are histogram equalisation, threshold selection and using histograms for image matching. The concepts of standard one-dimensional histograms are extended to multidimensional histograms. The use of clustering for colour segmentation and classification is discussed in some detail. The chapter concludes with the use of features extracted from multidimensional histograms for texture analysis.
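Accumulating a histogram and deriving an equalisation mapping from it can be modelled in a few lines. This is a minimal sketch, assuming integer pixel values from 0 to `bins - 1` and a non-empty image; the function names are hypothetical, and the scaling shown is only one simple variant of equalisation. On an FPGA the bins would usually sit in block RAM, with each streamed pixel causing a read-modify-write of one bin; the resulting mapping is itself a lookup table that can then be applied as a point operation.

```python
def histogram(image, bins=256):
    """Accumulate a histogram from an image given as rows of integer pixels."""
    hist = [0] * bins
    for row in image:
        for pixel in row:
            hist[pixel] += 1  # one read-modify-write of a bin per streamed pixel
    return hist


def equalisation_lut(hist):
    """Map each level through the cumulative histogram, scaled to 0..len(hist)-1."""
    total = sum(hist)          # assumes at least one pixel, so total > 0
    top = len(hist) - 1
    lut, cumulative = [], 0
    for count in hist:
        cumulative += count
        lut.append(cumulative * top // total)
    return lut


hist = histogram([[0, 0, 1, 3]], bins=4)
print(hist, equalisation_lut(hist))
```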

Chapter 8 considers a wide range of local filters, both linear and nonlinear. Particular emphasis is given to caching techniques for a stream-based implementation and methods for efficiently handling the processing around the image borders. Rank filters are described and a selection of associated sorting network architectures reviewed. Morphological filters are another important class of filters. State machine implementations of morphological filtering provide an alternative to the classic filter implementation. Separability and both serial and parallel decomposition techniques are described that enable more efficient implementations.
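The role of row caching in a stream-based window filter can be illustrated with a behavioural Python model. This is a sketch, assuming a 3x3 window and raster-scanned input; `stream_windows` is a hypothetical name, and borders are handled by the simplest strategy of discarding windows that overhang the image, which is only one of several possible approaches. Only the two previous rows plus the row currently arriving are retained, mirroring how a streamed design caches rows on chip rather than storing a whole frame.

```python
from collections import deque


def stream_windows(pixels, width):
    """Yield (x, y, window) for every 3x3 window lying fully inside the image.

    Pixels arrive one per clock in raster order; (x, y) is the window centre.
    """
    row_buffers = deque(maxlen=2)  # the two most recent complete rows
    current = []                   # the row currently being streamed in
    for index, pixel in enumerate(pixels):
        x, y = index % width, index // width
        current.append(pixel)
        if len(row_buffers) == 2 and x >= 2:
            rows = (row_buffers[0], row_buffers[1], current)
            window = [row[x - 2:x + 1] for row in rows]
            # The output lags the input by one row and one column
            yield x - 1, y - 1, window
        if x == width - 1:
            row_buffers.append(current)
            current = []


# Example: a 3x3 median (rank) filter over a 4x3 test image
medians = [(x, y, sorted(sum(w, []))[4])
           for x, y, w in stream_windows(range(12), width=4)]
print(medians)
```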

Image warping and related techniques are covered in Chapter 9. The forward and reverse mapping approaches to geometric transformation are compared in some detail, with particular emphasis on techniques for stream processing implementations. Interpolation is frequently associated with geometric transformation. Hardware-based algorithms for bilinear, bicubic and spline based interpolation are described. Related techniques of image registration are also described at the end of this chapter, including a discussion of the scale invariant feature transform and super-resolution.
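Bilinear interpolation can be written as three one-dimensional linear interpolations, a form that needs only three multipliers in hardware. This is a minimal sketch, assuming the sample point lies strictly inside the image so that all four neighbouring pixels exist; `bilinear` is a hypothetical name and no border handling is attempted.

```python
def bilinear(image, x, y):
    """Bilinearly interpolate image (a list of rows) at the real-valued point (x, y).

    Assumes 0 <= x < width - 1 and 0 <= y < height - 1, so no border handling is needed.
    """
    x0, y0 = int(x), int(y)   # integer part (truncation equals floor for x, y >= 0)
    fx, fy = x - x0, y - y0   # fractional offsets used as interpolation weights
    # Two horizontal interpolations along the rows above and below the point...
    top = image[y0][x0] + fx * (image[y0][x0 + 1] - image[y0][x0])
    bottom = image[y0 + 1][x0] + fx * (image[y0 + 1][x0 + 1] - image[y0 + 1][x0])
    # ...followed by one vertical interpolation between them
    return top + fy * (bottom - top)


print(bilinear([[0, 10], [20, 30]], 0.5, 0.5))
```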

Chapter 10 introduces linear transforms, with a particular focus on the fast Fourier transform, the discrete cosine transform and the wavelet transform. Both parallel and pipelined implementations of the FFT and DCT are described. Filtering and inverse filtering in the frequency domain are discussed in some detail. Lifting-based filtering is developed for the wavelet transform. This can reduce the logic requirements by up to a factor of four over a direct finite impulse response implementation. The final section in this chapter discusses the stages within image and video coding, and outlines some of the techniques that can be used at each stage.
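The lifting idea can be sketched with the integer LeGall 5/3 wavelet (the reversible wavelet used in JPEG 2000), chosen here as an assumed example rather than the specific filter developed in the chapter. Each step needs only additions and floor divisions by powers of two, which become arithmetic shifts in hardware, and the inverse runs the same steps in reverse order with the signs flipped, so reconstruction is exact. The function names are hypothetical; the input is assumed to have even length, with symmetric extension at the borders.

```python
def lifting_53_forward(x):
    """One level of the integer 5/3 wavelet by lifting; returns (approximation, detail)."""
    n = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict: detail = odd sample minus the floor-average of its even neighbours
    d = [odd[i] - (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    # Update: approximation = even sample plus a rounded quarter of neighbouring details
    s = [even[i] + (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(n)]
    return s, d


def lifting_53_inverse(s, d):
    """Undo the update step, then the predict step, to recover the original samples."""
    n = len(s)
    even = [s[i] - (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(n)]
    odd = [d[i] + (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x


signal = [10, 12, 14, 20, 20, 18, 6, 4]
print(lifting_53_forward(signal))
```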

A selection of intermediate level operations relating to region detection and labelling is presented in Chapter 11. Standard software algorithms for chain coding and connected component labelling are adapted to give efficient streamed implementations. These can significantly reduce both the latency and memory requirements of an application. Hardware implementations of the distance transform, the watershed transform and the Hough transform are also described.

Any embedded application must interface with the real world. A range of common peripherals is described in Chapter 12, with suggestions on how they may be interfaced to an FPGA. Particular attention is given to interfacing cameras and video output devices, although several other user interface and memory devices are described. Image processing techniques for deinterlacing and Bayer pattern demosaicing are reviewed.

The next chapter expands some of the issues with regard to testing and tuning that were introduced earlier. Four areas are identified where an implementation might not behave in the intended manner. These are faults in the design, bugs in the implementation, incorrect parameter selection and not meeting timing constraints. Several checklists provide a guide and hints for testing and debugging an algorithm on an FPGA.

Finally, a selection of case studies shows how the material and techniques described in the previous chapters can be integrated within a complete application. These applications briefly show the design steps and illustrate the mapping process at the whole algorithm level rather than purely at the operation level. Many gains can be made by combining operations together within a compatible overall architecture. The applications described are coloured region tracking for a gesture-based user interface, calibrating and correcting barrel distortion in lenses, development of a foveal image sensor inspired by some of the attributes of the human visual system, the processing to extract the range from a time of flight range imaging system, and a machine vision system for real-time produce grading.

Conventions Used

The contents of this book are independent of any particular FPGA or FPGA vendor, or any particular hardware description language. The topic is already sufficiently specialised without narrowing the audience further! As a result, many of the functions and operations are represented in block schematic form. This enables a language-independent representation, and places emphasis on a particular hardware implementation of the algorithm in a way that is portable. The basic elements of these schematics are illustrated in Figure P.1. I is generally used as the input of an image processing operation, with the output image represented by Q.

Figure P.1 Conventions used in this book. Top left: representation of an image processing operation; middle left: a block schematic representation of the function given by Equation P.1; bottom left: representation of operators where the order of operands is important. Right: symbols used for various blocks within block schematics.


With some mathematical operations, such as subtraction and comparison, the order of the operands is important. In such cases, the first operand is indicated with a blob rather than an arrow, as shown in the bottom left panel of Figure P.1.

Consider a recursive filter operating on streamed data:

$$Q_i = Q_{i-1} + k\,(I_i - Q_{i-1}) \qquad \text{(P.1)}$$

where the subscript in this instance refers to the $i$th pixel in the streamed image. At a high level, this can be considered as an image processing operation and represented by a single block, as shown in the top left of Figure P.1. The low level implementation is given in the middle left panel. The input and output, $I_i$ and $Q_i$, are represented by registers (dark blocks, with optional register names in white); the subscripts have been dropped because they are implicit with streamed operation. In some instances additional control inputs may be shown: CE for clock enable, RST for reset, and so on. Constants are represented as mid-grey blocks, and other function blocks with a light grey background.
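A streamed recursive filter of this kind can be modelled in software to check a design before committing it to hardware. This is a sketch, assuming the filter takes the common first-order form $Q_i = Q_{i-1} + k(I_i - Q_{i-1})$; the function name `recursive_filter`, the initial register value `q0` and the test values are hypothetical. Each loop iteration corresponds to one clock cycle: the subtraction (where operand order matters), the multiplication by the constant $k$ and the addition are evaluated, and the result is latched into the output register.

```python
def recursive_filter(pixels, k, q0=0.0):
    """Model a first-order recursive filter on a pixel stream.

    Assumes Q_i = Q_{i-1} + k * (I_i - Q_{i-1}); q0 is the initial contents of register Q.
    """
    q = q0
    outputs = []
    for pixel in pixels:         # one input pixel per clock cycle
        q = q + k * (pixel - q)  # subtract (order matters), scale by k, then add
        outputs.append(q)        # value latched into the output register
    return outputs


print(recursive_filter([0, 100, 100, 100], k=0.5))
```

In hardware, choosing $k$ as a power of two (for example 1/4) would reduce the multiplication to a shift.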

When representing logic functions in equations, ∨ is used for logical OR and ∧ for logical AND. This is to avoid confusion with addition and multiplication.
