How might the image shown in Figure 7.1 be described? Clearly a number of objects can be recognized. Enthusiasts may recognize the breed of dog. Comments on such attributes as lighting, colour and composition could also be made. These aspects of visual perception are a consequence of the subtle processes of physiology and psychology operating when the two-dimensional luminance pattern contained in the image is viewed by a human observer. The processes involved are the subject of much current research and are not covered in this work.
When asked to comment on the quality of the image, the same complex processes are involved and what is known as a subjective judgement is made. The processes operate on identifiable physical properties of the image, such as the structure of edges and the distribution of tones. The measurement of these physical properties forms the basis of objective image evaluation (see Chapter 19).
The purpose of this chapter is to introduce and develop the Fourier theory of image formation. This theory allows many objective measures of image quality to be unified, and enables different imaging systems to be evaluated and compared in a meaningful way. Most textbooks covering this subject require some familiarity with physics and mathematics. The presentation here includes a liberal sprinkling of important mathematical expressions, but it is not necessary to follow their manipulation in order to understand the ideas of the chapter. In most cases the mathematics is included as an illustration of, and a link to, the more thorough treatments found elsewhere.
Returning to Figure 7.1, the image may be usefully described as a large array of small, closely spaced image elements (pixels) of different luminances and colours. If the pixels are sufficiently small, they are not noticed. All that is seen is a continuous luminance pattern. The way adjacent pixels vary in luminance over a small region determines image qualities such as sharpness, resolution and the appearance of noise, or in the case of photographic processes, graininess (see Chapter 24).
The idea may be extended to scenes in general. What lies in front of the camera lens when an image is captured is a continuous luminance pattern, which can be considered as an infinite array of infinitesimal points of varying luminance. The manner in which the imaging process translates these scene points (the input) into the image elements (the output) is then a fundamental physical process contributing to the quality of the image.
The physical relationship between input and output clearly depends on the technology of the particular imaging process. For example, the basic image element from a photographic film is best described as a localized group of developed silver grains, referred to as the point spread function. A charge-coupled device (CCD) or complementary metal oxide semiconductor (CMOS) imaging array produces more recognizable discrete cells (pixels) related to the parameters of the detector.
Despite these technological differences, the Fourier theory of image formation can be applied, with certain conditions, to all imaging systems. It enables the modelling of the relationship of the input to output in a general manner. It supplies an important tool, the modulation transfer function (MTF), with which important aspects of imaging performance can be assessed. The theory can be applied quite generally to colour images by treating individual channels independently. From this point forward we will consider the theory as applied to a luminance channel only.
To understand the Fourier theory of image formation requires a special treatment of images and their structure. The luminance pattern is decomposed, not into points or image elements, but into waves (see Chapter 2) or, more specifically, into spatial frequencies.
Simple harmonic motion was discussed in depth in Chapter 2, so reference to that chapter may be useful in understanding the properties of sinusoidal waves. Figure 7.2a shows a sine wave plotted as a function of distance, x. It exhibits the characteristic periodic behaviour, where the number of repetitions (cycles) in unit distance is known as the spatial frequency, u, usually measured in cycles per mm (mm⁻¹). The amplitude of the wave is a. The function is derived from the familiar trigonometric ratio by setting the angle equal to 2πux radians and multiplying the resulting ratio by the required amplitude.
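As a minimal numerical sketch (in Python with NumPy; the frequency and amplitude chosen here are arbitrary illustrative values, not taken from the text), such a wave can be generated directly from this definition:

```python
import numpy as np

x = np.arange(0, 10, 0.01)      # positions along x, in mm
u = 2.0                         # spatial frequency, cycles per mm (arbitrary value)
a = 1.0                         # amplitude (arbitrary value)

sine_wave = a * np.sin(2 * np.pi * u * x)      # angle set to 2*pi*u*x radians
cosine_wave = a * np.cos(2 * np.pi * u * x)    # same frequency, a quarter of a cycle apart

print(sine_wave[:3], cosine_wave[:3])
```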
Figure 7.2a also shows a cosine wave of the same spatial frequency and amplitude. The two waves differ only in their positions on the x-axis. If the cosine wave is shifted by a quarter of a cycle (a phase change of π/2) in the positive x direction it becomes a sine wave.
A sine or cosine wave, shifted an arbitrary distance along the x-axis, is termed a phase-shifted sine wave.
If we add a constant to a phase-shifted sine wave to produce an all-positive result, we obtain what will be referred to in this work as a sinusoidal wave (Figure 7.2b). This is appropriate for our work where the varying quantity is, for example, luminance.
For much of the work in image formation, phase is unimportant; however, when considering the analysis and measurement of sampled systems in particular, it can have an effect on results.
The modulation of a sinusoidal wave is a formal measure of contrast. It is defined in Eqn 7.1 and shown in Figure 7.3. The modulation of a wave must lie between the two limits, zero and one. A modulation of zero implies the wave does not exist; a modulation of one is the maximum possible without invoking negative luminance, in which case a is equal to b:

M(u) = a/b   (7.1)
where M(u) is the modulation of the sinusoidal wave with spatial frequency u, a is the amplitude and b is the mean signal level.
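A short sketch along the same lines (again with arbitrary values for u, a and b) shows that the modulation computed from Eqn 7.1 agrees with the value obtained from the extremes of the wave:

```python
import numpy as np

x = np.arange(0, 10, 0.01)        # mm
u, a, b = 2.0, 0.3, 0.5           # frequency (cycles/mm), amplitude, mean level (arbitrary)

wave = b + a * np.sin(2 * np.pi * u * x)

m_direct = a / b                                                     # Eqn 7.1
m_from_wave = (wave.max() - wave.min()) / (wave.max() + wave.min())  # from the extremes

print(m_direct, round(m_from_wave, 3))    # both approximately 0.6
```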
Figure 7.4a shows a sinusoidal luminance pattern. It is characterized by a particular spatial frequency, a direction of variation (x direction) and a modulation. Figure 7.4b shows the same information plotted as a three-dimensional figure. The luminance value of the image at any point is plotted vertically. Figure 7.4c is an end-on view. It shows the profile and clearly illustrates the sinusoidal character.
Figure 7.5 shows a second sinusoidal luminance pattern, of a lower spatial frequency and a lower modulation. Adding these two sinusoidal patterns together, Figure 7.6 is obtained. The result is no longer sinusoidal. The profile shows a slightly more complicated pattern.
This process may be continued by adding many more sinusoidal patterns of varying modulation and frequency. Figure 7.7 shows an image containing 30 harmonics, all of which vary in the x direction but have different frequencies and modulations. The profile is now very complicated and does not appear to be periodic, unlike the individual sinusoidal components. Note that in all of the above cases the images are invariant in the y direction (i.e. they vary only in the x direction). They are described as functions of x only.
Figure 7.8 shows the result of adding more sinusoidal patterns, but this time their direction of variation is vertical (y direction). Finally, some sinusoidal patterns that vary in directions other than the horizontal or vertical may be added. Figure 7.9 shows two examples, and the result of adding them to the image. A complicated image is now created that varies in both the x and y directions.
This demonstration could be continued to include many more sinusoidal components. Some very complicated patterns could be created. With a large enough variety of components, it may be argued that any pattern could be created. In other words, any image can in principle be built up (or synthesized) from sine and cosine wave components. Waves of many different frequencies, amplitudes, orientations and phases would need to be included. The French mathematician Jean Baptiste Joseph Fourier (1768–1830) first demonstrated this principle, although he dealt with functions of one variable rather than two.
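The synthesis described above is easily sketched numerically. In the following illustrative snippet the frequencies, orientations and amplitudes of the components are invented purely for demonstration, in the spirit of Figures 7.4–7.9:

```python
import numpy as np

# 2D coordinate grid covering 10 mm x 10 mm
x, y = np.meshgrid(np.arange(0, 10, 0.02), np.arange(0, 10, 0.02))

# Invented components: (spatial frequency in cycles/mm, direction in degrees, amplitude)
components = [(0.5, 0, 0.20), (1.5, 0, 0.10), (0.8, 90, 0.15), (2.0, 45, 0.05)]

image = np.full_like(x, 0.5)          # constant mean luminance level
for u, angle, amp in components:
    theta = np.radians(angle)
    # frequency resolved along the component's direction of variation
    image += amp * np.sin(2 * np.pi * u * (x * np.cos(theta) + y * np.sin(theta)))

print(image.shape, round(image.min(), 3), round(image.max(), 3))
```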
It follows from the discussion that real scenes can, in principle, be decomposed into sets of sinusoidal components of varying frequencies and modulations. The basis of the Fourier theory of image formation is that if the response of the imaging system to different sinusoidal inputs is known, the principle of superposition (adding together – see Chapter 2) may then be utilized to determine the performance for a real scene.
The mathematical process known as the Fourier transform supplies the means of decomposing arbitrary functions into their sinusoidal components. This is a mathematical transformation of a function of space (or time), f(x) into a different form, F(u), a function of spatial (or temporal) frequency that represents the amplitudes of the contained frequencies u. The term Fourier transform is commonly used to represent both the process and the resulting function F(u).
Consider an image of a sinusoidal pattern made using an ‘ideal’ imaging system. Imagining the wave to be made up of an infinite number of infinitesimally small points, then an ideal imaging system is one that is capable of reproducing every point faithfully. The image will then be an exact copy. Furthermore, we would know that each region in the image, no matter how small, corresponds to an identical region in the original.
Now, as was suggested at the beginning of this chapter, real imaging processes are not ideal. Points in the original are not reproduced as points in the image. Depending on the particular imaging process an object point may appear in the image as a fuzzy circular patch of varying density (photographic film), small concentric rings of light (a diffraction-limited lens – see Chapters 2, 6 and 10), a uniform rectangular pixel (digital system), etc. These examples are illustrated in Figure 7.10a–c. It is customary to refer to these basic image elements as point spread functions (PSFs). The designers of imaging systems (photographic emulsion chemists, optical engineers and solid-state physicists) are constantly striving to reduce the size of the PSFs for their systems.
Of major importance is the fact that images are formed from the summation of an infinite number of overlapping PSFs. This is fairly easy to visualize for the photographic and optical systems. In the case of the typical digital process a sampling mechanism (see later in the chapter) confuses the process, although overlapping PSFs are still effectively at work. This mechanism of image formation (summation of overlapping PSFs) is an example of a very important process known as convolution (explained later in the chapter). The image is essentially a convolution of the luminance distribution present in the original scene with the PSF of the imaging system.
This process results in images that are blurred when compared with the original scene.
A remarkable fact when imaging a sinusoidal pattern is that the result retains its sinusoidal form, even though the convolution process has blurred it. There will generally be a change (usually a reduction) in modulation, and there may be a phase shift (see Figure 7.12). The reduction in modulation depends on the spatial frequency of the luminance pattern and the size of the PSF. An important consequence of this is that the reproduction of sinusoidal patterns can be described very simply. Usually the reduction in modulation is all that is required. Furthermore, the results can be presented in clear graphical form (the modulation transfer function) and imaging systems can be meaningfully and usefully compared.
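The following sketch illustrates this behaviour with an assumed Gaussian line spread function (an arbitrary choice, not a model of any particular device): the blurred wave remains sinusoidal at the same frequency, but its measured modulation falls.

```python
import numpy as np

dx = 0.001                      # sampling interval, mm
x = np.arange(0, 20, dx)        # 20 mm of 'scene'
u, a, b = 5.0, 0.4, 1.0         # cycles per mm, amplitude, mean level (illustrative)

scene = b + a * np.sin(2 * np.pi * u * x)

# Assumed Gaussian line spread function (sigma = 0.03 mm), normalised to unit area
xs = np.arange(-0.2, 0.2, dx)
lsf = np.exp(-xs**2 / (2 * 0.03**2))
lsf /= lsf.sum()

image = np.convolve(scene, lsf, mode='same')

def modulation(w):
    w = w[len(w) // 4 : -len(w) // 4]     # trim edges affected by the convolution
    return (w.max() - w.min()) / (w.max() + w.min())

print(modulation(scene), modulation(image))   # the output modulation is reduced
```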
In order to use the theory, the imaging system must be linear and spatially invariant.
Suppose a general input I1 causes an output O1 at a particular position in the image, and a different input I2 causes an output O2 at the same position. If an input k1I1 + k2I2 causes an output k1O1 + k2O2, where k1 and k2 are any arbitrary constants, the system is said to be linear. A consequence of linearity is that the output vs. input characteristic will be a straight line, i.e. output is proportional to input.
If the only change caused to an image by shifting the input an arbitrary distance is an equivalent shift in the image, then the system is said to be spatially invariant (or stationary).
In practice, most imaging systems violate both of these conditions. The photographic process is notoriously nonlinear, with well-known characteristic curves (see Chapter 8) linking the output density with the input exposure. It is worth noting, though, that as long as the subject exposure falls on the straight-line portion of the characteristic curve, the photographic process can be considered to be linear. CCD and CMOS arrays are not spatially invariant. On a micro scale, moving a point source across a single detector will not change the detector output. The sampling process (see later), a feature of these arrays, acts as a special kind of non-linearity generating new spatial frequencies in the image. Lenses will not be stationary if they contain aberrations, which is generally the case.
In spite of these violations, most imaging systems can be treated as quasi-linear and stationary over a restricted operating region to take advantage of the wealth of tools available in Fourier theory.
Spread functions arise from important fundamental physical processes acting in real imaging systems. A few of these are described in the following paragraphs.
Lens systems are limited by light diffraction (Chapter 2) and often suffer from aberrations (Chapters 6 and 10). These effects spread the light laterally in the image. Even in the case of a ‘perfect’ lens (one with no aberrations), the wave nature of light causes a point to appear as a disc surrounded by fainter rings (the Airy disc).
A photographic emulsion causes light passing into it to be diffused. It is said to be turbid. This diffusion arises as a result of reflection, refraction, diffraction and scattering by the silver halide crystals, and depends on such factors as the mean crystal size, ratio of silver halide to gelatin and the opacity of the emulsion to actinic light.
A CCD or CMOS imaging array integrates the light falling over the area of a single detector element. Light is effectively spread over a rectangular area defined by the dimensions of the element. In some cases this is modified by the addition of filters designed to increase the light-gathering ability of the elements or reduce aliasing (see later in this chapter). Further spreading can occur as a result of charge diffusion, charge transfer inefficiencies and other electrical processing operating within the device.
Cathode ray tube (CRT) displays can be considered to exhibit a point spread function in the form of a display dot generated by the scanning electron beams. The profile of the dot is usually Gaussian and in colour systems is sampled by a shadow mask. The spread function of liquid crystal displays (LCDs) is defined largely by the size of the liquid crystal polarizing filters used to contribute the red, green and blue components of the display signal to each discrete pixel.
Image processing and compression systems contribute significantly to modern imaging systems, particularly in the areas of digital stills cameras (DSCs), digital video, digital versatile discs (DVDs), high definition (HD) television and many others. In many cases these processing algorithms can significantly modify the point spread function that would otherwise be exhibited by the imaging hardware in isolation. Therefore, these software components may also be thought of as contributing a point spread function to the system and their effects should be considered when evaluating imaging devices.
It has already been noted that point spread functions are the ‘building blocks’ of real images and will be responsible for the degradations in image quality (sharpness, resolution, etc.) that occur in imaging systems. It has also been seen that degradation in image quality can be explained by reduction in modulation of sinusoidal components, summarized by the modulation transfer function. As we shall see, it is another remarkable fact of Fourier theory that these two quite different views of image formation are intimately connected. In particular, the point spread function and the modulation transfer function are related by the Fourier transform.
The image of a point of intensity in a linear, stationary imaging system is the point spread function (PSF). It is a function of two orthogonal variables (x, y), usually taken in the same directions as the image plane variables xp and yp. If the system is isotropic (i.e. it has the same physical properties in all directions), the PSF will be rotationally symmetrical. It can thus be represented by a function of one variable, r say, where r² = x² + y².
Notation: I(x, y) is the general representation, suitable for optical systems, CCDs, CMOS, monitor screens, scanners, algorithms, etc.
Units: light intensity (optical systems); voltage, equivalent to effective exposure (CCD and CMOS devices); luminance (monitor screens); effective exposure (photographic emulsions).
Generally, no specific units are referred to here. Intensity is used here as a generic term for luminance, density, pixel value, etc. It is assumed that output units are linear with input units. Care must be taken when examining any system involving digital values (alternatively named integers, counts or pixel values) as input or output. Digital values must be mapped to an input or output as part of the quantization process and this can be highly non-linear. Without this information, digital values do not relate to any real (physical) quantities. Also, the significance of the values for two devices can be markedly different. Put another way, a pixel value of 255 from an 8-bit digital camera may be ‘white’ with an optical density of 0.02, whereas from another 16-bit camera it could very well be ‘dark grey’ with an optical density of 2.5. A pixel value means nothing until that value is put into context.
The shape of the PSF (in particular its extent in the x and y directions) determines the sharpness and resolution aspects of image quality produced. If the PSF is very narrow in the x and y directions, the image sharpness and resolution may be expected to be good (a hypothetical perfect system would have a PSF of zero width – i.e. a point). A system with a PSF extending over a large area (very wide) would produce very poor images.
The intensity profile of the image of a line (a function of just one variable, or one orientation) is the line spread function (LSF). It is formed from the summation of a line of overlapping point spread functions. The LSF of a system is thus obtained by integrating the PSF in one direction (one dimension). If the system is not isotropic the LSF will depend on the orientation of the line. If the system is isotropic, the LSF is independent of orientation. In this case the LSF contains all the information that the PSF does. Since the LSF is much easier to measure and use than the PSF, imaging is nearly always studied using the LSF of the system.
Figure 7.11 shows the relationship between the LSF and the PSF for a typical diffusion-type imaging process (for example, a photographic emulsion). It should be noted, however, that the diagram applies to all linear, stationary imaging systems. Only the shapes of the spread functions will differ.
The intensity profile of the image of an edge (i.e. one-dimensional cross-section of the edge) is the edge spread function (ESF). It can be considered as formed from a set of parallel line spread functions finishing at the position of the edge. The value of the ESF at any point is thus given by the integral of a single LSF up to that point. Since the reverse relationship must apply, it is found that the first derivative of the ESF gives the LSF. In other words, the slope of the edge profile at any point is the value of the LSF at that point. Imaging an edge is therefore a very important method of obtaining the LSF of a system.
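A minimal sketch of this relationship, using a synthetic Gaussian line spread function purely for illustration, builds an edge profile by integration and then recovers the LSF by differentiating it:

```python
import numpy as np

dx = 0.001                                   # sample spacing, mm
x = np.arange(-1, 1, dx)

# Synthetic LSF (Gaussian, sigma = 0.05 mm assumed) used to build a test edge
sigma = 0.05
true_lsf = np.exp(-x**2 / (2 * sigma**2))
true_lsf /= true_lsf.sum()

# The ESF is the running integral of the LSF up to each point ...
esf = np.cumsum(true_lsf)

# ... so the LSF is recovered as the first derivative (slope) of the edge profile
recovered_lsf = np.gradient(esf, dx)

print(round(x[np.argmax(recovered_lsf)], 3))   # peak of recovered LSF lies at the edge (x ≈ 0)
```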
It has been seen that the input to an imaging system can be thought of as a two-dimensional array of very close points of varying intensity. Provided the system is linear and stationary the image may be considered to be formed from the addition of overlapping, scaled PSFs in the x and y directions. The image would be expected to be less detailed than the original input distribution and the amount of degradation will depend on the width of the PSF of the system.
The input scene may be denoted as Q(xp, yp), the output image as Q′(xp, yp) and the PSF as I(x, y). Mathematically, the relationship between them is given by the imaging equation:

Q′(xp, yp) = ∫∫ Q(x, y) I(xp − x, yp − y) dx dy

with the integrals taken over all x and y.
This is a two-dimensional convolution integral. It represents the previously described process of adding up the scaled PSFs over the surface of the image. The integrals represent the summation process when the individual PSFs are infinitesimally close together.
It is often written as:

Q′(x, y) = Q(x, y) ⊗ I(x, y)
or even just:

Q′ = Q ⊗ I
where ⊗ denotes convolution. Note also the dropping of subscripts on x and y. In all that follows, subscripts will be dropped if there is no danger of ambiguity.
Using the LSF instead of the PSF allows us to write a one-dimensional simplification:

Q′(x) = ∫ Q(ξ) L(x − ξ) dξ
where L(x) is the line spread function, Q(x) is the one-dimensional input and Q′(x) is the one-dimensional output. In this case, input scenes of the form illustrated in Figure 7.7 are considered as an infinitesimally close set of lines. The image is formed from the addition of overlapping weighted line spread functions.
This may also be written as:

Q′(x) = Q(x) ⊗ L(x)
or as:

Q′ = Q ⊗ L
In an earlier section we highlighted an important property of linear, stationary systems. If the input is sinusoidal in form, then the output is also sinusoidal in form. It will have a different modulation (Eqn 7.1 – usually reduced) and may be shifted along the x-axis. The image will have the same frequency as the input. This is illustrated in Figure 7.12.
The degree of modulation reduction depends on the spatial frequency. Low frequencies (corresponding to coarse detail in the original) suffer only a minor reduction in modulation (if at all). High frequencies suffer much greater loss of modulation. If the frequency is high enough, it will not be reproduced at all (i.e. it is not resolved).
The reduction in modulation of a particular spatial frequency, u, is known as the modulation transfer factor. A plot of the modulation transfer factor against spatial frequency, u, gives the modulation transfer function (MTF). Some typical MTF curves are illustrated in Figure 7.13a for black-and-white photographic films, and in Figure 7.13b for a typical CCD or CMOS device with a 9 μm pixel.
The MTF describes the reduction in modulation, or contrast, occurring in a particular imaging system, as a function of spatial frequency. Note that it does not include information about any ‘phase shift’, i.e. if any frequencies undergo a shift on imaging owing to an asymmetric spread function, this will not be shown by the MTF.
In Figure 7.13a, the curve for the fast panchromatic film shows an initial rise above unity at low spatial frequencies. This is evidence of development adjacency effects and represents a non-linearity of the system. Accordingly the MTF curve is not unique. It is of limited value as it does not represent the behaviour of the film for all input exposure distributions. Similarly, the curve for the typical commercial digital stills camera in Figure 7.13b exhibits a rise above unity at low spatial frequencies. In this case it is due to sharpening filters present in the system. Again, this renders the MTF curve dependent on the input, as typically the degree to which sharpening is applied is linked to the ‘strength’ of the edge presented to the system. Therefore, there is not a unique MTF curve for this system. Sharpening filters are discussed in more detail in Chapters 24, 27 and 28.
In order to complete this introduction to the Fourier theory of image formation, an introduction is needed to the main tools for decomposing functions into their component spatial frequencies: the Fourier series and the Fourier transform.
It has been shown that the addition of enough sine and cosine waves of various frequencies, modulations (or amplitudes) and phases will enable the reproduction of any signal desired. Extending this argument, it should be possible to take any signal and decompose it into its component sine and cosine waves. This is attractive because, as has been shown earlier, it is easier to understand the effect of a system on simple functions such as sine waves than on complex signals such as a real scene.
Consider the waveform shown in Figure 7.14a. It is composed of the waves sin(ux), sin(2ux) and sin(3ux) and is periodic (it repeats itself) (Figure 7.14b–d). The frequency of the first sine wave is u. The frequencies of the two additional sine waves are integer multiples of it. The amplitudes of the sine waves are 0.5, 0.4 and 0.3 respectively. The function for this may be written:

f(x) = a1 sin(ux) + a2 sin(2ux) + a3 sin(3ux)   (7.8)
where u = 2, a1 = 0.5, a2 = 0.4 and a3 = 0.3. The frequency of the first sine wave is the lowest and is known as the fundamental frequency. Because the frequencies of the second and third sine waves are multiples of the first, they are known as harmonics. The method above is a perfectly acceptable way of describing the signal shown in Figure 7.14a. In reality, however, signals are far more complex than in the example and can contain many component sine waves, hundreds if not thousands. Using the above method to describe the signal is inefficient and tedious, and therefore Eqn 7.8 may be rewritten:

f(x) = Σ an sin(nux)   (summing over n = 1, 2, 3, …)
where u = 2, a1 = 0.5, a2 = 0.4, a3 = 0.3 and an = 0 for n > 3. Though this is a nice compact form, there is no way to alter the phase, and a constant offset is also needed to make the result positive as before. The phase may be changed by adding differing ratios of cosine and sine waves for each frequency. To add phase and a constant offset we may rewrite the above as:

f(x) = a0 + Σ [an cos(nux) + bn sin(nux)]   (7.10)
This is the Fourier series. Unique sets of coefficients a0, an and bn will describe unique periodic signals. Using Euler’s formula it is possible to rewrite the addition of sine and cosine:

e^(iθ) = cos θ + i sin θ
where i is the imaginary unit of complex number theory (i² = −1).
Therefore, it is possible to rewrite Eqn 7.10 in the complex form:

f(x) = Σ cn e^(inux)   (7.12)

with the sum taken over all integers n (from −∞ to +∞),
where cn is a coefficient, related to an and bn via an = cn + c−n and bn = i(cn − c−n), with c−n denoting the coefficient of index −n.
It is important that the less mathematically inclined reader see Eqn 7.12 only as representing the addition of sinusoidal waves written with mathematical shorthand rather than anything more complicated.
For a non-periodic function f(x), the Fourier transform F(u) is defined as:

F(u) = ∫ f(x) e^(−i2πux) dx   (7.13)

where the integral is taken over all x.
F(u) represents the amount of frequency u present in the non-periodic f(x). Equation 7.13 can be interpreted as follows. To establish the amount of a frequency present, the function f(x) is multiplied by a sine or cosine of that frequency. The area of the result (denoted by the integral) yields the required amount in terms of amplitude.
F(u) is called the Fourier spectrum (or sometimes just the spectrum) of f(x). F(u) is generally a continuous frequency spectrum.
The units of the variable u are the reciprocal of those of x. f(x) is a function of distance (e.g. x in mm) and therefore F(u) is a function of spatial frequency (mm⁻¹), i.e. cycles per mm.
The function F(u) is in general a complex function. This means it has two distinct components at each frequency, u: the real component R(u) represents the amplitude of the cosine component and the imaginary component I(u) represents the amplitude of the sine component. Using the notation of complex mathematics, this is written:

F(u) = R(u) + iI(u)
The total amplitude of the frequency u is given by the modulus of the Fourier transform:

|F(u)| = √(R(u)² + I(u)²)
One of the most important functions for which we need the Fourier transform is also one of the simplest, namely:

f(x) = rect(x/a)
where ‘rect’ stands for rectangular and a is a constant representing the width of the rectangle. The function is thus a rectangular ‘pulse’ of width a and height 1. Its importance lies in the fact that it is used to represent many important imaging apertures (the width of a scanning slit, the one-dimensional transmittance profile of a lens, a single detector element of a CCD or CMOS imaging array, etc.). When the Fourier transform expression is evaluated for this function, the following result is obtained:

F(u) = a sinc(au)
where the sinc(x) function (see also Chapter 2) is defined as:

sinc(x) = sin(πx)/(πx)
This Fourier transform pair is illustrated in Figure 7.15. As a further example, a rectangular function of width 5 units is presented together with its Fourier transform in Figure 7.16. In both of these cases the rectangular functions are symmetrical about the origin (even function). This means that they contain only cosine waves. The Fourier transforms therefore represent the amplitudes of the cosine components present in the rectangular functions.
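The rect–sinc pair can be checked numerically. In the sketch below the pulse width and sampling choices are arbitrary; the discrete transform of the sampled pulse is compared with a sinc(au):

```python
import numpy as np

dx = 0.001                              # sampling interval, mm
x = np.arange(-10, 10, dx)              # generous support around the pulse
a = 2.0                                 # width of the rectangular pulse, mm (arbitrary)

rect = np.where(np.abs(x) <= a / 2, 1.0, 0.0)

# Numerical Fourier transform; the dx factor approximates the continuous integral
F = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(rect))) * dx
u = np.fft.fftshift(np.fft.fftfreq(len(x), d=dx))    # spatial frequency, cycles per mm

analytic = a * np.sinc(a * u)           # np.sinc(t) = sin(pi*t)/(pi*t), as in the text

print(np.max(np.abs(F.real - analytic)))   # small discretization error only
```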
The Dirac delta function, δ(x), is a special function in Fourier mathematics. It is widely used in signal processing and has a particular significance in image science.
The Dirac delta function is defined with two statements:

δ(x) = 0 for x ≠ 0

∫ δ(x) dx = 1 (the integral taken over all x)
The first statement says that δ(x) is zero everywhere except at the origin. The second statement says that δ(x) has an area of unity. The only reasonable interpretation of this definition is to suppose that δ(x) has infinitesimal width but infinite height, so that its width times height = 1.
The function is usually interpreted as representing an impulse of unit energy. It is shown graphically as a vertical arrow of unit height situated at the origin. The height of the arrow represents the energy (or area) of the impulse.
The Fourier transform of δ(x) is a constant: every frequency is present with equal amplitude. This means an infinitely brief pulse of energy contains an infinite range of frequencies, all of the same amplitude. In imaging terms it implies that a perfect point image contains an infinite range of frequencies of constant amplitude. δ(x) and its transform are illustrated in Figure 7.17.
The convolution of any function f(x) with δ(x) yields f(x), i.e.

f(x) ⊗ δ(x) = f(x)
• δ(x) is used to represent a ‘point’ of unit magnitude input to a system. The image of the point is the point spread function (PSF). In spatial frequency terms, the input is a flat (or ‘white’) spectrum; the output is the MTF.
• If a hypothetical system has a perfect response, its PSF is identical to δ(x). Output images will then be identical to the input exposure (the sifting property).
• The process of sampling a continuous function uses a row, or comb, of delta functions to represent the sampling function. Sampled data values are obtained by multiplying the continuous function by the sampling function. Digital images are obtained using a two-dimensional sampling function.
For a quick analytical approximation of point and line spread functions, a Gaussian may be used of the form:

L(x) = exp(−x²/2σ²)
where σ determines the width of the function. Its usefulness lies not only in the fact that the function is included as standard in most curve-fitting packages, but also in that its Fourier transform is another Gaussian, termed a reciprocal function:

M(u) = exp(−2π²σ²u²)
Therefore, if the line spread function of the system under measurement lends itself easily to description using a Gaussian, the MTF may readily be approximated. The Fourier transform pair is shown in Figure 7.18.
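As a sketch of this shortcut, with an assumed value of σ, the MTF follows in closed form once a Gaussian has been fitted to the measured LSF:

```python
import numpy as np

sigma = 0.01                               # assumed width parameter of the fitted LSF, mm
u = np.array([0.0, 10.0, 20.0, 40.0])      # spatial frequencies, cycles per mm

# For L(x) = exp(-x^2 / 2 sigma^2) the normalized transform is the Gaussian below,
# so the MTF follows directly once sigma has been estimated.
mtf = np.exp(-2 * np.pi**2 * sigma**2 * u**2)

print(dict(zip(u, np.round(mtf, 3))))
```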
Starting with the imaging equation in one dimension:

Q′(x) = Q(x) ⊗ L(x)
the image of a one-dimensional sinusoidal wave of frequency u is obtained as follows:
Let the input be the sinusoidal exposure distribution:

Q(x) = b + a sin(2πux)
This is a sinusoidal function with a modulation equal to a/b. The imaging equation can be developed to arrive at:

Q′(x) = b + aM(u) sin(2πux + ε)
This says that the output is also a one-dimensional sinusoidal wave, of the same frequency u but different modulation, namely aM(u)/b. The symbol ε represents a phase shift (this will be zero if the line spread function is symmetrical). The modulation transfer factor, M(u), for frequency u is given by:

M(u) = M(u)Out / M(u)In
where M(u)Out is the output modulation and M(u)In is the input modulation. A plot of M(u) against u is the MTF. From Eqns 7.25 and 7.26 the important result below is found:

M(u) = |T(u)|
where T(u) is called the optical transfer function (OTF), i.e. the modulation transfer function is the modulus of the Fourier transform of the line spread function.
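This central result can be sketched numerically. A synthetic Gaussian stands in for a measured line spread function here (an assumption for illustration only); the MTF is the modulus of its discrete Fourier transform, normalized to unity at zero frequency:

```python
import numpy as np

dx = 0.001                                   # sample spacing of the measured LSF, mm
x = np.arange(-0.5, 0.5, dx)
lsf = np.exp(-x**2 / (2 * 0.02**2))          # stand-in for a measured LSF (sigma = 0.02 mm)

T = np.fft.fft(np.fft.ifftshift(lsf))        # optical transfer function (complex)
mtf = np.abs(T) / np.abs(T[0])               # M(u) = |T(u)|, normalized so that M(0) = 1
u = np.fft.fftfreq(len(x), d=dx)             # spatial frequency axis, cycles per mm

for target in (5, 10, 20):
    print(target, round(mtf[np.argmin(np.abs(u - target))], 3))
```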
Fourier theory is thus seen to unify two apparently independent image models: the array of points and the sum of sinusoidal components. This and other important relationships are presented in Figures 7.19 and 7.20.
A system comprising a chain of linear processes, each with its own MTF, has a system MTF, Ms(u), given by:

Ms(u) = M1(u) × M2(u) × … × Mn(u)
where Mn(u) is the MTF of the nth process. In other words, the individual MTFs for the components of an imaging chain combine by simple multiplication to yield the system MTF. This result follows directly from a consideration of the ‘image of an image’.
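A small sketch of the cascading rule, using entirely hypothetical component MTFs for a three-stage chain (lens, sensor aperture, display):

```python
import numpy as np

u = np.linspace(0, 40, 5)                                 # cycles per mm

# Hypothetical component MTFs for a three-stage imaging chain
mtf_lens    = np.exp(-2 * np.pi**2 * 0.008**2 * u**2)     # Gaussian-like lens response
mtf_sensor  = np.abs(np.sinc(0.009 * u))                  # 9 micrometre aperture: |sinc(au)|
mtf_display = np.exp(-2 * np.pi**2 * 0.012**2 * u**2)

mtf_system = mtf_lens * mtf_sensor * mtf_display          # Ms(u): product at each frequency
print(np.round(mtf_system, 3))
```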
Convolution is such an important process in linear systems theory that it is useful to be familiar with the following geometrical interpretation. The convolution of two functions f(x) and g(x) yields a third function h(x), where:

h(x) = ∫ f(ξ) g(x − ξ) dξ
where ξ is the ‘dummy’ variable of the integration.
It can be considered an averaging process between the two functions f and g. Given the two functions f and g shown in Figure 7.21, the convolution process involves flipping g left to right as shown to form g(−ξ) and then sliding g(−ξ) across f(ξ). At each position, x, the two functions are multiplied together. The area of the result is the value of the convolution at this position x. As we have seen, the two-dimensional version of this process is the basis of imaging in a linear, stationary imaging system.
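A direct, if inefficient, sketch of this flip-and-slide view, using small hypothetical arrays, reproduces NumPy's built-in convolution:

```python
import numpy as np

def convolve_flip_and_slide(f, g, dx=1.0):
    """Discrete convolution by flipping g and sliding it across f."""
    n = len(f) + len(g) - 1
    h = np.zeros(n)
    g_flipped = g[::-1]                       # g(-xi)
    for shift in range(n):                    # each output position x
        for j, gv in enumerate(g_flipped):
            k = shift - (len(g) - 1) + j      # index of f overlapping this element of g
            if 0 <= k < len(f):
                h[shift] += f[k] * gv         # multiply and accumulate
    return h * dx                             # the 'area' of the product at each shift

f = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
g = np.array([0.5, 1.0, 0.5])

print(convolve_flip_and_slide(f, g))
print(np.convolve(f, g))                      # NumPy's built-in convolution agrees
```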
The Fourier transform of a convolution of two functions is the product of the Fourier transforms of those two functions, i.e. if F(u) is the Fourier transform of f(x), G(u) is the Fourier transform of g(x), and H(u) is the Fourier transform of h(x), a function obtained by the convolution of f(x) with g(x), then:

H(u) = F(u)G(u)
The convolution of two functions has a Fourier transform given by the product of the separate transforms of the two functions. This is a very important result and is the basis of all system analyses using frequency–response curves.
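The theorem is easily verified numerically for the same small hypothetical arrays; the transforms are zero-padded to the length of the full linear convolution so that the DFT's circular convolution matches the linear one:

```python
import numpy as np

f = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
g = np.array([0.5, 1.0, 0.5])

n = len(f) + len(g) - 1                  # length of the full linear convolution
h = np.convolve(f, g)                    # real-space convolution

F = np.fft.fft(f, n)                     # zero-padded transforms
G = np.fft.fft(g, n)
h_via_fourier = np.fft.ifft(F * G).real  # H(u) = F(u) G(u), transformed back

print(np.allclose(h, h_via_fourier))     # True
```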
In the previous sections it has been seen how important Fourier theory is in understanding the formation and properties of images formed in imaging systems. For example, the modulation transfer function (MTF), describing the attenuation of amplitude as a function of spatial frequency, can be obtained by taking the Fourier transform of the line spread function. Fourier space multiplication (corresponding to real space convolution) forms the basis of many established techniques in image processing (see Chapter 28).
In most cases the functions we wish to transform are not available in an analytical form, so the integral definition of the Fourier transform cannot be applied. Instead, we generally have some experimentally determined function f(x) that is sampled at regular spatial intervals of x (i.e. δx), to yield a finite set of numbers (fi say) that represent the function. The Fourier transform is taken of this discrete set of values (using a discrete Fourier transform or DFT). The result is another finite set of numbers (Fj) that represents the required Fourier transform F(u) at regular intervals along the spatial frequency axis.
Correct sampling is essential if the digital values are to properly represent the continuous functions they are derived from. The set fi must obviously be taken at an interval δx small enough to resolve all the detail in f(x). Note the wording here. It is necessary to resolve all the detail present, even if we are not interested in the fine structure. The reason for this will become clear shortly.
In a similar manner, the set Fj must be determined by the DFT at frequency intervals of δu sufficiently close to display the details of the function F(u).
Conditions for correct sampling are embodied in the well-known sampling theorem. If we choose our sampling interval δx according to the sampling theorem, then the results will be as accurate as the measurement noise will allow. If we undersample (i.e. with too large a sampling interval), the results will be wrong. The unwanted process of aliasing will have occurred.
The rest of this section deals with the sampling theorem and aliasing. The topic is introduced by first considering the process of undersampling a cosine wave. We then define the sampling function and apply the convolution theorem to derive the sampling theorem and to explain the phenomenon of aliasing.
Figure 7.22 illustrates the consequence of sampling a waveform using a sampling interval that is too great. Curve (a) shows the original cosine wave being sampled. This has a spatial frequency of 40 cycles per mm. The sampling interval is 30 μm and the sample points are shown as spikes. The resulting set of sampled values appears to come from a much lower frequency, as shown by curve (b). This is the reconstructed frequency and in this case has a value of 6.67 cycles per mm.
This process of reconstructing a lower frequency than the one that was sampled (the input frequency) is known as aliasing. It occurs to a greater or lesser extent in all digital systems and means that a digitized image will not be a true record of the original analogue (continuous) image. The next two sections investigate aliasing and its relation to the sampling conditions in more detail.
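The example of Figure 7.22 can be reproduced numerically. In the sketch below, a 40 cycles per mm cosine sampled at 30 μm intervals yields exactly the same sample values as a 6.67 cycles per mm wave:

```python
import numpy as np

u_in = 40.0          # input frequency, cycles per mm
dx = 0.030           # sampling interval, mm (30 micrometres)

u_sampling = 1.0 / dx            # 33.3 cycles per mm
u_nyquist = 1.0 / (2.0 * dx)     # 16.7 cycles per mm; u_in lies well above it

# Sample the cosine wave at the (too coarse) interval dx
n = np.arange(0, 200)
samples = np.cos(2 * np.pi * u_in * n * dx)

# The same sample values are produced by the aliased (reconstructed) frequency
u_alias = abs(u_in - round(u_in / u_sampling) * u_sampling)   # ~6.67 cycles per mm
samples_alias = np.cos(2 * np.pi * u_alias * n * dx)

print(round(u_alias, 2), np.allclose(samples, samples_alias))
```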
An infinitely extending row of impulses (regularly displaced Dirac delta functions) is represented by:

III(x) = Σ δ(x − nδx)

where the sum is taken over all integers n.
This is known as a Dirac comb, or Shah function, and in our application represents the sampling function. The impulses are of unit energy and are spaced a distance δx (i.e. the sampling interval is δx). The Fourier transform of a Dirac comb is another Dirac comb with inverse scaling (interval equal to 1/δx).
The sampling function and its Fourier transform are shown in Figure 7.23.
The process of sampling a continuous function f(x) is represented by the multiplication of the function f(x) by a Dirac comb. By the convolution theorem, multiplication in distance space is equivalent to convolution in frequency space: the spectrum of the sampled function is F(u) convolved with the transform of the comb, so F(u) is replicated at intervals 1/δx. The separate repetitions are known as aliases. The DFT will evaluate the quantity F(u) ⊗ III(uδx) over just one period (usually between u = 0 and u = 1/δx). The result is correct (i.e. equivalent to F(u)) provided the individual aliases do not overlap. However, if δx is not fine enough, overlap will occur, as shown in Figure 7.24. The DFT gives the sum of the aliases and in this case the result is incorrect, as shown by the dashed line. Aliasing is said to have occurred.
In order to avoid aliasing, the function f(x) must be sampled at an interval δx such that:

δx ≤ 1/(2uc)
where uc is the maximum significant frequency contained by f(x). The frequency, uN, defined as:

uN = 1/(2δx)
is known as the Nyquist frequency. It represents the highest frequency in the original function (i.e. signal) that can be faithfully reconstructed.
The sampling theorem states that all input frequencies below the Nyquist frequency can be unambiguously recovered, while input frequencies above the Nyquist frequency will be aliased. Such aliased frequencies are reconstructed in the frequency range 0–uN, their amplitudes being added to the true signal in that range.
A formal enunciation of the sampling theorem is: a function f(x) whose Fourier transform is zero for |u| > uc is fully specified by values spaced at equal intervals δx not exceeding 1/(2uc).
Note that f(x) is assumed to be band limited, i.e. there is an upper limit to the range of frequencies contained by f(x).
The sampling theorem implies that it is possible to recover the intervening values of f(x) with full accuracy! In practice, the DFT will evaluate (i.e. sample) the quantity F(u) ⊗ III(uδx) at intervals δu, themselves determined from the sampling theorem, i.e. δu = 1/S, where S is the total extent, in units of x, of the function f(x). It is equivalent to replicating the sampled function f(x)III(x/δx) at intervals S, although in practice only one period is handled. This is illustrated in Figure 7.25.
Suppose f(x) is sampled correctly, at intervals δx, over the range S. N data values will be produced, where N = S/δx. The DFT will produce N values, representing one alias of the Fourier transform sampled at intervals δu:

δu = 1/S = 1/(Nδx)
i.e. N input values to the DFT will yield N output values. Finally, note that DFTs are often calculated using a so-called ‘fast Fourier transform’ or FFT method. For the common radix-2 algorithms, N must be a power of 2 (16, 32, 64, etc.). If necessary, digitized functions of x must have sufficient zeros added to the ends to satisfy this requirement (zero padding).
One important example of the correct use of the sampling theorem occurs in the slanted edge method for determining the MTF of CCD image arrays. Figure 7.26 shows a schematic representation of such an array. The detector elements have a width a in the x direction. The centre-to-centre separation (the sampling interval) is δx.
The MTF in the horizontal direction will be determined mainly by the element width a. In fact, the line spread function for this aspect is given by:

L(x) = rect(x/a)
and the corresponding MTF is sinc(au). Other degradations, such as charge diffusion, will contribute to the MTF in a less significant manner.
A study of the sinc function reveals that spatial frequencies up to 1/a and beyond are capable of being recorded. The sampling interval δx will generally be greater than or equal to a, depending on the structure of the device (individual detector elements cannot physically overlap). This means that the Nyquist frequency cannot be higher than 1/(2a). We therefore have a situation of undersampling and the potential for much aliasing of the higher image frequencies. Readers may be familiar with moiré patterns visible in images formed with CCD or CMOS systems. These are the consequence of aliasing of fine periodic structure. Many CCD-type imaging systems (for example, film scanners) have anti-aliasing mechanisms built in. One of the simplest is to ensure that the preceding imaging lens has an MTF that will filter out frequencies above the Nyquist frequency of the detector.
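Tying the numbers together for a 9 μm detector element (a sketch that ignores other MTF losses such as charge diffusion, and assumes the sampling interval equals the element width):

```python
import numpy as np

a = 0.009            # detector element width, mm (9 micrometres)
dx = 0.009           # assumed centre-to-centre sampling interval, mm (100% fill factor)

u_nyquist = 1.0 / (2.0 * dx)                    # about 55.6 cycles per mm
u = np.array([10.0, 30.0, u_nyquist, 100.0])    # cycles per mm

mtf_aperture = np.abs(np.sinc(a * u))           # sinc(au); still non-zero well beyond Nyquist

for ui, mi in zip(u, mtf_aperture):
    status = "aliased if present" if ui > u_nyquist else "recoverable"
    print(f"u = {ui:6.1f} cycles/mm   MTF = {mi:.3f}   ({status})")
```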
To measure the horizontal MTF without aliasing corrupting the result we use the slanted-edge method, introduced and detailed in Chapter 24.
The non-stationary nature of sampled devices can cause further errors when evaluating MTF despite an absence of aliasing. The relationship of the phase of the signal to each sample will cause the recorded modulation to vary (Figure 7.27). The consequence of this is that, unless a method such as the above is used, the MTF will vary with small changes in the position of the test target used. It can be shown that the upper bound of the MTF is given by the sinc function described earlier and the lower bound may be estimated by:

M(u) = |sinc(2pu)|
where p represents the pixel pitch. This corresponds to a line spread function equivalent to two neighbouring pixels.
An in-depth account of MTF measurement for imaging systems, and of how the MTF summarizes the image quality attributes that contribute to such subjective impressions as sharpness and resolution, is given in Chapter 24.