6 Multisensor Image Fusion

Multisensor fusion processes information supplied by multiple sensors. It can provide more comprehensive, accurate, and robust results than those obtained from a single sensor. Fusion can be defined as the combined processing of data acquired from multiple sensors, together with the sorting, optimizing, and conforming of these data, in order to increase the amount of information that can be extracted and to improve decision-making capability. Fusion can extend the coverage of spatial and temporal information, reduce fuzziness, increase the reliability of decision making, and improve the robustness of systems.

Image fusion is a particular type of multisensor fusion, which takes images (including video frames) as its operating objects. In a more general sense, the combination of multiresolution images can be considered a special type of fusion process. In this chapter, however, the emphasis is on information fusion of multisensor images.

The sections of this chapter are arranged as follows:

Section 6.1 provides an overview of information fusion and describes the concepts of information types, fusion layers, and active fusion.

Section 6.2 introduces the main steps of image fusion, three fusion layers, and the objective and subjective evaluation methods of fusion results.

Section 6.3 presents several basic methods of pixel-level fusion and some common techniques for combining them. A number of different examples of pixel-level fusion are provided.

Section 6.4 discusses the technical principles of three tools or methods used in feature-level and decision-level fusion: Bayesian methods, evidential reasoning, and rough set theory.

6.1 Overview of Information Fusion

The human perception of the outside world is the result of the interaction between the brain and many other organs, in which not only visual information but also much nonvisual information plays a role. For example, the intelligent robots currently under investigation have different sensors for vision, hearing, olfaction (the sense of smell), gustation (the sense of taste), touch, pain, heat, force, slide, and approach (Luo, 2002). All these sensors provide different profiles of information about a scene in the same environment, and there are many correlations among their data. To design suitable techniques for combining information from various sensors, theories and methods of multisensor fusion are required.

6.1.1 Multisensor Information Fusion

Multisensor information fusion is a basic ability of human beings. A single sensor can only provide incomplete, inaccurate, vague, and uncertain information; moreover, the information obtained by different sensors can even be contradictory. Human beings have the ability to combine the information obtained by different organs and then make estimations and decisions about the environment and events. Using a computer to perform multisensor information fusion can be considered a simulation of the way the human brain processes complex problems.

6.1.1.1 Layers of Information Fusion

There are several schemes for classifying information fusion into various layers. For example, according to the level of information abstraction, information fusion can be classified into five layers (the main consideration here is strategic early warning on the battlefield, for which C3I (Command, Control, Communication, and Information) systems are required) (He, 2000):

Fusion in the Detection Layer is fusion performed directly at the signal-detection level of the multiple sensors. That is, the signal detected by each sensor is first preprocessed before being transmitted to the central processor.

Fusion in the Position Layer is the fusion of the output signals of the individual sensors. Fusion here includes both temporal fusion and spatial fusion: temporal fusion yields the object's states, while spatial fusion yields the object's moving trace.

Fusion in the Object Recognition Layer Object recognition is used to classify objects according to their attributes and/or properties. Fusion in the object recognition layer can be performed in three ways.

(1)Decision fusion: Fusing the classification results of each sensor.

(2)Feature fusion: Fusing the feature description vectors of each sensor.

(3)Data fusion: Fusing the original data of each sensor.

Fusion in the Posture Evaluation Layer tries to analyze the whole scene based on object recognition. This requires the combination of various attributes of objects, events, etc., to describe the action in a scene.

Fusion in the Menace Estimation Layer Posture estimation emphasizes state, while menace estimation emphasizes tendency. In the menace estimation layer, not only is the state information taken into account, but appropriate a priori knowledge is also used to estimate the tendency of state changes and the outcomes of possible events.

6.1.1.2 Discussions about Active Vision Systems

It is well known that static, single-image analysis constitutes an ill-posed problem. One reason is that the reconstruction of a 3-D scene from a 2-D image is underdetermined. Interpreting and recovering information from a single image has been the goal of many image-understanding systems since the 1980s. Researchers have tried to attain this goal by implementing artificial intelligence techniques (expert vision systems, modeling of domain knowledge, and modeling of image analysis knowledge). In the late 1980s, a new paradigm called "active perception" or "active vision" was introduced (Aloimonos, 1992) and then extended to "active, qualitative, purposive" vision (Andreu, 2001). The main idea behind these concepts is that the ill-posed problem of general vision can be well defined and solved more easily under the following conditions:

(1)If there is an active observer taking more than one image of the scene.

(2)If the “reconstructionist” metric approach (see the discussion in Section 1.3.3) to vision is relaxed to a qualitative one, where it is sufficient to state, for example, that object A is closer to the observer than object B.

(3)If, instead of general vision, a well-defined narrow purpose of the vision system is modeled (leading to a particular solution of a specific application problem).

(4)If any combination of these three conditions is met.

Let us consider condition (1) from a new perspective. If there is an active vision system taking more than one image of a scene, or, even more generally, if there are moving observers or objects in the scene and the system is equipped with several sensors, then the essential problem that has to be solved is how to integrate information from multiple sensors acquired at different moments. The information to be fused can be imperfect in many ways (wrong, incomplete, vague, ambiguous, and contradictory). Mechanisms are required to:

(1)Select information from different sources;

(2)Combine information into a new aggregated state;

(3)Register visual information spatially and temporally;

(4)Integrate information at different levels of abstraction (pixel level, feature level, object level, symbol level, etc.).

6.1.1.3 Active Fusion-Based Understanding

Figure 6.1 shows a general schema of an image understanding system, with a special emphasis (boldface, bold arrows) on the role of fusion within the schema. The upper half corresponds to the real-world situation, while the lower half reflects its mapping in the computer. Solid-line boxes denote levels of processing and dashed-line boxes denote levels of representation. Solid arrows represent the data flow and dashed arrows represent the control flow.

The process of fusion combines information, actively selects the sources to be analyzed, and controls the processes to be performed on these data, which is called active fusion. Fusion can take place at isolated levels (e.g., fusing several input images producing an output image) or integrate information from different representational levels (e.g., generate a thematic map from a map, digital elevation model, and image information). Processing at all levels can be requested and controlled (e.g., selection of input images, choice of classification algorithms, and refinement of the results in selected areas).

Figure 6.1: The framework of general image understanding based on active fusion.

6.1.2 Sensor Models

Information fusion is based on information captured from different sensors, so the models of the sensors play an important role.

6.1.2.1 Fusion of Multisensor Information

The advantages of fusing multisensor information include the following four aspects:

(1)Using multisensors to detect the same region can enhance reliability and credibility.

(2)Using multisensors to observe different regions can increase spatial coverage.

(3)Using multi-types of sensors to examine the same object can augment information quantity and reduce the fuzziness.

(4)Using multisensors to observe the same region can improve spatial resolution.

In fact, when multiple sensors are used, even if some of them encounter problems, the other sensors can still capture information about the environment, so the system is more robust. Since several sensors can work at the same time, the processing speed can be increased, the efficiency of utilization can be augmented, and the cost of information capture can be reduced.

Corresponding to the forms of multisensor information fusion, the information obtained from the outside can be classified into three types:

Redundant Information is the information about the same characteristics in the environment, captured by several independent sensors (often of the same modality). It can also be the information captured by one sensor at different times. Redundant information can improve the tolerance and reliability of a procedure. Fusion of redundant information can reduce the uncertainty caused by noise and increase the precision of the system.

Complementary Information is the information about different characteristics in the environment, captured by several independent sensors (often of different modalities). By combining such information, it is possible to provide a more complete description of the environment. Fusion of complementary information can reduce the ambiguity caused by lacking certain features and enhance the ability to make correct decisions.

Cooperative Information is information from several sensors used cooperatively, where the output of one sensor guides other sensors in capturing further information. Fusion of such information depends on the time sequence in which the different sensors are used.

6.1.2.2 Several Sensor Models

A sensor model is an abstract representation of a physical sensor and its information processing. It describes not only the sensor's own properties but also the influence of external conditions on the sensor and the interactions among different sensors.

Probability theory can be used to model a multisensor fusion system (Durrant, 1988). Denote by yi the observation value of sensor i, by Ti the decision function based on this observation, and by ai the resulting decision action; then

$a_i = T_i(y_i)$  (6.1)

Now, consider a multisensor fusion system as a union of a set of sensors. Each sensor can be represented by an information structure Si, which includes the observation value of this sensor yi, the physical state of this sensor xi, the a priori probability distribution function of this sensor pi, and the relation between the actions of this sensor and other sensors, which is given by

$y_i = S_i(x_i, p_i, a_1, a_2, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n)$  (6.2)

Thus, the information structure of the set of sensors can be represented by the n-tuple S = (S1, S2, ..., Sn). Denoting the decision functions by T = (T1, T2, ..., Tn), the goal of information fusion is to obtain a consistent decision a, based on the set of sources, which describes the environment characteristics better than any single decision ai (i = 1, 2, ..., n).

If the actions of the different parts on yi are considered separately (i.e., in terms of conditional probability density functions), three sub-models can be obtained: the state model $S_i^x$, the observation model $S_i^p$, and the correlation model $S_i^T$. In other words, eq. (6.2) can be written as

$S_i = f(y_i \mid x_i, p_i, T_i) = f(y_i \mid x_i)\, f(y_i \mid p_i)\, f(y_i \mid T_i) = S_i^x S_i^p S_i^T$  (6.3)

State Model: A sensor can change its spatial coordinates and/or internal state over time (e.g., the displacement of a camera or the adjustment of the lens focus). The state model describes the dependency of a sensor's observation values on the location/state of this sensor. This corresponds to the transformation between the different coordinate systems of the sensor. That is, the observation model $f_i(y_i \mid p_i)$ and the correlation model $f_i(y_i \mid T_i)$ are transformed to the current coordinate system of the sensor by using the state model $f_i(y_i \mid x_i)$.

Observation Model: The observation model describes the measurement process in the case where the position and state of the sensor as well as the decisions of the other sensors are known. The exact form of $f_i(y_i \mid p_i)$ depends on many physical factors. To simplify, a Gaussian mixture model is used

$S_i^p = f(y_i \mid p_i) = \dfrac{1-e}{(2\pi)^{m/2} |W_{1i}|^{1/2}} \exp\left[-\dfrac{1}{2}(y_i - p_i)^T W_{1i}^{-1}(y_i - p_i)\right] + \dfrac{e}{(2\pi)^{m/2} |W_{2i}|^{1/2}} \exp\left[-\dfrac{1}{2}(y_i - p_i)^T W_{2i}^{-1}(y_i - p_i)\right]$  (6.4)

where 0.01 < e < 0.05 and the covariances satisfy |W1i| << |W2i|. This model indicates that most of the time the observation values of the sensor follow a Gaussian distribution with mean pi and covariance W1i, while occasionally they follow a much broader Gaussian distribution with mean pi and covariance W2i.
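As an illustration, the following minimal sketch (in Python with NumPy; the function and variable names are illustrative, not from the source) evaluates the two-component Gaussian mixture of eq. (6.4) for a vector observation.

```python
import numpy as np

def observation_likelihood(y, p, W1, W2, e=0.02):
    """Two-component Gaussian mixture of eq. (6.4): with probability 1 - e the
    observation y follows N(p, W1); with small probability e it follows the
    broader N(p, W2), modelling occasional outlier measurements."""
    def gaussian(y, p, W):
        m = len(p)
        d = y - p
        norm = 1.0 / ((2 * np.pi) ** (m / 2) * np.sqrt(np.linalg.det(W)))
        return norm * np.exp(-0.5 * d @ np.linalg.inv(W) @ d)
    return (1 - e) * gaussian(y, p, W1) + e * gaussian(y, p, W2)

# Example: a 2-D sensor measurement near the true parameter p
p = np.array([10.0, 5.0])
W1 = np.diag([0.1, 0.1])      # nominal (small) covariance
W2 = np.diag([10.0, 10.0])    # outlier (large) covariance
print(observation_likelihood(np.array([10.1, 4.9]), p, W1, W2))
```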

Correlation Model: The correlation model describes, with the help of the correlation among sensors, the influence of the other sensors on the current sensor

$S_i^T = f(y_i \mid T_i) = f\left[y_i \mid T_1(y_1), \ldots, T_{i-1}(y_{i-1}), T_{i+1}(y_{i+1}), \ldots, T_n(y_n)\right]$  (6.5)

In eq. (6.5), yi is the observation value of the i-th sensor for the parameter pi, and Tj (j ≠ i) represents the a priori knowledge related to pi that the other sensors provide to the i-th sensor. If this a priori knowledge is described by the probability distribution function f(Ti), the correlation model can be considered the posterior probability of the observation value yi when the a priori f(Ti) is known

$f_i(T_i) = f_i(T_n \mid T_1, \ldots, T_{n-1})\, f_i(T_{n-1} \mid T_1, \ldots, T_{n-2}) \cdots f_i(T_2 \mid T_1)\, f_i(T_1)$  (6.6)

where each conditional probability fi(Tj | Tk) represents the information contributed to the i-th sensor by the j-th sensor when the information provided by the k-th sensors (k = 1, ..., j − 1) is known.

6.2 Image Fusion

This section will focus on the fusion of image and video information. There are many modalities for capturing images and video, which use various sensors and techniques, such as visible light sensors (CCD, CMOS), infrared sensors, depth sensors, a variety of computed tomography techniques (CT, ECT, SPECT), magnetic resonance imaging (MRI), synthetic aperture radar (SAR), millimeter wave radar (MMWR), etc.

6.2.1 Main Steps of Image Fusion

Image fusion draws on many image processing techniques and generally comprises three steps:

6.2.1.1 Image Preprocessing

The image preprocessing procedure includes image normalization (gray-level equalization, resampling, and interpolation), image filtering, color enhancement, edge sharpening, etc. Image fusion may be carried out among images of different sizes, different resolutions, and different dynamic ranges of gray levels or colors; image normalization is used to normalize these parameters. Image filtering applies high-pass filters to the higher-resolution image to extract its high-frequency texture information, so that this information is retained when it is fused with a lower-resolution image. Color enhancement increases the color contrast of the lower-resolution image so that its spectral information is reflected in the fused image. Edge sharpening is performed on the high-resolution image to make boundaries clear and to reduce noise, so that the spatial information of the high-resolution image can be fused into the low-resolution image.

6.2.1.2 Image Registration

Image registration is used to align different images in space and/or time. The techniques presented in Chapter 5 for image matching can be referred to here. Image fusion places high demands on registration accuracy: if the registration error is larger than one pixel, the fused result will show a superposition effect and the visual quality of the image will be greatly reduced.

Image registration can be classified into relative registration and absolute registration. Relative registration takes one image from a set of images of the same category as a reference image, and the other images are aligned relative to this reference image. Absolute registration takes a spatial coordinate system as the reference system, and the images to be fused are aligned relative to this system.

Image registration can also be classified into region-based registration and feature-based registration. The control point is a typical feature used in feature-based registration, and there are a number of image registration methods based on control points. The correspondence between control points is first determined, and the registration can then be carried out with the determined parameters. To obtain both corresponding points and registration parameters, the generalized Hough transform (GHT) can be applied. It can be considered an evidence-accumulation method: once a mathematical model is set up, it can derive the parameters. However, the global search space depends on the scale and rotation parameters and can be very large. To reduce the complexity of GHT, the iterative Hough transform (IHT) can be used, but IHT is influenced by the initial parameters and the range of the parameter values and often converges to a local maximum. By using the Hough transform in a multiresolution decomposition framework, as shown in Figure 6.2, the robustness of GHT and the computational efficiency of IHT can be combined (Li, 2005a).

In the multiresolution decomposition-based technique, a few control points are used in the low-resolution layer, where GHT is applied to obtain accurate initial values of the transform parameters, while IHT is used in the high-resolution layers to accelerate the process; the coarse-to-fine principle is illustrated by the sketch below.
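The following sketch is not the GHT/IHT method itself; it only illustrates the coarse-to-fine principle with a translation-only registration on an image pyramid, assuming NumPy arrays of equal size and hypothetical function names.

```python
import numpy as np

def cost(ref, mov, dy, dx):
    """Mean absolute difference between ref and mov shifted by (dy, dx)."""
    h, w = ref.shape
    r = ref[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
    m = mov[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    return np.abs(r.astype(float) - m.astype(float)).mean()

def coarse_to_fine_translation(ref, mov, levels=3, radius=3):
    """Estimate an integer translation coarse-to-fine: an exhaustive search at
    the coarsest level plays the role of the robust global step, and small
    local searches at finer levels refine the estimate."""
    dy, dx = 0, 0
    for level in range(levels - 1, -1, -1):     # coarsest level first
        f = 2 ** level
        r, m = ref[::f, ::f], mov[::f, ::f]     # simple subsampling pyramid
        best = None
        for ddy in range(-radius, radius + 1):
            for ddx in range(-radius, radius + 1):
                c = cost(r, m, dy + ddy, dx + ddx)
                if best is None or c < best[0]:
                    best = (c, dy + ddy, dx + ddx)
        dy, dx = best[1], best[2]
        if level > 0:                           # propagate to the next finer level
            dy, dx = dy * 2, dx * 2
    return dy, dx
```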

6.2.1.3 Image Fusion

After image preprocessing and registration, information fusion can be carried out. Quantitative fusion fuses a group of data to obtain consistent data, which is a conversion from data to data. Qualitative fusion fuses many single decisions to form a combined decision, which is a conversion from several uncertain representations to a relatively coherent representation. Quantitative fusion is typically used for information represented by numeric values, while qualitative fusion is mainly used for information represented by non-numeric values. Some concrete techniques are introduced in the following sections.

6.2.2 Three Layers of Image Fusion

Multisensor image fusion can be divided into three layers. They are, from low to high, the pixel-layer fusion, the feature-layer fusion, and the decision-layer fusion (Pohl, 1998). In recent years, the trend in fusion has been moving from pixels to regions (objects) (Piella, 2003).

Figure 6.2: Framework for image registration using the control point-based multiscale Hough transform.

Figure 6.3: Flowchart of multisensor image fusion.

The flowchart of multisensor image fusion with three layers is illustrated in Figure 6.3. There are three steps from capturing an image to making a judgment and then a decision: feature extraction, object recognition, and decision formation. The three layers of image fusion correspond to these three steps: pixel-layer fusion is performed before the feature extraction step, feature-layer fusion before the object recognition step, and decision-layer fusion before the decision formation step.

6.2.2.1 Pixel-Layer Fusion

Pixel-layer fusion is conducted in the lowest layer, the data layer. It operates directly on the captured images and produces a single fused image, providing the basis for the higher layers of fusion. One advantage of pixel-layer fusion is that it keeps as much of the original information as possible, so the precision obtained is higher than that of the other two fusion layers. The main disadvantages are the huge amount of information to be processed and the high computational cost. Moreover, pixel-layer fusion often requires that the data to be fused are captured by the same type or similar types of sensors.

6.2.2.2 Feature-Layer Fusion

Feature-layer fusion is conducted in the middle layer. It extracts features, obtains scene information, and integrates them to provide more reliable decisions. Feature-layer fusion not only keeps the important information but also compresses the data volume, and it is suitable for sensors of different types. Its advantage is that it deals with less data than pixel-layer fusion does and is therefore more suitable for real-time processing. Its main disadvantage is that its precision is lower than that of pixel-layer fusion.

6.2.2.3 Decision-Layer Fusion

Decision-layer fusion is conducted in the highest layer, often with the help of symbolic computation. It directly makes the optimal decision according to the reliability of each individual decision, since each processing unit has already finished the tasks of object classification and recognition. The advantages of decision-layer fusion are its high fault tolerance, openness, and real-time capability. Its main disadvantage is that information has already been lost before the fusion, so the precision, in either time or space, is inferior to that of the other two fusion layers.

The principal properties of the three fusion modes discussed above are listed in Table 6.1.

For different fusion modes, the techniques used are often different. In Table 6.2, some typical techniques used in the three fusion modes are listed. Several of them will be introduced and discussed in the following two sections.

6.2.3 Evaluation of Image Fusion

The evaluation of fusion results is an important research task of image fusion. The evaluation of fusion results at different layers requires different methods and criteria. For the lower layers of fusion, the results are judged more by their visual appearance, while for the higher layers, the more interesting aspect is how much the fusion helps to complete the task. Though different techniques have been used, the common objectives are to improve the visual quality and/or enrich the information quantity of the fused image.

Table 6.1: Principal properties of three fusion modes.

Table 6.2: Commonly used fusion techniques in the three fusion modes.

Pixel-Layer Fusion | Feature-Layer Fusion | Decision-Layer Fusion
Weighted average | Weighted average | Knowledge based
Pyramid fusion | Bayesian method | Bayesian method
HSI transformation | Evidential reasoning (D–S theory) | Evidential reasoning (D–S theory)
PCA operation | Neural network | Neural network
Wavelet transformation | Cluster analysis | Fuzzy set theory
High-pass filtering | Entropy | Reliability theory
Kalman filtering | Vote based | Logical module
Regression model |  | Production rule
Parameter estimation |  | Rough set theory

There are certain differences between the evaluation of fusion results and the evaluation of the distortion between an image to be coded and the decoded image (see Section 6.1.2 of Volume I of this book set). In coding evaluation, the factor taken into account is the difference between the input and the output: the smaller the difference (the smaller the distortion), the better the result. In image fusion, the input may consist of two or even more images, and the result is not merely related to the difference between the fused image and each of the input images.

The evaluation of image fusion can be classified into two groups, subjective evaluation and objective evaluation. The former is based on the subjective feeling of the observers, which could be different for different observers according to their personality and interests, as well as for different application domains. The latter is based on some computable criteria, which may or may not be closely related to the subjective judgment.

Some basic evaluation criteria are introduced below, in which f(x, y) denotes the original image, g(x, y) denotes the fused image, and their sizes are N × N. Both f(x, y) and g(x, y) are assumed to be gray-level images; however, the criteria obtained can be easily extended to treat other types of images.

6.2.3.1 Subjective Evaluation

Subjective evaluation is often carried out by viewing the fused image. In practice, the following observations on the fused image are considered:

The Precision of Image Registration: In the case that the precision of registration is low, the fused image would show some overlapped areas.

The Distribution of Global Color Appearance: If the distribution consists of natural colors, the fused image would be considered natural.

The Global Contrast of Brightness and Color: When the global contrast of brightness and color is out of place, mist or speckle would appear.

The Abundance of Texture and Color Information: If some spectrum and spatial information were lost in the fusion process, the fused image would appear watery or pale.

The Definition of Fused Image: The decrease in the definition would make the edges in the fused image blurry or faint.

6.2.3.2 Objective Evaluation Based on Statistics

Objective evaluation of fusion results can be based on the statistics of images:

Average Gray-Level: The average gray level of an image reflects the mean brightness perceived by human eyes and thus influences the visual effect. The average gray level of an image is computed by

$\mu = \dfrac{1}{N \times N} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} g(x, y)$  (6.7)

If the average gray level of an image is moderate, a pleasant visual effect is obtained.

Standard Deviation: The standard deviation of the gray levels of an image reflects the dispersion of the gray levels relative to the mean brightness. It can be used to estimate the contrast of the fused image. The standard deviation of the gray levels of an image is computed by

$\sigma = \left\{ \dfrac{1}{N \times N} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \left[ g(x, y) - \mu \right]^2 \right\}^{1/2}$  (6.8)

The smaller the standard deviation, the smaller the contrast. In the case where an image has a very small standard deviation, the image looks pale and provides very little information to observers.

Average Gradient: The average gradient of gray levels of an image is similar to the standard deviation of gray levels, as both reflect the contrast of the image. Since the computation of gradients is often carried out locally, the average gradient reflects more about the micro change and the texture property of local regions in an image. One formula for computing the average gradient of the fused image is

$A = \dfrac{1}{N \times N} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \left[ G_X^2(x, y) + G_Y^2(x, y) \right]^{1/2}$  (6.9)

where GX(x, y) and GY(x, y) are the gradients of g(x, y) along the X and Y directions, respectively. Images with large values of the average gradient are usually clearer than those with small values; the latter also show fewer gray-level layers than the former.

Gray-Level Disparity: The gray-level disparity between the fused image and the original image reflects the difference between their spectrum information (some call it a spectrum warp). The gray-level disparity is computed by

$D = \dfrac{1}{N \times N} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \dfrac{\left| g(x, y) - f(x, y) \right|}{f(x, y)}$  (6.10)

A small value of the gray-level disparity indicates that the fused image has well preserved the gray-level information of the original image.

Mean Variance: If the ideal image (the best possible fused image) is available, the evaluation can be carried out with the help of the mean variance (root-mean-square error) between the ideal image and the fused image. Denoting the ideal image by i(x, y), the mean variance between the ideal image and the fused image is computed by

$E_{\mathrm{rms}} = \left\{ \dfrac{1}{N \times N} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \left[ g(x, y) - i(x, y) \right]^2 \right\}^{1/2}$  (6.11)
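The statistics-based criteria of eqs. (6.7)-(6.11) can be computed in a few lines. The sketch below is a minimal NumPy version; the forward-difference gradients and the small eps guarding the division in eq. (6.10) are implementation choices, not part of the source.

```python
import numpy as np

def mean_gray(g):                                 # eq. (6.7)
    return np.mean(g, dtype=float)

def std_gray(g):                                  # eq. (6.8)
    return np.std(g.astype(float))

def average_gradient(g):                          # eq. (6.9), forward differences
    g = g.astype(float)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, :-1] = np.diff(g, axis=1)
    gy[:-1, :] = np.diff(g, axis=0)
    return np.mean(np.sqrt(gx ** 2 + gy ** 2))

def gray_level_disparity(g, f, eps=1e-6):         # eq. (6.10)
    g, f = g.astype(float), f.astype(float)
    return np.mean(np.abs(g - f) / (f + eps))

def rms_error(g, ideal):                          # eq. (6.11)
    d = g.astype(float) - ideal.astype(float)
    return np.sqrt(np.mean(d ** 2))
```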

6.2.3.3 Objective Evaluation Based on Information Quantities

Objective evaluation of fusion results can be based on the information theories of images:

Entropy: The entropy of an image is an index of the information quantity in the image, and it can be calculated from the (normalized) histogram of the image. Let the histogram of an image be represented by h(l), l = 1, 2, ..., L; then the entropy of this image is

$H = -\sum_{l=1}^{L} h(l) \log\left[ h(l) \right]$  (6.12)

If the entropy of a fused image is larger than the entropy of the original image, it means the information quantity in the fused image is increased.

Cross Entropy: The cross entropy between a fused image and the original image directly indicates the relative difference between the information quantities of the two images. Its symmetric form is called the symmetric cross entropy. Denoting by hg(l) and hf(l), l = 1, 2, ..., L, the histograms of the fused image and the original image, respectively, the symmetric cross entropy between these two images is:

$K(f : g) = \sum_{l=1}^{L} h_g(l) \log\left[ \dfrac{h_g(l)}{h_f(l)} \right] + \sum_{l=1}^{L} h_f(l) \log\left[ \dfrac{h_f(l)}{h_g(l)} \right]$  (6.13)

The smaller the cross entropy, the bigger the information quantity obtained by the fused image from the original image.

Interrelating Entropy: The interrelating entropy between the original image and a fused image is another metric representing the correlation between these images. The interrelating entropy between two images is

$C(f : g) = -\sum_{l_1=1}^{L} \sum_{l_2=1}^{L} P_{fg}(l_1, l_2) \log P_{fg}(l_1, l_2)$  (6.14)

where Pfg(l1, l2) represents the interrelating probability between two pixels at the same position in the two images. The one in the original image has a gray-level value of l1, while the one in the fused image has a gray-level value of l2. In general, the larger the interrelating entropy between the original image and the fused image, the better the fused result.

Mutual Information: The mutual information between two images reflects their information relation. It can be calculated with the help of the histograms interpreted as probability distributions. Using hf(l), hg(l), and Pfg(l1, l2) defined above, the mutual information between two images can be obtained by

$H(f, g) = \sum_{l_1=1}^{L} \sum_{l_2=1}^{L} P_{fg}(l_1, l_2) \log \dfrac{P_{fg}(l_1, l_2)}{h_f(l_1)\, h_g(l_2)}$  (6.15)

If the fused image g(x, y) is obtained from images f1(x, y) and f2(x, y), then the mutual information of g(x, y) with f1(x, y) and f2(x, y) is

$H(f_1, f_2, g) = H(f_1, g) + H(f_2, g) - H(f_1, f_2)$  (6.16)

Note that if the mutual information between f1 and g is correlated with the mutual information between f2 and g, the correlated part H(f1, f2) should be subtracted. Such mutual information indicates the information quantity included in the fused image that stems from the original images. Equation (6.16) can be extended to fusion with more than two images.
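The information-based criteria of eqs. (6.12)-(6.15) can be sketched as follows, again in NumPy; the histogram normalization, the base-2 logarithm, and the eps terms that avoid log(0) are implementation choices.

```python
import numpy as np

def histogram(img, levels=256):
    """Normalized gray-level histogram h(l)."""
    h, _ = np.histogram(img, bins=levels, range=(0, levels))
    return h / h.sum()

def entropy(h):                                       # eq. (6.12)
    nz = h[h > 0]
    return -(nz * np.log2(nz)).sum()

def symmetric_cross_entropy(hf, hg, eps=1e-12):       # eq. (6.13)
    return ((hg * np.log2((hg + eps) / (hf + eps))).sum()
            + (hf * np.log2((hf + eps) / (hg + eps))).sum())

def joint_histogram(f, g, levels=256):
    """Normalized joint histogram P_fg(l1, l2) of co-located pixels."""
    p, _, _ = np.histogram2d(f.ravel(), g.ravel(),
                             bins=levels, range=[[0, levels], [0, levels]])
    return p / p.sum()

def mutual_information(f, g, levels=256):             # eq. (6.15)
    p = joint_histogram(f, g, levels)
    hf, hg = p.sum(axis=1), p.sum(axis=0)             # marginal histograms
    nz = p > 0
    return (p[nz] * np.log2(p[nz] / (hf[:, None] * hg[None, :])[nz])).sum()
```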

6.2.3.4 Evaluation According to Fusion Objectives

The selection of the evaluation criteria is often carried out according to the fusion objectives/goals. The following are some examples:

(1)If the purpose of a fusion is to remove the noise in an image, the criteria based on signal-to-noise ratio can be selected.

(2)If the purpose of a fusion is to increase the resolution of an image, the criteria based on statistical properties and spectra can be selected.

(3)If the purpose of a fusion is to augment the information quantity in an image, the criteria based on information quantities can be selected.

(4)If the purpose of a fusion is to raise the definition of an image, the criteria based on the average gradient of gray levels can be selected.

Raising the definition of an image helps enrich detailed information and improve visual effects. Using statistics-based criteria for evaluating fusion results supports the evaluation from subjective and local points of view. In general, augmenting the information quantity in an image is helpful for extracting features and recognizing objects; therefore, using information-quantity-based criteria to evaluate fusion results reflects the fusion effect from global and objective points of view.

Figure 6.4: A TM image and a SPOT image.

6.3 Pixel-Layer Fusion

In pixel-layer fusion, the original images to be fused have different but complementary properties. The methods for fusing these images should take these properties into account.

6.3.1 Basic Fusion Methods

In the following, several basic and typical pixel-layer fusion methods are introduced by taking as an example the fusion of TM (Thematic Mapper) multispectrum images captured by the Landsat earth-resource satellite with whole-spectrum images captured by the SPOT remote sensing satellite (some experimental results are shown in the next subsection) (Bian, 2005; Li, 2005b). TM multispectrum images cover seven bands ranging from the blue to the infrared (with wavelengths of 0.45 ~ 12.5 µm), and SPOT whole-spectrum images cover five bands ranging from visible light to the near infrared (with wavelengths of 0.5 ~ 1.75 µm). The spatial resolution of a SPOT image is higher than that of a TM image, but the spectrum coverage of a TM image is wider than that of a SPOT image. Figure 6.4(a, b) shows a TM image ft(x, y) in Band 5 (with wavelengths of 1.55 ~ 1.75 µm) and a SPOT image fs(x, y) with wavelengths of 0.5 ~ 0.73 µm, respectively.

6.3.1.1 Weighted Average Fusion

The weighted average fusion is an intuitive method with the following steps:

(1)Select the region of interest in ft(x, y).

(2)Re-sample different wave-band images in this region to extend ft(x, y) to a high-resolution image.

(3)Select the corresponding region of interest in fs(x, y) and match it with ft(x, y).

(4)Perform the following algebraic operation to obtain the weighted average fused image,

$g(x, y) = w_s f_s(x, y) + w_t f_t(x, y)$  (6.17)

where ws and wt are the weights for fs(x, y) and ft(x, y), respectively.
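Assuming the two images have already been resampled to the same size and registered, eq. (6.17) reduces to a few lines of NumPy; the clipping to [0, 255] assumes 8-bit images and the function name is illustrative.

```python
import numpy as np

def weighted_average_fusion(fs, ft, ws=0.5, wt=0.5):
    """Pixel-wise weighted average of two registered images, eq. (6.17)."""
    g = ws * fs.astype(float) + wt * ft.astype(float)
    return np.clip(g, 0, 255).astype(np.uint8)
```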

6.3.1.2 Pyramid Fusion

A pyramid is a data structure (presented in Section 3.3.4 of Volume II of this book set) that can be used to represent images at multiple scales. Based on the pyramid structure and multiscale decomposition, pyramid fusion can be carried out by the following steps:

(1)Select the region of interest in ft(x, y).

(2)Re-sample different wave-band images in this region to extend ft(x, y) to a high-resolution image.

(3)Decompose both ft(x, y) and fs(x, y) according to the pyramid structure.

(4)Fuse the corresponding decomposition results of ft(x, y) and fs(x, y) in every layer of the pyramid and obtain a fused pyramid.

(5)Reconstruct the fused image from the fused pyramid by using an inverse process for generating the pyramid.
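One possible Laplacian-pyramid realization of these steps is sketched below, using OpenCV's pyrDown/pyrUp on single-channel, registered images of the same size; the "keep the stronger detail, average the coarse layer" rule is one common fusion choice among several, not prescribed by the source.

```python
import cv2
import numpy as np

def laplacian_pyramid(img, levels):
    """Build a Laplacian pyramid: band-pass detail layers plus a coarse residual."""
    img = img.astype(np.float32)
    pyr = []
    for _ in range(levels):
        down = cv2.pyrDown(img)
        up = cv2.pyrUp(down, dstsize=(img.shape[1], img.shape[0]))
        pyr.append(img - up)          # detail (high-frequency) layer
        img = down
    pyr.append(img)                   # coarsest approximation layer
    return pyr

def fuse_pyramids(pa, pb):
    """Fuse layer by layer: keep the stronger detail, average the coarse layer."""
    fused = [np.where(np.abs(a) >= np.abs(b), a, b) for a, b in zip(pa[:-1], pb[:-1])]
    fused.append(0.5 * (pa[-1] + pb[-1]))
    return fused

def reconstruct(pyr):
    """Invert the pyramid by repeated upsampling and addition."""
    img = pyr[-1]
    for detail in reversed(pyr[:-1]):
        img = cv2.pyrUp(img, dstsize=(detail.shape[1], detail.shape[0])) + detail
    return np.clip(img, 0, 255).astype(np.uint8)

def pyramid_fusion(ft, fs, levels=4):
    return reconstruct(fuse_pyramids(laplacian_pyramid(ft, levels),
                                     laplacian_pyramid(fs, levels)))
```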

6.3.1.3 HSI Transform Fusion

The HSI (hue, saturation, and intensity) transform converts a color image from the RGB color space to the HSI color space, as shown in eqs. (8.32) to (8.34) of Volume I of this book set. HSI transform fusion performs the fusion operation with the help of the HSI transform and has the following steps:

(1)Select three bands of images from ft(x, y), take them as R, G, and B images, and transform them into H, S, and I images.

(2)Replace the I image obtained by the HSI transform (this image determines the details) with fs(x, y).

(3)Perform the inverse HSI transform and then take the thus obtained RGB image as the fused image.
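A simplified sketch of the intensity-substitution idea follows: with the linear intensity I = (R + G + B)/3, replacing I by the panchromatic image and inverting the transform amounts to adding (pan - I) to each band. This shortcut does not reproduce the full HSI transform of eqs. (8.32)-(8.34), and in practice the panchromatic image is often histogram-matched to I first; the names are illustrative.

```python
import numpy as np

def ihs_style_fusion(tm_rgb, pan):
    """Intensity substitution on three TM bands (H x W x 3) with a registered
    panchromatic image pan (H x W) of the same size."""
    rgb = tm_rgb.astype(float)
    intensity = rgb.mean(axis=2)                 # I component of the linear IHS model
    delta = (pan.astype(float) - intensity)[:, :, None]
    fused = rgb + delta                          # substitute I, invert the transform
    return np.clip(fused, 0, 255).astype(np.uint8)
```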

6.3.1.4 PCA-Based Fusion

The basis of PCA (principal component analysis) is the KL transform. PCA-based fusion consists of the following steps:

(1)Select three or more bands of images from ft(x, y) to perform PCA.

(2)Take the first principal component obtained by the PCA operation and match fs(x, y) to it by histogram matching, so that the two have comparable mean and variance values.

(3)Replace the first principal component with the matched fs(x, y), perform the inverse PCA, and take the resulting image as the fused image.
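A sketch of these steps via eigen-decomposition of the band covariance matrix is given below; simple mean/standard-deviation matching stands in for full histogram matching, and the array layout (H x W x B bands plus an H x W panchromatic image) is an assumption.

```python
import numpy as np

def match_mean_std(src, ref):
    """Light-weight stand-in for histogram matching: align mean and std."""
    return (src - src.mean()) / (src.std() + 1e-12) * ref.std() + ref.mean()

def pca_fusion(tm_bands, pan):
    """tm_bands: H x W x B multispectrum array; pan: H x W panchromatic image."""
    h, w, b = tm_bands.shape
    X = tm_bands.reshape(-1, b).astype(float)
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = np.cov(Xc, rowvar=False)               # band covariance matrix
    vals, vecs = np.linalg.eigh(cov)
    vecs = vecs[:, np.argsort(vals)[::-1]]       # order eigenvectors by variance
    pcs = Xc @ vecs                              # forward PCA
    pc1 = pcs[:, 0].reshape(h, w)
    # replace the first principal component by the matched panchromatic image
    pcs[:, 0] = match_mean_std(pan.astype(float), pc1).ravel()
    fused = pcs @ vecs.T + mean                  # inverse PCA
    return fused.reshape(h, w, b)
```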

6.3.1.5 Wavelet Transform Fusion

Wavelet transform decomposes an image into low-frequency and high-frequency subimages corresponding to different structures of the image. The main steps for wavelet transform fusion are listed below:

(1)Perform the wavelet transform for both ft(x, y) and fs(x, y), and obtain low-frequency and high-frequency subimages for each image.

(2)Substitute the low-frequency subimages of fs(x, y) with those of ft(x, y).

(3)Combine the substituted low-frequency subimages of ft(x, y) with the high-frequency subimages of fs(x, y), and perform the inverse transform to obtain the fused image.

The fusion image thus obtained effectively keeps the low-frequency components of ft(x, y), which are rich in spectrum information, and includes the high-frequency components of fs(x, y), which are rich in detailed information, so the fused image has been improved both in visual appearance and statistics.

When using the wavelet-transform fusion method, the level of decomposition plays an important role and has a critical influence on the fusion result. This will be discussed more in the section below.
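The substitution of coefficients can be sketched with the PyWavelets package, applied to one band at a time; the wavelet type and the decomposition level are exactly the parameters whose influence is discussed above, and the trimming of possible boundary padding is an implementation detail.

```python
import numpy as np
import pywt

def wavelet_fusion(ft, fs, wavelet='db2', level=2):
    """Keep the low-frequency (approximation) coefficients of ft and the
    high-frequency (detail) coefficients of fs, then invert the transform."""
    ct = pywt.wavedec2(ft.astype(float), wavelet, level=level)
    cs = pywt.wavedec2(fs.astype(float), wavelet, level=level)
    fused = [ct[0]] + list(cs[1:])        # approximation from ft, details from fs
    g = pywt.waverec2(fused, wavelet)
    g = g[:ft.shape[0], :ft.shape[1]]     # trim possible boundary padding
    return np.clip(g, 0, 255).astype(np.uint8)
```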

6.3.2 Combination of Fusion Methods

The above-introduced fusion methods have their own limitations.

The weighted average fusion method is simple and fast, but it has poor resistance to interference and the quality of the fused image is questionable; one typical problem is the blurring caused by averaging.

The pyramid fusion method is also simple to implement and can provide a clear fused image. However, the different layers of the pyramid are correlated, which means the images in different layers contain redundancy. In addition, the reconstruction from the pyramid has some instability, especially when the images to be fused are quite different.

The HSI transform fusion method, when used to fuse a TM multispectrum image and a SPOT whole-spectrum image, can give the fused image high definition and enhance the spatial detail information in the image. However, if the I component of the TM multispectrum image is entirely replaced by the SPOT whole-spectrum image, much of the spectrum information is lost and the resulting fused image is distorted.

The PCA-based fusion method can give the fused image both high spatial definition and high spectrum definition, and the details of objects will be even richer. However, if the first principal component of the TM image is substituted by the SPOT image, some useful information in the first principal component of the TM image, which is related to the spectrum property, will be lost. In this case, the spectrum definition of the fused image will be affected.

The wavelet transform decomposes an image into high-frequency detail parts and low-frequency approximation parts. As these correspond to different structures in the image, the structure information and the detail information can easily be extracted. The wavelet-transform fusion method can effectively keep the spectrum information from a multispectrum image and the detailed information from a whole-spectrum image; therefore, the fused image would be better both in visual appearance and in statistics. However, the standard wavelet transform has two problems. One is that the standard wavelet transform is equivalent to filtering an image with high-pass and low-pass filters, and this filtering process causes the loss of some original information. The other is that the gray levels of TM and SPOT images are quite different, so the fusion may change the spectrum information of the TM image and introduce noise.

To overcome the problems of using only one type of fusion method, different fusion methods have been combined in practice. Some discussions with ft(x, y) and fs(x, y) are as follows.
