Principles and Practice of Image and Spatial Data Fusion*


Ed Waltz and Tim Waltz


5.1   Introduction

5.2   Motivations for Combining Image and Spatial Data

5.3   Defining Image and Spatial Data Fusion

5.4   Three Classic Levels of Combination for Multisensor Automatic Target Recognition Data Fusion

5.4.1   Pixel-Level Fusion

5.4.2   Feature-Level Fusion

        Discrete Model Matching Approach

        Adaptive Model Matching Approach

5.4.3   Decision-Level Fusion

5.4.4   Multiple-Level Fusion

5.5   Image Data Fusion for Enhancement of Imagery Data

5.5.1   Multiresolution Imagery

5.5.2   Dynamic Imagery

5.5.3   Three-Dimensional Imagery

5.6   Spatial Data Fusion Applications

5.6.1   Spatial Data Fusion: Combining Image and Nonimage Data to Create Spatial Information Systems

5.6.2   Mapping, Charting, and Geodesy Applications

        Representative Military Example

        Representative Crime Mapping Examples

5.7   Spatial Data Fusion in GEOINT

5.8   Summary




5.1   Introduction

The joint use of imagery and spatial data from different imaging, mapping, or other spatial sensors has the potential to provide significant performance improvements over single-sensor detection, classification, and situation assessment functions. The terms imagery fusion and spatial data fusion have been applied to describe a variety of combining operations for a wide range of image enhancement and understanding applications. Surveillance, robotic machine vision, and automatic target cueing (ATC) are among the application areas that have explored the potential benefits of multiple-sensor imagery. This chapter provides a framework for defining and describing the functions of image data fusion in the context of the Joint Directors of Laboratories (JDL) data fusion model. The chapter also describes representative methods and applications.

Sensor fusion and data fusion have become the de facto terms to describe the general abductive or deductive combination processes by which diverse sets of related data are joined or merged to produce a product that is greater than the individual parts. A range of mathematical operators have been applied to perform this process for a wide range of applications. Two areas that have received increasing research attention over the past decade are the processing of imagery (two-dimensional [2D] information) and spatial data (three-dimensional [3D] representations of real-world surfaces and objects that are imaged). These processes combine multiple data views into a composite set that incorporates the best attributes of all contributors. The most common product is a spatial (3D) model, or virtual world, which represents the best estimate of the real world as derived from all sensors.



5.2   Motivations for Combining Image and Spatial Data

A diverse range of applications have employed image data fusion to improve imaging and automatic detection or classification performance over that of single imaging sensors. Table 5.1 summarizes representative and recent research and development in six key application areas.

Satellite and airborne imagery used for military intelligence, photogrammetric, earth resources, and environmental assessments can be enhanced by combining registered data from different sensors to refine the spatial or spectral resolution of a composite image product. Registered imagery from different passes (multitemporal) and different sensors (multispectral and multiresolution) can be combined to produce composite imagery with spectral and spatial characteristics equal to, or better than, that of the individual contributors.

Composite SPOT and LANDSAT satellite imagery and 3D terrain relief composites of military regions demonstrate current military applications of such data for mission planning purposes.1, 2 and 3 The Joint National Intelligence Development Staff (JNIDS) pioneered the development of workstation-based systems to combine a variety of image and nonimage sources for intelligence analysts4 who perform:

  • Registration. Spatial alignment of overlapping images and maps to a common coordinate system

  • Mosaicking. Registration of nonoverlapping, adjacent image sections to create a composite of a larger area

  • 3D mensuration-estimation. Calibrated measurement of the spatial dimensions of objects within image data

TABLE 5.1
Representative Range of Activities Applying Spatial and Imagery Fusion


Similar image functions have been incorporated into a variety of image processing systems, from tactical image systems such as the premier Joint Service Image Processing System (JSIPS) to UNIX- and PC-based commercial image processing systems. Military services and the National Geospatial-Intelligence Agency (NGA) are performing cross-intelligence (i.e., IMINT and other intelligence sources) data fusion research to link signals and human reports to spatial data.5

When the fusion process extends beyond imagery to include other spatial data sets, such as digital terrain data, demographic data, and complete geographic information system (GIS) data layers, numerous mapping applications may benefit. Military intelligence preparation of the battlefield (IPB) functions (e.g., area delimitation and transportation network identification), as well as wide area terrain database generation (e.g., precision GIS mapping), are complex mapping problems that require fusion to automate processes that are largely manual. One area of ambitious research in spatial data fusion is the U.S. Army Topographic Engineering Center’s (TEC) efforts to develop automatic terrain feature generation techniques based on a wide range of source data, including imagery, map data, and remotely sensed terrain data.6 On the broadest scale, NGA’s Global Geospatial Information and Services (GGIS) vision includes spatial data fusion as a core functional element.7 NGA’s Mapping, Charting, and Geodesy Utility Software package (MUSE), for example, combines vector and raster data to display base maps with overlays of a variety of data to support geographic analysis and mission planning.

Real-time ATC and automatic target recognition (ATR) for military applications have turned to multiple-sensor solutions to expand spectral diversity and target feature dimensionality, seeking to achieve high probabilities of correct detection or identification at acceptable false alarm rates. Forward-looking infrared (FLIR), imaging millimeter wave (MMW), and laser detection and ranging (LADAR) sensors together form the most promising suite capable of providing the diversity needed for reliable discrimination in battlefield applications. In addition, some applications seek to combine the real-time imagery to present an enhanced image to the human operator for driving, control, and warning, as well as manual target recognition.

Industrial robotic applications for fusion include the use of 3D imaging and tactile sensors to provide sufficient image understanding to permit robotic manipulation of objects. These applications emphasize automatic understanding of the position of cooperative objects rather than recognition of the kind required in target recognition, which is, by nature, noncooperative.8

Transportation applications combine MMW and electrooptical imaging sensors to provide collision avoidance warning by sensing vehicles whose relative rates and locations pose a collision threat.

Medical applications fuse information from a variety of imaging sensors to provide a complete 3D model or enhanced 2D image of the human body for diagnostic purposes. The United Medical and Dental Schools of Guy’s and St. Thomas’ Hospital (London, U.K.) have demonstrated methods for registering and combining magnetic resonance (MR), positron emission tomography (PET), and computed tomography (CT) into composites to aid surgery.9



5.3   Defining Image and Spatial Data Fusion

In this chapter, image and spatial data fusion are distinguished as subsets of the more general data fusion problem that is typically aimed at associating and combining 3D data about sparse point objects located in space. Targets on a battlefield, aircraft in airspace, ships on the ocean surface, or submarines in the 3D ocean volume are common examples of targets represented as point objects in a 3D space model.

Image data fusion, however, is involved with associating and combining complete, spatially filled sets of data in 2D (images) or 3D (terrain or high-resolution spatial representations of real objects). Herein lies the distinction: image and spatial data fusion requires data representing every point on a surface or in space to be fused, rather than selected points of interest.

The more general problem is described in detail in the introductory texts by Waltz and Llinas10 and Hall,11 while the progress in image and spatial data fusion is reported over a wide range of the technical literature, as cited in this chapter.

The taxonomy in Figure 5.1 identifies the data properties and objectives that distinguish four categories of fusion applications.

In all the image and spatial applications cited in the text above, the common thread of the fusion function is its emphasis on the following distinguishing functions:

  • Registration involves spatial and temporal alignment of physical items within imagery or spatial data sets and is a prerequisite for further operations. It can occur at the raw image level (i.e., any pixel in one image may be referenced with known accuracy to a pixel or pixels in another image, or to a coordinate in a map) or at higher levels, relating objects rather than individual pixels. Of importance to every approach to combining spatial data is the accuracy with which the data layers have been spatially aligned relative to each other or to a common coordinate system (e.g., geolocation or geocoding of earth imagery to an earth projection). Registration can be performed by traditional internal image-to-image correlation techniques (when the images are from sensors with similar phenomena and are highly correlated)12 or by external techniques.13 External methods apply in-image control knowledge or as-sensed information that permits accurate modeling and estimation of the true location of each pixel in 2D or 3D space.


  • The combination function operates on multiple, registered layers of data to derive composite products using mathematical operators to perform integration; mosaicking; spatial or spectral refinement; spatial, spectral, or temporal (change) detection; or classification.

  • Reasoning is the process by which intelligent, often iterative search operations are performed between the layers of data to assess the meaning of the entire scene at the highest level of abstraction and of individual items, events, and data contained in the layers.

FIGURE 5.1
Data fusion application taxonomy.
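The internal image-to-image correlation technique named under registration can be illustrated with a minimal sketch. It assumes two small, equally sized, single-band images that differ only by an integer translation, and it searches a small window of candidate shifts for the one with the highest normalized cross-correlation. The function names `ncc` and `estimate_shift` and the `max_shift` parameter are illustrative choices, not taken from the cited literature; operational systems would add subpixel refinement and more general warps.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length sample lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    # A flat overlap region carries no alignment information.
    return num / den if den > 0 else 0.0


def estimate_shift(reference, moving, max_shift=2):
    """Find the integer (dy, dx) for which moving[y + dy][x + dx]
    best matches reference[y][x]; returns ((dy, dx), score)."""
    rows, cols = len(reference), len(reference[0])
    best_score, best_shift = -2.0, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            ref_vals, mov_vals = [], []
            # Collect only the pixels where the shifted images overlap.
            for y in range(rows):
                for x in range(cols):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < rows and 0 <= xx < cols:
                        ref_vals.append(reference[y][x])
                        mov_vals.append(moving[yy][xx])
            score = ncc(ref_vals, mov_vals)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift, best_score
```

As the text notes, this kind of internal correlation is appropriate only when the two images come from sensors with similar phenomena; dissimilar sensors generally call for the external techniques.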

The image and spatial data fusion functions can be placed in the JDL data fusion model context to describe the architecture of a system that employs imagery data from multiple sensors and spatial data (e.g., maps and solid models) to perform detection, classification, and assessment of the meaning of information contained in the scenery of interest.

Figure 5.2 compares the JDL general model14 with a specific multisensor ATR image data fusion functional flow to show how the more abstract model can be related to a specific imagery fusion application. The level 1 processing steps can be directly related to image counterparts as follows:


    FIGURE 5.2
    An image data fusion functional flow can be directly compared to the Joint Directors of Laboratories data fusion subpanel model of data fusion.

  • Alignment. The alignment of data into a common time, space, and spectral reference frame involves spatial transformations to warp image data to a common coordinate system (e.g., projection to an earth reference model or 3D space). At this point, nonimaging data that can be spatially referenced (perhaps not to a point, but often to a region with a specified uncertainty) can then be associated with the image data.

  • Association. New data can be correlated with the previous data to detect and segment (select) targets on the basis of motion (temporal change) or behavior (spatial change). In time-sequenced data sets, target objects at time t are associated with target objects at time t – 1 to discriminate newly appearing targets, moved targets, and disappearing targets.

  • Tracking. When objects are tracked in dynamic imagery, the dynamics of target motion are modeled and used to predict the future location of targets (at time t + 1) for comparison with new sensor observations.

  • Identification. The data for segmented targets are combined from multiple sensors (at any one of several levels) to provide an assignment of the target to one or more of several target classes.
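The association and tracking steps above can be sketched under simple assumptions: each track carries a 2D position and a constant velocity, the predicted position for the new frame is compared with each new detection, and the nearest detection inside a distance gate is assigned to the track. The data layout, the `gate` parameter, and the greedy nearest-neighbor rule are assumptions made for illustration; the chapter does not prescribe a particular association algorithm.

```python
import math


def predict(track):
    """Constant-velocity prediction of a track's position one frame ahead.
    A track is (x, y, vx, vy) in pixel units per frame."""
    x, y, vx, vy = track
    return (x + vx, y + vy)


def associate(tracks, detections, gate=2.0):
    """Greedy nearest-neighbor association of predicted tracks with
    new detections.

    tracks:     {track_id: (x, y, vx, vy)} from the previous frame
    detections: {detection_id: (x, y)} from the current frame
    Returns (matches, new_detections, lost_tracks).
    """
    matches, used = {}, set()
    for track_id, track in tracks.items():
        px, py = predict(track)
        best_id, best_dist = None, gate
        for det_id, (dx, dy) in detections.items():
            if det_id in used:
                continue
            dist = math.hypot(dx - px, dy - py)
            if dist <= best_dist:
                best_id, best_dist = det_id, dist
        if best_id is not None:
            matches[track_id] = best_id
            used.add(best_id)
    # Unmatched detections are candidate newly appearing targets;
    # unmatched tracks are candidate disappearing targets.
    new_detections = [d for d in detections if d not in used]
    lost_tracks = [t for t in tracks if t not in matches]
    return matches, new_detections, lost_tracks
```

A greedy assignment is order dependent; a global assignment method would be a natural substitute where targets are closely spaced.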

Level 2 and 3 processing deals with the aggregate of targets in the scene and other characteristics of the scene to derive an assessment of the meaning of data in the scene or spatial data set.

In the following sections, the primary image and spatial data fusion application areas are described to demonstrate the basic principles of fusion and the state of the practice in each area (Figure 5.3).


FIGURE 5.3
Three basic levels of fusion are provided to the multisensor automatic target recognition designer as the most logical alternative points in the data chain for combining data.



5.4   Three Classic Levels of Combination for Multisensor Automatic Target Recognition Data Fusion

Since the late 1970s, the ATR literature has adopted three levels of image data fusion as the basic design alternatives offered to the system designer. The terminology was adopted to describe the point in the traditional ATR processing chain at which registration and combination of different sensor data occurred. These functions can occur at multiple levels, as described later in this chapter. First, a brief overview of the basic alternatives and representative research and development results is presented. (Broad overviews of the developments in ATR in general, with specific comments on data fusion, are available in other literature.15, 16 and 17)

5.4.1   Pixel-Level Fusion

At the lowest level, pixel-level fusion uses the registered pixel data from all image sets to perform detection and discrimination functions. This level has the potential to achieve the greatest signal detection performance (if registration errors can be contained) at the highest computational expense. At this level, detection decisions (pertaining to the presence or absence of a target object) are based on the information from all sensors by evaluating the spatial and spectral data from all layers of the registered image data. A subset of this level of fusion is segment-level fusion, in which basic detection decisions are made independently in each sensor domain, but the segmentation of image regions is performed by evaluation of the registered data layers.

Fusion at the pixel level involves accurate registration of the different sensor images before applying a combination operator to each set of registered pixels (which correspond to associated measurements in each sensor domain at the highest spatial resolution of the sensors). Spatial registration accuracies should be subpixel to avoid combination of unrelated data, making this approach the most sensitive to registration errors. Because image data may not be sampled at the same spacing, resampling and warping of images are generally required to achieve the necessary level of registration before combining pixel data.
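The resample-then-combine sequence described above can be sketched as follows. The sketch assumes that two images already cover the same footprint and differ only in sample spacing, so that bilinear resampling of the coarser layer onto the finer grid (with grid corners aligned) is sufficient registration; real sensor pairs would also require a geometric warp. The function names and the choice of a per-pixel operator passed in by the caller are illustrative assumptions.

```python
def resample_bilinear(image, out_rows, out_cols):
    """Bilinear resampling onto an out_rows x out_cols grid whose corner
    samples coincide with the corner samples of the input.
    Assumes input and output grids each have at least 2 rows and 2 columns."""
    in_rows, in_cols = len(image), len(image[0])
    out = []
    for r in range(out_rows):
        sy = r * (in_rows - 1) / (out_rows - 1)   # source row coordinate
        y0 = min(int(sy), in_rows - 2)
        fy = sy - y0
        row = []
        for c in range(out_cols):
            sx = c * (in_cols - 1) / (out_cols - 1)  # source column coordinate
            x0 = min(int(sx), in_cols - 2)
            fx = sx - x0
            top = image[y0][x0] * (1 - fx) + image[y0][x0 + 1] * fx
            bottom = image[y0 + 1][x0] * (1 - fx) + image[y0 + 1][x0 + 1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def fuse_pixels(layer_a, layer_b, operator):
    """Apply a combination operator to each pair of registered pixels."""
    return [[operator(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(layer_a, layer_b)]


# Example: bring a 2 x 2 layer onto a 3 x 3 grid, then average the layers.
coarse = [[0, 10], [20, 30]]
fine = [[1, 1, 1], [1, 50, 1], [1, 1, 1]]
coarse_on_fine_grid = resample_bilinear(coarse, 3, 3)
fused = fuse_pixels(coarse_on_fine_grid, fine, lambda a, b: (a + b) / 2)
```

Any residual misregistration larger than a pixel would cause the operator to combine unrelated measurements, which is the sensitivity the text describes.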

In the most direct 2D image applications of this approach, coregistered pixel data may be classified on a pixel-by-pixel basis using approaches that have long been applied to multispectral data classification.18 Typical ATR applications, however, pose a more complex problem when dissimilar sensors, such as FLIR and LADAR, image in different planes. In such cases, the sensor data must be projected into a common 2D or 3D space for combination. Gonzalez and Williams, for example, have described a process for using 3D LADAR data to infer FLIR pixel locations in three dimensions to estimate target pose before feature extraction (FE).19 Schwickerath and Beveridge present a thorough analysis of this problem, developing an eight-degree-of-freedom model to estimate both the target pose and relative sensor registration (coregistration) based on a 2D and 3D sensor.20

Delanoy et al. demonstrated pixel-level combination of “spatial interest images” using Boolean and fuzzy logic operators.21 This process applies a spatial feature extractor to develop multiple interest images (representing the relative presence of spatial features in each pixel), before combining the interest images into a single detection image. Similarly, Hamilton and Kipp describe a probe-based technique that uses spatial templates to transform the direct image into probed images that enhance target features for comparison with reference templates.22,23 Using a limited set of television and FLIR imagery, Duane compared pixel-level and feature-level fusion to quantify the relative improvement attributable to the pixel-level approach with well-registered imagery sets.24
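To make the idea of combining interest images concrete, the sketch below applies per-pixel fuzzy and Boolean operators to registered interest images whose values lie in [0, 1]. It uses the common minimum and maximum definitions of fuzzy AND and OR and a fixed threshold for the Boolean case; these are assumptions chosen for illustration and are not claimed to reproduce the operators used by Delanoy et al.

```python
def combine_interest(images, operator):
    """Apply `operator` to the list of values found at each pixel
    across a set of registered, equally sized interest images."""
    rows, cols = len(images[0]), len(images[0][0])
    return [[operator([img[r][c] for img in images]) for c in range(cols)]
            for r in range(rows)]


def fuzzy_and(images):
    """Fuzzy AND: a pixel is only as interesting as its weakest layer."""
    return combine_interest(images, min)


def fuzzy_or(images):
    """Fuzzy OR: a pixel is as interesting as its strongest layer."""
    return combine_interest(images, max)


def boolean_and(images, threshold=0.5):
    """Boolean AND of the layers after thresholding each interest value."""
    return combine_interest(
        images, lambda values: all(v >= threshold for v in values))


# Two hypothetical interest images (e.g., edge and corner interest).
edges = [[0.9, 0.2], [0.6, 0.1]]
corners = [[0.8, 0.7], [0.4, 0.0]]
detection_image = fuzzy_and([edges, corners])
```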

5.4.2   Feature-Level Fusion

At the intermediate level, feature-level fusion combines the features of objects that are detected and segmented in the individual sensor domains. This level presumes independent detectability of objects in all the sensor domains. The features for each object are independently extracted in each domain; these features create a common feature space for object classification.

Such feature-level fusion reduces the demand on registration, allowing each sensor channel to segment the target region and extract features without regard to the other sensor’s choice of target boundary. The features are merged into a common decision space only after a spatial association is made to determine that the features were extracted from objects whose centroids were spatially associated.
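The centroid association step that precedes merging can be sketched as follows. Each sensor channel is assumed to have produced a list of segmented objects, each with a centroid in a common coordinate frame and its own feature list; objects whose centroids fall within a distance tolerance are paired, and their features are concatenated into one vector for a downstream classifier. The dictionary layout, the `max_distance` tolerance, and the greedy pairing are hypothetical choices made for this sketch.

```python
import math


def fuse_features(objects_a, objects_b, max_distance):
    """Concatenate the features of objects from two sensors whose
    centroids are spatially associated.

    Each object is {"centroid": (x, y), "features": [...]}.
    Returns a list of joint feature vectors, one per associated pair.
    """
    fused, used_b = [], set()
    for obj_a in objects_a:
        ax, ay = obj_a["centroid"]
        best_j, best_dist = None, max_distance
        for j, obj_b in enumerate(objects_b):
            if j in used_b:
                continue
            bx, by = obj_b["centroid"]
            dist = math.hypot(ax - bx, ay - by)
            if dist <= best_dist:
                best_j, best_dist = j, dist
        if best_j is not None:
            used_b.add(best_j)
            # The joint vector spans the common feature space.
            fused.append(obj_a["features"] + objects_b[best_j]["features"])
    return fused
```

Because only centroids must agree, the tolerance can be several pixels wide, which is why this level places a lighter demand on registration than pixel-level fusion.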

During the early 1990s, the Army evaluated a wide range of feature-level fusion algorithms for combining FLIR, MMW, and LADAR data for detecting battlefield targets under the multisensor feature-level fusion (MSFLF) program of the OSD multisensor-aided targeting initiative. Early results demonstrated marginal gains over single-sensor performance and reinforced the importance of careful selection of complementary features to specifically reduce single-sensor ambiguities.25

At the feature level of fusion, researchers have developed model-based (or model-driven) alternatives to the traditional statistical methods, which are inherently data driven. Model-based approaches maintain target and sensing models that predict all possible views (and target configurations) for comparison with extracted features rather than using a more limited set of real signature data for comparison.26 The application of model-based approaches to multiple-sensor ATR offers several alternative implementations, two of which are described in Figure 5.4. The adaptive model matching approach performs FE and comparison (match) with predicted features for the estimated target pose. The process iteratively searches to find the best model match for the extracted features.


FIGURE 5.4
Two model-based sensor alternatives demonstrate the use of a prestored hierarchy of model-based templates or an online iterative model that predicts features based on estimated target pose.

Discrete Model Matching Approach

A multisensor model-based matching approach described by Hamilton and Kipp27 develops a relational tree structure (hierarchy) of 2D silhouette templates. These templates capture the spatial structure of the most basic all-aspect target “blob” (at the top or root node), down to individual target hypotheses at specific poses and configurations. This predefined search tree is developed on the basis of model data for each sensor, and the ATR process compares segmented data to the tree, computing a composite score at each node to determine the path to the most likely hypotheses. At each node, the evidence is accumulated by applying an operator (e.g., weighted sum, Bayesian combination, etc.) to combine the score for each sensor domain.

Adaptive Model Matching Approach

Rather than using prestored templates, this approach implements the sensor or the target modeling capability within the ATR algorithm to dynamically predict features for direct comparison. Figure 5.4 illustrates a two-sensor extension of the one-sensor, model-based ATR paradigm (e.g., automatic radar air-to-ground target acquisition program [ARAGTAP]28 or moving and stationary target acquisition and recognition [MSTAR]29 approaches) in which independent sensor features are predicted and compared iteratively, and evidence from the sensors is accumulated to derive a composite score for each target hypothesis.
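The predict-and-compare loop of the adaptive approach can be sketched in a highly simplified form. The sketch assumes that each target hypothesis supplies one feature predictor per sensor, that pose is a single angle drawn from a coarse grid, and that evidence is accumulated by summing a per-sensor similarity score. An exhaustive search over the grid stands in here for the iterative search described in the text, and all names (`match_score`, `adaptive_match`, the toy predictors) are hypothetical.

```python
import math


def match_score(predicted, extracted):
    """Similarity in (0, 1]; equals 1 when predicted and extracted
    feature vectors coincide."""
    dist = math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, extracted)))
    return 1.0 / (1.0 + dist)


def adaptive_match(models, extracted_by_sensor, poses):
    """Search target hypotheses and poses for the best composite score.

    models:              {target: {sensor: predictor(pose) -> feature list}}
    extracted_by_sensor: {sensor: feature list extracted from that sensor}
    Returns (target, pose, composite_score) for the best hypothesis.
    """
    best = (None, None, -1.0)
    for target, predictors in models.items():
        for pose in poses:
            # Accumulate evidence across sensors for this hypothesis.
            score = sum(match_score(predictors[sensor](pose), features)
                        for sensor, features in extracted_by_sensor.items())
            if score > best[2]:
                best = (target, pose, score)
    return best


# Toy predictors standing in for real sensor and target models.
models = {
    "tank": {"ir": lambda pose: [pose / 10, 2.0],
             "mmw": lambda pose: [pose / 5]},
    "truck": {"ir": lambda pose: [pose / 10, 5.0],
              "mmw": lambda pose: [pose / 5 + 3]},
}
extracted = {"ir": [3.0, 2.0], "mmw": [6.0]}
best_hypothesis = adaptive_match(models, extracted, poses=range(0, 90, 10))
```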

TABLE 5.2
Most Common Decision-Level Combination Alternatives


Larson et al. describe a model-based IR/LADAR fusion algorithm that performs extensive pixel-level registration and FE before performing the model-based classification at the extracted feature level.30 Similarly, Corbett et al. describe a model-based feature-level classifier that uses IR and MMW models to predict features for military vehicles.31 Both of these follow the adaptive generation approach.

5.4.3   Decision-Level Fusion

Fusion at the decision level (also called postdecision or postdetection fusion) combines the decisions of independent sensor detection or classification paths by Boolean (AND, OR) operators or by a heuristic score (e.g., M-of-N, maximum vote, or weighted sum). Two methods of making classification decisions exist: hard decisions (single, optimum choice) and soft decisions, in which decision uncertainty in each sensor chain is maintained and combined into a composite measure of uncertainty (Table 5.2).
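The combination rules named above can be written in a few lines. The sketch below treats hard decisions as Booleans and soft decisions as confidences in [0, 1]; the weighted-sum rule normalizes by the total weight and compares the result with a threshold. The function names, the normalization, and the default threshold are assumptions made for illustration.

```python
def fuse_and(decisions):
    """Declare a target only if every sensor path declares one."""
    return all(decisions)


def fuse_or(decisions):
    """Declare a target if any sensor path declares one."""
    return any(decisions)


def fuse_m_of_n(decisions, m):
    """Declare a target if at least m of the n sensor paths declare one."""
    return sum(1 for d in decisions if d) >= m


def fuse_weighted(confidences, weights, threshold=0.5):
    """Soft-decision fusion: weighted mean of per-sensor confidences.
    Returns (decision, composite_confidence)."""
    score = sum(c * w for c, w in zip(confidences, weights)) / sum(weights)
    return score >= threshold, score
```

For three sensor paths reporting `[True, False, True]`, the AND rule rejects the candidate, while the OR rule and a 2-of-3 rule accept it; the choice among rules trades detection probability against false alarm rate.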

The relative performance of alternative combination rules and independent sensor thresholds can be optimally selected using distribution data for the features used by each sensor.32 In decision-level fusion, each path must independently detect the presence of a candidate target and perform a classification on the candidate. These detections and classifications (the sensor decisions) are combined into a fused decision. This approach inherently assumes that the signals and signatures in each independent sensor chain are sufficient to perform independent detection before the sensor decisions are combined. This approach is much less sensitive to spatial misregistration than all others and permits accurate association of detected targets to occur with registration errors over an order of magnitude larger than for pixel-level fusion. Lee and Vleet have shown procedures for estimating the registration error between sensors to minimize the mean square registration error and optimize the association of objects in dissimilar images for decision-level fusion.33

Decision-level fusion of MMW and IR sensors has long been considered a prime candidate for achieving the level of detection performance required for autonomous precision-guided munitions.34 Results of an independent two-sensor (MMW and IR) analysis on military targets demonstrated the relative improvement of two-sensor decision-level fusion over either independent sensor.35, 36 and 37 A summary of ATR comparison methods was compiled by Diehl et al.38 These studies demonstrated the critical sensitivity of performance gains to the relative performance of each contributing sensor and the independence of the sensed phenomena.

5.4.4   Multiple-Level Fusion

In addition to the three classic levels of fusion, other alternatives or combinations have been advanced. At a level even higher than the decision level, some researchers have defined scene-level methods in which target detections from a low-resolution sensor are used to cue a search-and-confirm action by a higher-resolution sensor. Menon and Kolodzy described such a system that uses FLIR detections to cue the analysis of high-spatial resolution laser radar data using a nearest-neighbor neural network classifier.39 Maren describes a scene structure method that combines information from hierarchical structures developed independently by each sensor by decomposing the scene into element representations.40 Others have developed hybrid, multilevel techniques that partition the detection problem to a high level (e.g., decision level) and the classification to a lower level. Aboutalib et al. described a hybrid algorithm that performs decision-level combination for detection (with detection threshold feedback) and feature-level classification for air target identification in IR and TV imagery.41

Other researchers have proposed multilevel ATR architectures, which perform fusion at all levels, carrying out an appropriate degree of combination at each level based on the ability of the combined information to contribute to an overall fusion objective. Chu and Aggarwal describe such a system that integrates pixel-level to scene-level algorithms.42 Eggleston has long promoted such a knowledge-based ATR approach that combines data at three levels, using many partially redundant combination stages to reduce the errors of any single unreliable rule.43,44 The three levels in this approach are:

  1. Low level. Pixel-level combinations are performed when image enhancement can aid higher-level combinations. The higher levels adaptively control this fine grain combination.

  2. Intermediate symbolic level. Symbolic representations (tokens) of attributes or features for segmented regions (image events) are combined using a symbolic level of description.

  3. High level. The scene or context level of information is evaluated to determine the meaning of the overall scene, by considering all intermediate-level representations to derive a situation assessment. For example, this level may determine that a scene contains a brigade-sized military unit forming for attack. The derived situation can be used to adapt lower levels of processing to refine the high-level hypotheses.

Bowman and DeYoung described an architecture that uses neural networks at all levels of the conventional ATR processing chain, reporting probabilities of correct identification of up to 0.99 for battlefield targets using pixel-level neural network fusion of UV, visible, and MMW imagery.45

Pixel-, feature-, and decision-level fusion designs have focused on combining imagery for the purposes of detecting and classifying specific targets. The emphasis is on limiting processing by combining only the most likely regions of target data content at the minimum necessary level to achieve the desired detection or classification performance. This differs significantly from the next category of image fusion designs, in which all data must be combined to form a new spatial data product that contains the best composite properties of all contributing sources of information.



5.5   Image Data Fusion for Enhancement of Imagery Data

Both still and moving image data can be combined from multiple sources to enhance desired features, combine multiresolution or differing sensor look geometries, mosaic multiple views, and reduce uncorrelated noise.

5.5.1   Multiresolution Imagery

One area of enhancement has been in the application of “band sharpening” or “multiresolution image fusion” algorithms to combine differing resolution satellite imagery. The result is a composite product that enhances the spatial boundaries in lower-resolution multispectral data using higher-resolution panchromatic or synthetic aperture radar (SAR) data.

Veridian-ERIM International has applied its Sparkle algorithm to the band sharpening problem, demonstrating the enhancement of lower-resolution SPOT multispectral imagery (20 m ground sample distance [GSD]) with higher-resolution airborne SAR (3 m GSD) and panchromatic photography (1 m) to sharpen the multispectral data. Radar backscatter features are overlaid on the composite to reveal important characteristics of the ground features and materials. The composite image preserves the spatial resolution of the panchromatic data, the spectral content of the multispectral layers, and the radar reflectivity of the SAR.

Vrabel has reported the relative performance of a variety of band sharpening algorithms, concluding that Veridian-ERIM International’s Sparkle algorithm and a color normalization (CN) technique provided the greatest GSD enhancement and overall utility.46 Additional comparisons and applications of band sharpening techniques have been published in the literature.47, 48, 49 and 50
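The general principle behind ratio-based band sharpening can be shown with a small sketch. It assumes that the multispectral bands have already been registered and resampled to the panchromatic grid, takes the band mean at each pixel as an intensity estimate, and rescales every band by the ratio of the panchromatic value to that intensity, so that band ratios (spectral content) are preserved while spatial detail follows the panchromatic layer. This is a generic ratio normalization in the style of the Brovey transform; it is not the Sparkle algorithm, and it is not claimed to match the CN technique evaluated in the cited comparison.

```python
def ratio_sharpen(bands, pan, eps=1e-9):
    """Sharpen multispectral bands with a registered panchromatic layer.

    bands: list of 2D lists, one per spectral band, on the pan grid
    pan:   2D list of panchromatic values
    eps:   small constant that avoids division by zero in dark pixels
    """
    rows, cols = len(pan), len(pan[0])
    sharpened = [[[0.0] * cols for _ in range(rows)] for _ in bands]
    for r in range(rows):
        for c in range(cols):
            # Band mean serves as the low-resolution intensity estimate.
            intensity = sum(band[r][c] for band in bands) / len(bands)
            gain = pan[r][c] / (intensity + eps)
            for k, band in enumerate(bands):
                sharpened[k][r][c] = band[r][c] * gain
    return sharpened
```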

Imagery can also be mosaicked by combining overlapping images into a common block, using classical photogrammetric techniques (bundle adjustment) that use absolute ground control points and tie points (common points in overlapped regions) to derive mapping polynomials. The data may then be forward resampled from the input images to the output projection or backward resampled by projecting the location of each output pixel onto each source image to extract pixels for resampling.51 The latter approach permits spatial deconvolution functions to be applied in the resampling process. Radiometric feathering of the data in transition regions may also be necessary to provide a gradual transition after overall balancing of the radiometric dynamic range of the mosaicked image is performed.52 Such mosaicking fusion processes have also been applied to 3D data to create composite digital elevation models (DEMs) of terrain.53
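Radiometric feathering in a transition region can be illustrated in one dimension. The sketch assumes two registered image rows that share a known number of overlapping samples and blends them with a linear weight ramp across the overlap; the function name and the particular ramp are illustrative assumptions, and a full mosaic would apply the same idea along 2D seams after radiometric balancing.

```python
def feather_rows(left, right, overlap):
    """Blend two registered image rows that share `overlap` samples.

    The last `overlap` samples of `left` cover the same ground as the
    first `overlap` samples of `right`. Inside the overlap, the weight
    of `right` ramps linearly from near 0 to near 1.
    """
    if not 1 <= overlap <= min(len(left), len(right)):
        raise ValueError("overlap must be between 1 and the row length")
    start = len(left) - overlap
    out = list(left[:start])
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight given to the right image
        out.append((1 - w) * left[start + i] + w * right[i])
    out.extend(right[overlap:])
    return out
```

Blending a row of constant value 10 with a row of constant value 20 over a three-sample overlap yields a gradual step through 12.5, 15.0, and 17.5 rather than an abrupt seam.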

5.5.2   Dynamic Imagery

In some applications, the goal is to combine different types of real-time video imagery to provide the clearest possible composite video image for a human operator. The David Sarnoff Research Center has applied wavelet encoding methods to selectively combine IR and visible video data into a composite video image that preserves the most desired characteristics (e.g., edges, lines, and boundaries) from each data set.54 The Center later extended the technique to combine multitemporal and moving images into composite mosaic scenes that preserve the “best” data to create a current scene at the best possible resolution at any point in the scene.55,56

5.5.3   Three-Dimensional Imagery

3D perspectives of the earth’s surface are a special class of image data fusion products that have been developed by draping orthorectified images of the earth’s surface over digital terrain models. The 3D model can be viewed from arbitrary static perspectives or in a dynamic fly-through, either of which provides a visualization of the area for mission planners, pilots, or land planners.

Off-nadir regions of aerial or spaceborne imagery include a horizontal displacement error that is a function of the elevation of the terrain. A DEM is used to correct for these displacements to accurately overlay each image pixel on the corresponding post (i.e., terrain grid coordinate). Photogrammetric orthorectification functions57 include the following steps to combine the data:

  • DEM preparation. The DEM is transformed to the desired map projection for the final composite product.

  • Transform derivation. Platform, sensor, and DEM data are used to derive mapping polynomials that will remove the horizontal displacements caused by terrain relief, placing each input image pixel at the proper location on the DEM grid.

  • Resampling. The input imagery is resampled into the desired output map grid.

  • Output file creation. The resampled image data (x, y, and pixel values) and DEM (x, y, and z) are merged into a file with other georeferenced data, if available.

  • Output product creation. 2D image maps may be created with map grid lines, or 3D visualization perspectives can be created for viewing the terrain data from arbitrary viewing angles.
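The terrain-induced horizontal displacement that the steps above remove can be illustrated with the classical relief displacement relation for a vertical frame photograph, d = r h / H, where r is the radial distance of an image point from the nadir point, h is the terrain height above the datum, and H is the flying height above the same datum. The sketch below is limited to that simplified geometry, which is an assumption; the mapping polynomials described in the text handle general platform and sensor models.

```python
def relief_displacement(radial_distance, terrain_height, flying_height):
    """Radial displacement d = r * h / H of an image point caused by
    terrain relief in a vertical photograph (d in the units of r)."""
    return radial_distance * terrain_height / flying_height


def correct_point(x, y, terrain_height, flying_height):
    """Move an image point radially toward the nadir point to remove
    relief displacement. (x, y) are image coordinates measured from
    the nadir point; the corrected radius is r * (1 - h / H)."""
    scale = 1.0 - terrain_height / flying_height
    return (x * scale, y * scale)
```

For example, a point imaged 100 mm from the nadir point over terrain 200 m above the datum, photographed from 2000 m above the datum, is displaced outward by 10 mm; in a DEM-driven workflow the height h would be read from the post corresponding to each pixel.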

The basic functions necessary to perform registration and combination are provided in an increasing number of commercial image processing software packages (Table 5.3), permitting users to fuse static image data for a variety of applications.

Table 5.3   Basic Image Data Fusion Functions Provided in Several Commercial Image Processing Software Packages




5.6   Spatial Data Fusion Applications

Robotic and transportation systems present a wide range of spatial data fusion applications similar to those in the military domain. Robotics applications include relatively short-range, high-resolution imaging of cooperative target objects (e.g., an assembly component to be picked up and accurately placed) with the primary objectives of position determination and inspection. Transportation applications include longer-range sensing of vehicles for highway control and multiple-sensor situation awareness within a vehicle to provide semiautonomous navigation, collision avoidance, and control.

The results of research in these areas are chronicled in a variety of sources, beginning with the 1987 Workshop on Spatial Reasoning and MultiSensor Fusion,58 and many subsequent SPIE conferences.59–63

5.6.1   Spatial Data Fusion: Combining Image and Nonimage Data to Create Spatial Information Systems

One of the most sophisticated image fusion applications combines diverse sets of imagery (2D), spatially referenced nonimage data sets, and 3D spatial data sets into a composite spatial data information system. The most active area of research and development in this category of fusion problems is the development of GISs that combine earth imagery, maps, and demographic, infrastructure, or facilities mapping (geospatial) data into a common spatially referenced database.

Applications for such capabilities exist in three areas. In civil government, the need for land and resource management has prompted intense interest in establishing GISs at all levels of government. The U.S. Federal Geographic Data Committee is tasked with the development of a National Spatial Data Infrastructure (NSDI), which establishes standards for organizing the vast amount of geospatial data currently available at the national level and coordinating the integration of future data.64

Commercial applications for geospatial data include land management, resources exploration, civil engineering, transportation network management, and automated mapping or facilities management for utilities.

The military application of such spatial databases is the IPB,65 which consists of developing a spatial database containing all terrain, transportation, groundcover, man-made structures, and other features available for use in real-time situation assessment for command and control. The Defense Advanced Research Projects Agency (DARPA) Terrain Feature Generator is one example of a major spatial database and fusion function defined to automate the functions of IPB and geospatial database creation from diverse sensor sources and maps.66

Realization of efficient, affordable systems capable of accommodating the volume of spatial data required for large regions and performing reasoning that produces accurate and insightful information depends on two critical technology areas:

  1. Spatial data structure. Efficient, linked data structures are required to handle the wide variety of vector, raster, and nonspatial data sources. Hundreds of point, lineal, and areal features must be accommodated. Data volumes are measured in terabytes and short access times are demanded for even broad searches.

  2. Spatial reasoning. The ability to reason in the context of dynamically changing spatial data is required to assess the meaning of the data. The reasoning process must perform the following kinds of operations to make assessments about the data:

    1. Spatial measurements (e.g., geometric, topological, proximity, and statistics)

    2. Spatial modeling

    3. Spatial combination and inference operations under uncertainty

    4. Spatial aggregation of related entities

    5. Multivariate spatial queries
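As a rough illustration of two of the operations listed above, a proximity measurement and a multivariate spatial query, the following sketch filters point features by type, by distance from a reference feature, and by minimum attribute values. The `Feature` record, the `query` function, and the attribute name `load_tons` are hypothetical, and the linear scan stands in for the indexed data structures (e.g., quadtrees or R-trees) that terabyte-scale databases would need.

```python
import math
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    kind: str     # feature class, e.g., "road", "bridge", "building"
    x: float      # easting in map units
    y: float      # northing in map units
    attrs: dict   # nonspatial attributes linked to the feature


def distance(a, b):
    """Planar (Euclidean) distance between two point features."""
    return math.hypot(a.x - b.x, a.y - b.y)


def query(features, kind=None, near=None, max_dist=None, **attr_min):
    """Return features of the given kind, within max_dist of `near`,
    whose numeric attributes meet every minimum given in attr_min."""
    hits = []
    for f in features:
        if kind is not None and f.kind != kind:
            continue
        if near is not None and max_dist is not None:
            if distance(f, near) > max_dist:
                continue
        # A missing attribute is treated as failing the minimum.
        if any(f.attrs.get(k, float("-inf")) < v for k, v in attr_min.items()):
            continue
        hits.append(f)
    return hits


site = Feature("site", "building", 0.0, 0.0, {})
features = [
    Feature("b1", "bridge", 3.0, 4.0, {"load_tons": 60}),
    Feature("b2", "bridge", 30.0, 40.0, {"load_tons": 80}),
    Feature("b3", "bridge", 1.0, 1.0, {"load_tons": 20}),
    Feature("r1", "road", 1.0, 0.0, {}),
]
hits = query(features, kind="bridge", near=site, max_dist=5.0, load_tons=40)
print([f.name for f in hits])  # ['b1']
```

Only bridge b1 satisfies all three conditions: b2 is too far from the site and b3 cannot carry the required load.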

Antony surveyed the alternatives for representing spatial and spatially referenced semantic knowledge67 and published the first comprehensive data fusion text68 that specifically focused on spatial reasoning for combining spatial data.

5.6.2   Mapping, Charting, and Geodesy Applications

The use of remotely sensed image data to create image maps and generate GIS base maps has long been recognized as a means of automating map generation and updating to achieve currency as well as accuracy.69–71 The following features characterize integrated geospatial systems:

  • Currency. Remote sensing inputs enable continuous update with change detection and monitoring of the information in the database.

  • Integration. Spatial data in a variety of formats (e.g., raster and vector data) is integrated with metadata and other spatially referenced data, such as text, numerical, tabular, and hypertext formats. Multiresolution and multiscale spatial data coexist, are linked, and share a common reference (i.e., map projection).

  • Access. The database permits spatial query access for multiple user disciplines. All data is traceable, and the data accuracy, uncertainty, and entry time are annotated.

  • Display. Spatial visualization and query tools provide maximum human insight into the data content using display overlays and 3D capability.

Ambitious examples of such geospatial systems include the DARPA Terrain Feature Generator, the European ESPRIT II MultiSource Image Processing System (MuSIP),72,73 and NASA’s Earth Observing Systems Data and Information System (EOSDIS).74

Figure 5.5 illustrates the most basic functional flow of such a system, partitioning the data integration (i.e., database generation) function from the scene assessment function. The integration function spatially registers and links all data to a common spatial reference and also combines some data sets by mosaicking, creating composite layers, and extracting features to create feature layers. During the integration step, higher-level spatial reasoning is required to resolve conflicting data and to create derivative layers from extracted features. The output of this step is a registered, refined, and traceable spatial database.

The next step is scene assessment, which can be performed for a variety of application functions (e.g., further FE, target detection, quantitative assessment, or creation of vector layers) by a variety of user disciplines. This stage extracts information in the context of the scene, and is generally query driven.

Table 5.4 summarizes the major kinds of registration, combination, and reasoning functions that are performed, illustrating the increasing levels of complexity in each level of spatial processing. Faust described the general principles for building such a geospatial database, the hierarchy of functions, and the concept for a blackboard architecture expert system to implement the functions described earlier.75


Figure 5.5   The spatial data fusion process flow includes the generation of a spatial database and the assessment of spatial information in the database by multiple users.

Table 5.4   Spatial Data Fusion Functions



Figure 5.6   Target search example uses multiple layers of spatial data and applies iterative spatial reasoning to evaluate alternative hypotheses while accumulating evidence for each candidate target.

Representative Military Example

The spatial reasoning process can be illustrated by a hypothetical military example that follows the process an image or intelligence analyst might follow in search of critical mobile targets (CMTs). Consider the layers of a spatial database illustrated in Figure 5.6, in which recent unmanned air vehicle (UAV) SAR data (the top data layer) has been registered to all other layers, and the following process is performed (process steps correspond to path numbers on the figure):

  1. A target cueing algorithm searches the SAR imagery for candidate CMT targets, identifying potential targets in areas within the allowable area of a predefined delimitation mask (data layer 2).*

  2. Location of a candidate target is used to determine the distance to transportation networks (which are located in the map, data layer 3) and to hypothesize feasible paths from the network to the hide site.

  3. The terrain model (data layer 8) is inspected along all paths to determine the feasibility that the CMT could traverse the path. Infeasible path hypotheses are pruned.

  4. Remaining feasible paths (on the basis of slope) are then inspected using the multispectral data (data layers 4, 5, 6, and 7). A multispectral classification algorithm is scanned over the feasible paths to assess ground load-bearing strength, vegetation cover, and other factors. Evidence for slope and these factors is accumulated for each feasible path and combined into a composite path likelihood value; unlikely paths are pruned.

  5. Remaining paths are inspected in the recent SAR data (data layer 1) for other significant evidence (e.g., support vehicles along the path, recent clear cut) that can support the hypothesis. Supportive evidence is accumulated to increase likelihood values.

  6. Composite evidence (target likelihood plus likelihood of feasible paths to candidate target hide location) is then used to make a final target detection decision.
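The prune-and-accumulate logic of steps 3 through 6 can be sketched as follows. The source does not specify how evidence is combined, so the rules used here are assumptions chosen only for illustration: a fixed slope limit for feasibility, a noisy-OR style update for supporting evidence, and a simple product of target and best-path likelihoods for the final decision. All names and threshold values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PathHypothesis:
    name: str
    max_slope_deg: float        # from the terrain model (step 3)
    terrain_likelihood: float   # from multispectral classification (step 4)
    support: list = field(default_factory=list)  # step 5 evidence, each in [0, 1]


def assess_candidate(target_likelihood, paths, slope_limit_deg=15.0,
                     prune_below=0.2, decision_threshold=0.5):
    """Return (composite likelihood, detection decision) for one candidate."""
    best_path = 0.0
    for p in paths:
        if p.max_slope_deg > slope_limit_deg:
            continue                      # step 3: prune infeasible paths
        likelihood = p.terrain_likelihood
        if likelihood < prune_below:
            continue                      # step 4: prune unlikely paths
        for e in p.support:
            # Step 5: each supporting item raises the likelihood (assumed rule).
            likelihood += (1.0 - likelihood) * e
        best_path = max(best_path, likelihood)
    # Step 6: combine target and path evidence (assumed product rule).
    composite = target_likelihood * best_path
    return composite, composite >= decision_threshold


paths = [
    PathHypothesis("A", max_slope_deg=25.0, terrain_likelihood=0.9),
    PathHypothesis("B", max_slope_deg=8.0, terrain_likelihood=0.6, support=[0.5]),
    PathHypothesis("C", max_slope_deg=5.0, terrain_likelihood=0.1),
]
composite, detected = assess_candidate(0.7, paths)
print(round(composite, 2), detected)  # 0.56 True
```

Path A is pruned on slope and path C on terrain likelihood; path B survives, is strengthened by one item of supporting evidence, and carries the candidate over the decision threshold. An operational system might instead use Bayesian or Dempster-Shafer combination.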

In the example presented in Figure 5.6, the reasoning process followed a spatial search to accumulate (or discount) evidence about a candidate target. In addition to target detection, similar processes can be used to:

  • Insert data in the database (e.g., resolve conflicts between input sources)

  • Refine accuracy using data from multiple sources, etc.

  • Monitor subtle changes between existing data and new measurements

  • Evaluate hypotheses about future actions (e.g., trafficability of paths, likelihood of flooding given rainfall conditions, and economy of construction alternatives)

Representative Crime Mapping Examples

The widespread availability of desktop GISs has allowed local crime analysis units to develop new ways of mapping geocoded crime data and visualizing spatial-temporal patterns of criminal activity to better understand that activity and its underlying causal factors.76 This crime mapping process requires only that law enforcement systems geocode the following categories of source information in databases for subsequent analysis:

  • Calls for service. Date/time group, call number, call category, associated incident report identifier, address, and latitude and longitude of location a police unit was sent

  • Reported crimes. Date/time group for crime event, case number, crime category, address, and latitude and longitude of the crime location

  • Arrests. Date/time group of arrest, case number, arrested person information, charge category, address, and latitude and longitude of the arrest location

  • Routes. Known routes of travel (to-from crime scene, to-from drug deliveries, route between car-stolen and car-recovered, etc.) derived from arrest, investigation, and interrogation records

The spatial data fusion process is relatively simple: the geocoded event data is registered to GIS layers for visualization and statistical analysis. Analysts can visualize and study the spatial distributions of this data (crimes, as well as addresses of victims, suspects, travel routes to-from crime scenes, and other related evidence) to:

  • Discover patterns of spatial association, including spatial clusters or hot spots—areas that have a greater than average number of criminal or disorder events, or areas where people have a higher than average risk of victimization.77 The attributes of the hot spots (e.g., spatial properties for each block such as number of buildings, number of families, per capita income, distance to police station, etc.) may be used to predict other vulnerable areas for similar criminal activities.

  • Discover spatial-temporal patterns that can be correlated to behavior tempos or profiles (e.g., a drug addict’s tempo of burglaries, purchases, and rest).

  • Identify trends and detect changes in behavioral patterns to investigate root cause factors (e.g., changes in demographics, movement of people, economics, etc.).

  • Identify unusual event locations or spatial outliers.
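One very simple way to flag areas with a greater than average number of events is to bin the geocoded locations into grid cells and compare each cell's count with the mean. The sketch below is an illustration under stated assumptions rather than an established hot spot method: the function name, the square-cell grid, the choice to average over occupied cells only, and the `factor` multiplier are all hypothetical.

```python
from collections import Counter


def grid_hot_spots(events, cell_size, factor=2.0):
    """Bin (x, y) event locations into square cells and return the cells
    whose count exceeds `factor` times the mean count of occupied cells."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in events
    )
    if not counts:
        return {}
    mean = sum(counts.values()) / len(counts)
    return {cell: n for cell, n in counts.items() if n > factor * mean}


events = [
    (10, 20), (15, 25), (30, 40), (55, 60), (70, 5), (90, 95),  # one dense cell
    (150, 20), (30, 160), (520, 530),                           # isolated events
]
print(grid_hot_spots(events, cell_size=100))  # {(0, 0): 6}
```

With a 100-unit cell, six of the nine events fall in cell (0, 0), which exceeds twice the mean occupied-cell count of 2.25 and is reported as a hot spot. The result depends strongly on cell size and grid origin, which is one reason analysts also use the statistical and clustering methods described in the text.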

Crime mapping is particularly useful in the analysis of a crime series, that is, criminal offenses that are thought to share the same causal factor (usually a single offender or group of offenders) given their descriptive, behavioral, spatial, or temporal commonality.78 Crime series are generally related to the category of “crimes of design” that are conducted within certain constraints and therefore may have an observable pattern, rather than “crimes of opportunity” that occur when the criminal, victim, and circumstances converge at a particular time and place.

The crime data can be presented in a variety of thematic map formats (Table 5.5) for visual analysis, and the analyst can compute statistical properties of selected events, such as the spatial distribution of events, including measures like the mean center, center of minimum distance, or ellipses of standard deviation. Statistics can also be computed for the distances between events, including nearest-neighbor measures or Ripley’s K statistic. Clustering methods are applied to spatial and temporal data to locate highly correlated clusters of similar events (hot spots) and to identify locations with spatial attributes that make them vulnerable to similar crime patterns.79 (This predictive analysis may be used to increase surveillance or set up decoys where criminals are expected to sustain a pattern.)
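A few of these descriptive statistics are easy to compute directly. The sketch below shows the mean center, the standard distance (a single-number measure of spread about the mean center), and one common nearest-neighbor measure, the Clark-Evans ratio of the observed mean nearest-neighbor distance to the value expected for a random pattern of the same density. The function names are hypothetical, coordinates are assumed to be planar, and the study-area size must be supplied by the analyst.

```python
import math


def mean_center(pts):
    """Arithmetic mean of the x and y coordinates."""
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


def standard_distance(pts):
    """Root-mean-square distance of the points from their mean center."""
    cx, cy = mean_center(pts)
    return math.sqrt(
        sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in pts) / len(pts)
    )


def nearest_neighbor_index(pts, area):
    """Clark-Evans ratio: observed mean nearest-neighbor distance divided
    by the expected value for a random pattern, 0.5 * sqrt(area / n)."""
    n = len(pts)
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(pts) if j != i)
        for i, p in enumerate(pts)
    ) / n
    expected = 0.5 * math.sqrt(area / n)
    return observed / expected


pts = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(mean_center(pts))                       # (1.0, 1.0)
print(nearest_neighbor_index(pts, area=16))   # 2.0
```

A ratio near 1 suggests a random pattern, values well below 1 suggest clustering, and values above 1 (as for the evenly spaced example) suggest dispersion. This simple form ignores edge effects, which matter for small study areas.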

Table 5.5   Typical Thematic Crime Map Formats

| Map Type | Description | Law Enforcement Use |
| --- | --- | --- |
| Dot or pin map | Individual event locations are plotted as dots; symbol coding identifies the event (crime) type at each location | Identify general crime activity patterns, trends, vulnerable locations; for a specific crime series, identify candidate spatial patterns |
| Statistical map | Proportional symbols (e.g., pin sizes, pie charts, or histograms) are used to display quantitative data at locations or in areas | Identify crime densities and relative rates of events by area to manage police coverage |
| Choropleth map | Display discrete distributions of data within defined boundaries (police beats, precincts, districts, or census blocks) | Define hot spot areas that share the same level of risk |
| Isoline map | Display contour lines that bound areas of common attributes (e.g., crime rate) and show gradients between bounded areas | Inform law enforcement personnel about high incident areas to increase field contacts and surveillance in an area |



5.7   Spatial Data Fusion in GEOINT

The term geospatial intelligence (GEOINT) refers to the exploitation and analysis of imagery and geospatial information to describe, assess, and visually depict physical features and geographically referenced activities on the Earth. GEOINT consists of three elements: imagery, imagery-derived intelligence, and geospatial information.80 The previously mentioned processes of spatial data fusion are at the core of GEOINT processing and analysis functions, enabling the registration, correlation, and spatial reasoning over many sources of intelligence data.

The three elements of GEOINT bring together different types of data that, when integrated, yield a complete, spatially coherent product that can support more detailed analysis. The first element, imagery, refers to any product that depicts natural or man-made features, objects, or activities, together with the positional data acquired at the same time. Imagery can be collected by a wide variety of platforms, including satellite, airborne, and unmanned platforms; it is a crucial element that provides the initial capability for analysis. The second element is imagery intelligence, the derived result of interpreting and analyzing imagery, which adds context to imagery products. The third element is geospatial information, which identifies the locations and characteristics of features of the earth. This data includes two categories:

  1. Static information that is collected through remote sensing, mapping, and surveying; it is often derived from existing maps, charts, or related products.

  2. Dynamic information that is provided by objects being tracked by radar, a variety of tagging mechanisms, or self-reporting techniques such as blue force tracking systems that report the location of personnel and vehicles.

The capability to visualize registered geospatial data in three dimensions increases situational awareness and adds to the context of GEOINT analysis. This capability allows reconstruction of scenes and dynamic activities using advanced modeling and simulation techniques. These techniques allow the creation of 3D fly-through products that can then be enhanced with the addition of information gathered from other intelligence disciplines; the fourth dimension of time and movement, provided by dynamic information, can be added to create dynamic and interactive products. These GEOINT products apply several of the fusion methods described in earlier sections to provide analysts with an accurate simulation of a site or an activity for analysis, mission training, or targeting.

To provide accurate, timely, and applicable intelligence, the National Geospatial-Intelligence Agency (NGA) has adopted a systematic process that applies spatial data fusion to analyze intelligence problems and produce standard GEOINT products. The four-step geospatial preparation of the environment (GPE) process (Figure 5.7) was adapted from the military’s IPB process to meet a broader spectrum of analysis, including civilian and nontraditional threat issues.81 The process is described as a cycle, though in practice the steps need not be performed in strict sequence; the cycle simply provides the GEOINT analyst a template to follow when attempting to solve an intelligence problem.

The first step defines the environment, based on the mission requirements provided by the intelligence customer. The analyst gathers all pertinent information about the mission location, and determines applicable boundaries, features, and characteristics. This first grouping of information provides the foundation layer(s) for the GEOINT product, including essential features that change rarely or slowly.


Figure 5.7   Geospatial preparation of the environment process flow and products.

The second step is to describe environmental influences. In this step it is important to provide descriptive information about the location being analyzed, including existing natural conditions, cultural influences, infrastructure, and the political environment. The analyst must also consider other factors that could potentially affect operations in the area, including weather; vegetation; roads; facilities; population; language; and social, ethnic, religious, and political factors. This information is then registered to the data layer(s) prepared in the first step.

The third step evaluates threats and hazards to the mission. This requires gathering available threat data from multiple intelligence disciplines related to the location, including details of the adversary, their forces, doctrine, capabilities, and intent. Any information that provides background or insight on the threats for the location is closely investigated, registered, and added to the layers of information from the first two steps. In many cases the estimated geolocation of nonspatial data requires analysts to describe the uncertainty in spatial information and inferences; this requires collaboration with other entities within the intelligence community.

The last step of the GPE is to develop an analytic conclusion. After all the layers of information gathered have been integrated, registered, and considered, the analyst must make analytic judgments regarding operational considerations, such as physical terrain obstacles to vehicle movement, human terrain sensitivities to psychological operations (PSYOP) activities, line-of-sight restrictions to sensor placements, etc. The emphasis in this stage of analysis is on deriving proactive and predictive assessments; these assessments may include predicted effects of next courses of action, estimated impact statements, and assessments.

Most GEOINT products can be categorized as either standard or specialized; however, since GEOINT products are generally tailored to meet specific issues they cannot always be easily categorized.

Table 5.6   Representative AGI Methods


Standard products, such as maps and imagery, can be used as stand-alone products or layered with additional data. Most standard products are derived from electrooptical or existing geospatial data but can be augmented with data derived from other sources. Standard products are generally 2D, although 3D products are available; they make up the bulk of GEOINT requirements and may include in-depth analysis, depending on the consumer’s requirements.

Specialized products take standard products to the next level by providing additional capability to tailor them for more specific situations. These products typically use data from a wider variety of geospatial sources and even data from other intelligence disciplines. Specialized products incorporate data from many more technically advanced sensors and typically incorporate more complex registration and exploitation techniques. One of the complex exploitation techniques commonly used in specialized products is Advanced Geospatial Intelligence (AGI), which includes all types of information technically derived from the processing, exploitation, and nonliteral analysis (to include integration or fusion) of spectral, spatial, temporal, phase history, and polarimetric data (Table 5.6). These types of data can be collected on stationary and moving targets by electrooptical, infrared, radar, and related sensors (both active and passive). AGI also includes both ancillary data needed for data processing or exploitation and signature information (to include development, validation, simulation, data archival, and dissemination).82



5.8   Summary

The fusion of image and spatial data is an important process that promises to achieve new levels of performance and integration with GISs for a wide variety of military, intelligence, and commercial application areas. By combining registered data from multiple sensors or views, and performing intelligent reasoning on the integrated data sets, fusion systems are beginning to significantly improve the performance of current-generation ATR, single-sensor imaging, and geospatial data systems.

There remain significant challenges to translate the state-of-the-art manual and limited semiautomated capabilities to large-scale production. A recent study by the National Research Council of image and spatial data fusion in GEOINT concluded, “Yet analysis methods have not evolved to integrate multiple sources of data rapidly to create actionable intelligence. Nor do today’s means of information dissemination, indexing, and preservation suit this new agenda or future needs.”87 Among the challenges for research identified in the study were developing ontologies for tagging GEOINT objects, image data fusion across space, time, spectrum and scale, and spatiotemporal database management systems to support fusion across all spatial sources. The report summarized that “Data acquired from multiple sensors carry varying granularity, geometric type, time stamps, and registered footprints. Data fusion rectifies coordinate positions to establish which features have not changed over time to focus on what has changed. The fusion, however, involves confronting several hard problems including spatial and temporal conflation, dealing with differential accuracy and resolutions, creating the ontologies and architectures necessary for interoperability, and managing uncertainty with metadata.”88




* Adapted from The Principles and Practice of Image and Spatial Data Fusion, in Proceedings of the 8th National Data Fusion Conference, Dallas, Texas, March 15–17, pp. 257–278, 1995.

* This mask is a derived layer produced by a spatial reasoning process in the scene generation stage to delimit the entire search region to only those allowable regions in which a target may reside.

