16   Random Set Theory for Multisource-Multitarget Information Fusion

 

Ronald Mahler

CONTENTS

16.1   Introduction

16.1.1   Bayesian Iceberg: Models, Optimality, Computability

16.1.1.1   Bayesian Iceberg: Sensor Models

16.1.1.2   Bayesian Iceberg: Motion Models

16.1.1.3   Bayesian Iceberg: State Estimation

16.1.1.4   Bayesian Iceberg: Formal Optimality

16.1.1.5   Bayesian Iceberg: Computability

16.1.1.6   Bayesian Iceberg: Robustness

16.1.2   Why Multisource, Multitarget, Multievidence Problems Are Tricky

16.1.3   Finite-Set Statistics

16.1.4   Why Random Sets?

16.2   Review of Bayes Filtering and Estimation

16.2.1   Bayes Recursive Filtering

16.2.2   Constructing Likelihood Functions from Sensor Models

16.2.3   Constructing Markov Densities from Motion Models

16.2.4   Optimal State Estimators

16.3   Extension to Nontraditional Data

16.3.1   General Data Modeling

16.3.2   Generalized Measurements

16.3.3   Random Set Uncertainty Models

16.3.3.1   Vague Measurements: Fuzzy Logic

16.3.3.2   Uncertain Measurements: Dempster–Shafer Evidence

16.3.3.3   Contingent Measurements: Rules

16.3.4   Unambiguously Generated Ambiguous Measurements

16.3.4.1   Generalized Likelihood Functions for UGA Measurements

16.3.4.2   Bayesian Unification of UGA Measurement Fusion

16.3.4.3   Bayes-Invariant Transformations of UGA Measurements

16.3.5   Ambiguously Generated Ambiguous Measurements

16.3.5.1   Generalized Likelihood Functions for AGA Measurements

16.3.6   Ambiguously Generated Unambiguous Measurements

16.3.7   Generalized State-Estimates

16.3.8   Unified Single-Target Multisource Integration

16.4   Multisource-Multitarget Calculus

16.4.1   Random Finite Sets

16.4.2   Multiobject Density Functions and Set Integrals

16.4.3   Belief-Mass Functions

16.4.4   Probability Generating Functionals

16.4.5   Functional Derivatives and Set Derivatives

16.4.6   Key Theorems of Multitarget Calculus

16.4.6.1   Fundamental Theorem of Multitarget Calculus

16.4.6.2   Radon-Nikodým Theorem for Multitarget Calculus

16.4.6.3   Fundamental Convolution Formula for Multitarget Calculus

16.4.7   Basic Differentiation Rules

16.5   Multitarget Likelihood Functions

16.5.1   Multitarget Measurement Models

16.5.1.1   Case I: No Missed Detections, No False Alarms

16.5.1.2   Case II: Missed Detections

16.5.1.3   Case III: Missed Detections and False Alarms

16.5.1.4   Case IV: Multiple Sensors

16.5.2   Belief-Mass Functions of Multitarget Sensor Models

16.5.3   Constructing True Multitarget Likelihood Functions

16.6   Multitarget Markov Densities

16.6.1   Multitarget Motion Models

16.6.1.1   Case I: Target Number Is Constant

16.6.1.2   Case II: Target Number Can Decrease

16.6.1.3   Case III: Target Number Can Increase and Decrease

16.6.2   Belief-Mass Functions of Multitarget Motion Models

16.6.3   Constructing True Multitarget Markov Densities

16.7   Multisource-Multitarget Bayes Filter

16.7.1   Multisensor-Multitarget Filter Equations

16.7.2   Initialization

16.7.3   Multitarget Distributions and Units of Measurement

16.7.4   Failure of the Classical State Estimators

16.7.5   Optimal Multitarget State Estimators

16.7.6   Multitarget Miss Distance

16.7.7   Unified Multitarget Multisource Integration

16.8   PHD and CPHD Filters

16.8.1   Probability Hypothesis Density

16.8.2   PHD Filter

16.8.3   Cardinalized PHD Filter

16.8.4   Survey of PHD/CPHD Filter Research

16.9   Summary and Conclusions

Acknowledgments

References

This chapter deals with finite-set statistics (FISST),1–3,80 which is also described in practitioner-level detail in the textbook Statistical Multisource-Multitarget Information Fusion.4 FISST provides a unified, scientifically defensible, probabilistic foundation for the following aspects of multisource, multitarget, and multiplatform data fusion: (1) multisource integration (i.e., detection, identification, and tracking) based on Bayesian filtering and estimation;5–9 (2) sensor management using control theory;7,10 (3) performance evaluation using information theory;11–14 (4) expert-systems theory (e.g., fuzzy logic, the Dempster–Shafer theory of evidence, and rule-based inference);15–17 (5) distributed fusion;18 and (6) aspects of situation/threat assessment.19

The core of FISST is a multisource-multitarget differential and integral calculus based on the fact that so-called belief-mass functions are the multisensor-multitarget counterparts of probability-mass functions. One purpose of this calculus is to enable signal-processing practitioners to directly generalize conventional, engineering-friendly statistical reasoning to multisensor, multitarget, multievidence applications. Another purpose is to extend Bayesian methodologies so that they are capable of dealing with (1) imperfectly characterized data and sensor models and (2) true sensor and target models for multisource-multitarget problems. Therefore, FISST encompasses certain expert-system approaches that are often described as heuristic—fuzzy logic, the Dempster–Shafer theory of evidence, and rule-based inference—as special cases of a single Bayesian paradigm.

Following are some examples of applications where FISST techniques have been applied:

  • Multisource datalink fusion and target identification20

  • Naval passive-acoustic antisubmarine fusion and target identification21

  • Air-to-ground target identification using synthetic aperture radar (SAR)22

  • Scientific performance evaluation of multisource-multitarget data fusion algorithms14,23,24

  • Unified detection and tracking using true multitarget likelihood functions and approximate nonlinear filters25

  • Joint tracking, pose estimation, and target identification using high-range resolution radar (HRRR)26

Owing to rapid progress in FISST-related fields since the first edition of this book in 2002, this chapter has been substantially rewritten.

The chapter is organized as follows. Section 16.1 introduces the basic practical issues underlying Bayes information fusion. Section 16.2 summarizes the basic statistical foundations of single-sensor, single-target tracking and identification. Section 16.3 shows how this familiar paradigm can be naturally extended to ambiguous nontraditional information, such as features, attributes, natural-language statements, and inference rules. The mathematical core of FISST—the multisource-multitarget integral and differential calculus—is summarized in Section 16.4. Section 16.5 introduces multisource-multitarget measurement models and their true likelihood functions. Section 16.6 describes multitarget motion models and their corresponding true multitarget Markov transition densities. The multisource-multitarget Bayes filter and various issues related to it (especially multitarget state estimation) are summarized in Section 16.7. Two new principles of approximate multitarget filtering techniques—the probability hypothesis density (PHD) filter and cardinalized probability hypothesis density (CPHD) filter—are described in Section 16.8. This section includes a brief survey of recent advances involving these filters. Conclusions are presented in Section 16.9.

 

 

16.1   Introduction

This section describes the problems that FISST is meant to address, and summarizes the FISST approach for addressing them. The section is organized as follows. Sections 16.1.1 and 16.1.2 describe some basic engineering issues in single-target and multitarget Bayesian inference. The FISST approach is summarized in Section 16.1.3, and Section 16.1.4 shows why this approach is necessary if the Bayesian Iceberg is to be confronted successfully.

16.1.1   Bayesian Iceberg: Models, Optimality, Computability

Recursive Bayesian nonlinear filtering and estimation have become the most widely accepted theoretical basis for developing algorithms that are both optimal and practical. Our jumping-off point is Bayes’ rule:

fposterior(x|z) = f(z|x) fprior(x) / f(z)    (16.1)

where x denotes the unknown quantities of interest (e.g., position, velocity, and target class); fprior(x), the prior distribution, encapsulates our previous knowledge of x; z represents a new measurement; f(z|x), the likelihood function, describes the generation of the measurements; fposterior(x|z), the posterior distribution, encapsulates our current knowledge about x; and f(z) is the Bayes normalization factor.

If time has elapsed before the new information, z, is collected, then fprior(x) cannot be used immediately in Bayes’ rule. It must first be extrapolated to a new prior fprior+(x) that accounts for the uncertainties caused by possible interim target motion. This extrapolation is accomplished using either the Markov time-prediction integral27

fprior+(x) = ∫ f+(x|y) fprior(y) dy    (16.2)

or solution of the Fokker–Planck equation (FPE).28 In Equation 16.2, f+(x|y) is the Markov transition density. It describes the likelihood that the target will have state x if it previously had state y.

The density, fposterior(x|z), contains all relevant information about the unknown state variables (i.e., position, velocity, and target identity) contained in x.

One could, in principle, plot graphs or contour maps of fposterior(x|z) in real time. However, this presupposes the existence of a human operator trained to interpret them. Complete automation is necessary because, in most data fusion and tracking applications, there are far more data than operators to interpret them. To render the information in fposterior(x|z) available for completely automated real-time applications, we need a Bayes-optimal state estimator that extracts an estimate of the actual target state from the posterior.

The most familiar Bayes-optimal state estimators are the maximum a posteriori (MAP) and expected a posteriori (EAP) estimators

x̂MAP = arg maxx fposterior(x|z)        x̂EAP = ∫ x · fposterior(x|z) dx    (16.3)
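As a concrete illustration, Equations 16.1 and 16.3 can be exercised numerically on a discretized state space. The Gaussian prior, the Gaussian likelihood, and all numerical values below are illustrative assumptions, not part of the text:

```python
import numpy as np

# Discretize the state space into a grid of candidate values of x.
x = np.linspace(-5.0, 5.0, 1001)
dx = x[1] - x[0]

def gauss(u, mu, sigma):
    return np.exp(-0.5 * ((u - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

f_prior = gauss(x, 0.0, 2.0)   # fprior(x): an assumed Gaussian prior
z = 1.3                        # a newly collected measurement
f_like = gauss(z, x, 0.5)      # f(z|x) for the assumed model Z = x + W, W ~ N(0, 0.5^2)

# Bayes' rule (Equation 16.1): posterior is proportional to likelihood times prior.
unnorm = f_like * f_prior
f_z = np.sum(unnorm) * dx      # Bayes normalization factor f(z), by Riemann sum
f_post = unnorm / f_z          # fposterior(x|z)

# The two standard estimators of Equation 16.3.
x_map = x[np.argmax(f_post)]
x_eap = np.sum(x * f_post) * dx
```

For this conjugate Gaussian setup the two estimates nearly coincide; the point of the sketch is only the mechanics of normalizing and extracting estimates from a gridded posterior.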

The current popularity of the Bayesian approach is due, in large part, to the fact that it leads to provably optimal algorithms within a relatively simple conceptual framework. Since engineering mathematics is a tool and not an end in itself, this simplicity is a great strength—but also a great weakness. In and of itself, Bayes’ rule is nearly content-free—its proof requires barely a single line. Its power derives from the fact that it is merely the visible tip of a conceptual iceberg, the existence of which tends to be forgotten precisely because the rest of the iceberg is taken for granted. In particular, both the optimality and simplicity of the Bayesian framework can be taken for granted only within the confines of standard applications addressed by standard textbooks. When one ventures out of these confines one must exercise proper engineering prudence, which includes verifying that standard textbook assumptions still apply.

16.1.1.1   Bayesian Iceberg: Sensor Models

Bayes’ rule exploits, to the best possible advantage, the high-fidelity knowledge about the sensor contained in the likelihood function f(z|x). If f(z|x) is too imperfectly understood, then an algorithm will waste a certain amount Nsens of data trying (and perhaps failing) to overcome the mismatch between model and reality.

Many forms of data (e.g., those generated by tracking radars) are so well characterized that f(z|x) can be constructed with sufficient fidelity. Other kinds of data (e.g., SAR images or HRRR range profiles) are proving so difficult to simulate that there is no assurance that sufficiently high fidelity will ever be achieved, particularly in real-time operation. In such cases a problem persists: unless we have the provably true likelihood f(z|x), our algorithm will be Bayes-optimal only with respect to an imaginary sensor. That is, we would be avoiding the real algorithmic issue: what to do when likelihoods cannot be sufficiently well characterized.

Finally, there are types of data—features extracted from signatures, English-language statements received over datalink, and rules drawn from knowledge bases—that are so ambiguous (i.e., poorly understood from a statistical point of view) that probabilistic approaches are not obviously applicable. This gap in Bayesian inference needs to be filled.

Even if f(z|x) can be determined with sufficient fidelity, multitarget problems present a new challenge. We will waste data—or worse—unless we find the corresponding provably true multitarget likelihood—the specific function f(z1, …, zm|x1, …, xn) that describes, with the same high fidelity as f(z|x), the likelihood that the sensor will collect observations z1, …, zm (m random) given the presence of targets with states x1, …, xn (n random). Bayes-optimality is meaningless unless we can construct the provably true f(z1, …, zm|x1, …, xn).

16.1.1.2   Bayesian Iceberg: Motion Models

Much of what has been said about likelihoods f(z|x) applies with equal force to Markov densities f+(x|y). The more accurately f+(x|y) models target motion, the more effectively Bayes’ rule will perform. Otherwise, a certain amount Ntarg of data must be expended in overcoming poor motion-model selection.

Once again, however, what does one do in the multitarget situation even when f+(x|y) is accurate? We must find the provably true multitarget Markov transition density, that is, the specific function f+(x1, …, xn|y1, …, yn′) that describes, with the same high fidelity as f+(x|y), how likely it is that a group of targets that previously were in states y1, …, yn′ (n′ random) will now be found in states x1, …, xn (n random). If we assume that n = n′ and that f+(x1, …, xn|y1, …, yn) = f+(x1|y1) ⋯ f+(xn|yn), then we are implicitly assuming that the number of targets is constant and that target motions are uncorrelated. However, in real-world scenarios, targets can appear (e.g., multiple independently targetable reentry vehicles [MIRVs] and decoys emerging from a ballistic missile reentry vehicle) or disappear (e.g., aircraft that drop beneath radar coverage) in possibly correlated ways.

Consequently, multitarget filters that assume uncorrelated motion and constant target number may perform poorly against dynamic multitarget environments, just as (and for the same reason) single-target trackers that assume straight-line motion may perform poorly against maneuvering targets. In either case, data are wasted in trying, successfully or otherwise, to overcome the effects of motion-model mismatch.

16.1.1.3   Bayesian Iceberg: State Estimation

Care must be taken when extracting an answer from the posterior distribution. If the state estimator has unrecognized inefficiencies, then a certain amount, Nest, of data will be wasted in trying to overcome them, not necessarily with success. For example, the EAP estimator plays an important role in theory, but it often produces erratic and inaccurate solutions when the posterior is multimodal.

In the multitarget case, the dangers of taking state estimation for granted become even graver. For example, we may fail to notice that the multitarget versions of the standard MAP and EAP estimators are not even defined, let alone provably optimal. The source of this failure is, moreover, not some abstruse point in theoretical statistics. Rather, it is a familiar part of everyday engineering practice: units of measurement.

16.1.1.4   Bayesian Iceberg: Formal Optimality

The failure of the standard Bayes-optimal state estimators in the multitarget case has consequences for optimality. A point that is often overlooked is that many of the classical Bayesian optimality results depend on seemingly esoteric mathematical concerns. In perhaps 95% of all engineering applications, which is to say the standard applications covered by the standard textbooks, these concerns can be safely ignored. Danger awaits the unwary in the other 5%, and multitarget applications are a case in point. Because the standard Bayes-optimal state estimators fail in the multitarget case, we must construct new multitarget state estimators and prove that they are well behaved.

16.1.1.5   Bayesian Iceberg: Computability

If the simplicity of Equations 16.1, 16.2 and 16.3 cannot be taken for granted in so far as modeling and optimality are concerned, then such is even more the case when it comes to computational tractability.29 The prediction integral and Bayes normalization constant must be computed using numerical integration, and, since an infinite number of parameters is in general required to characterize fposterior(x|z), approximation is unavoidable. Naive approximations create the same difficulties as model-mismatch problems. An algorithm must waste a certain amount Nappx of data in overcoming, or failing to overcome, accumulation of approximation error, numerical instability, and so on.

16.1.1.6   Bayesian Iceberg: Robustness

The engineering issues addressed in the previous sections (measurement-model mismatch, motion-model mismatch, inaccurate or slowly convergent state estimators, accumulation of approximation error, and numerical instability) can be collectively described as problems of robustness, that is, the brittleness that Bayesian approaches can exhibit when reality deviates sharply from assumption. One might be tempted to argue that, in practical application, these difficulties can be overcome by simple brute force, that is, by assuming that the data rate is high enough to permit a large number of computational cycles per unit time. In this case, or so the argument goes, the algorithm will override its internal inefficiencies because the total amount, Ndata, of data that is collected is much larger than the amount, Nineffic ≜ Nsens + Ntarg + Nest + Nappx, of data required to overcome those inefficiencies.

If this were the case, there would be few tracking and target-identification problems left to solve. Current problems are challenging either because data rates are not sufficiently high or because brute-force computation cannot be accomplished in real time.

16.1.2   Why Multisource, Multitarget, Multievidence Problems Are Tricky

One needs systematic and completely probabilistic methodologies for the following:

  • Modeling uncertainty in poorly characterized likelihoods

  • Modeling ambiguous nontraditional data, and the likelihood functions for such data

  • Constructing multisource likelihoods for nontraditional data

  • Fusing efficiently data from all sources (nontraditional or otherwise)

  • Constructing provably true multisource-multitarget likelihood functions from underlying models of the sensors

  • Constructing provably true multitarget Markov densities from underlying multitarget motion models, which account for correlated motion and changes in target number

  • Constructing provably optimal multitarget state estimators that simultaneously determine target number, target kinematics, and target identity without resorting to operator intervention or optimal report-to-track association

  • Developing principled approximate multitarget filters that preserve as much application realism as possible

16.1.3   Finite-Set Statistics

One of the major goals of FISST is to address the Bayesian Iceberg issues described in Sections 16.1.1 and 16.1.2. FISST deals with imperfectly characterized data and measurement models by extending Bayesian approaches in such a way that they are robust with respect to these ambiguities. This robust-Bayes methodology is described in more detail in Section 16.3. FISST deals with the difficulties associated with multisource and multitarget problems by directly extending engineering-friendly, single-sensor, single-target statistical calculus to the multisensor-multitarget realm. This optimal-Bayes methodology is described in Sections 16.4, 16.5, 16.6 and 16.7. Finally, FISST provides mathematical tools that may help address the formidable computational difficulties associated with multisource-multitarget filtering (whether optimal or robust). Some of these ideas are discussed in Section 16.8.

The basic approach is as follows. A suite of known sensors transmits, to a central data fusion site, the observations they collect regarding targets whose number, positions, velocities, identities, threat states, etc., are unknown. Then it is necessary to carry out the following:

  1. Reconceptualize mathematically the sensor suite as a single sensor (a meta-sensor)

  2. Reconceptualize mathematically the target set as a single target (a meta-target) with multitarget state X = {x1, …, xn} (a meta-state)

  3. Reconceptualize the current set Z = {z1, …, zm} of observations, collected by the sensor suite at approximately the same time, as a single measurement (a meta-measurement) of the meta-target observed by the meta-sensor

  4. Represent statistically ill-characterized (ambiguous) data as random closed subsets Θ of (multisource) observation space

  5. Model multisensor-multitarget data using a multisensor-multitarget measurement model, a random finite set Σ = T(X) ∪ C(X), just as single-sensor, single-target data can be modeled using a measurement model Z = h(x, W)

  6. Model the motion of multitarget systems using a multitarget motion model, a randomly varying finite set Ξk+1 = Tk(X) ∪ Bk(X), just as single-target motion can be modeled using a motion model Xk+1 = Φk(Xk, Vk).
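To make steps 1 through 6 concrete, the following sketch simulates one realization of a multitarget observation set Σ = T(X) ∪ C(X) for scalar states, under commonly used illustrative assumptions (each target independently detected with probability p_D, additive Gaussian measurement noise, Poisson-distributed clutter uniform over the surveillance region). All parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_observation_set(X, p_D=0.9, sigma=0.3, clutter_rate=2.0, region=(-10.0, 10.0)):
    """Draw one realization of Sigma = T(X) union C(X) for 1-D target states X.

    T(X): each target x is detected with probability p_D, producing x + W
          with W ~ N(0, sigma^2); a missed detection contributes nothing.
    C(X): the number of clutter points is Poisson(clutter_rate), each point
          uniform over the surveillance region.
    """
    detections = [x + rng.normal(0.0, sigma) for x in X if rng.random() < p_D]
    n_clutter = rng.poisson(clutter_rate)
    clutter = list(rng.uniform(region[0], region[1], size=n_clutter))
    return detections + clutter   # an unordered, randomly sized collection

Z = simulate_observation_set([-2.0, 0.5, 3.0])
```

Note that both the number of elements of Z and their order carry no target labels: this is exactly the "meta-measurement" of step 3.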

Given this, we can mathematically reformulate multisensor, multitarget estimation problems as single-sensor, single-target problems. The basis of this reformulation is the concept of belief-mass. Belief-mass functions are nonadditive generalizations of probability-mass functions. (However, they are not heuristic: they are equivalent to probability-mass functions on certain abstract topological spaces.1) That is

  • Just as likelihood functions are used to describe the generation of conventional data, z, use generalized likelihood functions to describe the generation of nontraditional data.

  • Just as the probability-mass function p(S|x) = Pr(Z ∈ S|x) of a single-sensor, single-target measurement model is used to describe the statistics of ordinary data, use the belief-mass function β(S|X) = Pr(Σ ⊆ S|X) of a multisource-multitarget measurement model Σ to describe the statistics of multisource-multitarget data.

  • Just as the probability-mass function pk+1|k(S|y) = Pr(Xk+1 ∈ S|y) of a single-target motion model is used to describe the statistics of single-target motion, use the belief-mass function βk+1|k(S|Y) = Pr(Ξk+1 ⊆ S|Y) of a multitarget motion model Ξk+1 to describe multitarget motion.
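For example, for the simplest multitarget measurement model with independent missed detections and no false alarms (the situation of Case II in Section 16.5.1), the belief-mass function factors as β(S|X) = ∏i (1 − pD + pD · p(S|xi)): each target either contributes nothing to Σ (probability 1 − pD) or contributes an observation that must fall inside S (probability pD · p(S|xi)). A sketch of this computation, assuming a scalar Gaussian single-target model and hypothetical parameter values:

```python
import math

def p_single(S, x, sigma=0.5):
    """p(S|x) = Pr(Z in S | x) for Z = x + N(0, sigma^2), S an interval (a, b)."""
    a, b = S
    cdf = lambda u: 0.5 * (1.0 + math.erf((u - x) / (sigma * math.sqrt(2.0))))
    return cdf(b) - cdf(a)

def belief_mass(S, X, p_D=0.9):
    """beta(S|X) = Pr(Sigma subset of S | X), independent detections, no clutter.

    Each target contributes the factor (1 - p_D), a missed detection that
    places nothing outside S, plus p_D * p(S|x), a detection landing in S.
    """
    out = 1.0
    for x in X:
        out *= (1.0 - p_D) + p_D * p_single(S, x)
    return out

# One target near the interval, one far away (which is almost never detected into S).
beta = belief_mass((-1.0, 1.0), [0.0, 5.0])
```

Because the far target is detected with probability 0.9 but its observation essentially never lands in (−1, 1), β is dominated by the 0.1 missed-detection factor for that target.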

The FISST multisensor-multitarget differential and integral calculus is what transforms these mathematical abstractions into a form that can be used in practice:

  • Just as the likelihood function, f(z|x), can be derived from p(S|x) via differentiation, so the true multitarget likelihood function f(Z|X) can be derived from β(S|X) using a generalized differentiation operator called the set derivative.

  • Just as the Markov transition density fk+1|k(x|y) can be derived from pk+1|k(S|y) via differentiation, so the true multitarget Markov transition density can be derived from βk+1|k(S|Y) via set differentiation.

  • Just as f(z|x) and p(S|x) are related by p(S|x) = ∫S f(z|x) dz, so f(Z|X) and β(S|X) are related by β(S|X) = ∫S f(Z|X) δZ, where the integral is now a multisource-multitarget set integral.
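The set integral appearing in the last bullet can be written out explicitly; in standard FISST notation it is a sum over all possible cardinalities of the measurement set:

```latex
\int_S f(Z \mid X)\,\delta Z
  \;=\; f(\emptyset \mid X)
  \;+\; \sum_{n \ge 1} \frac{1}{n!}
        \int_{S \times \cdots \times S}
        f(\{\mathbf{z}_1,\ldots,\mathbf{z}_n\} \mid X)\,
        d\mathbf{z}_1 \cdots d\mathbf{z}_n
```

The 1/n! compensates for the fact that the n! orderings of z1, …, zn all describe the same finite set.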

Accordingly, let Z(k): Z1, …, Zk be a time sequence of multisource-multitarget observation sets. The multitarget Bayes filter allows multitarget posterior distributions fk|k(X|Z(k)) to be created from the multisource-multitarget likelihood:

fk|k(∅|Z(k)) = posterior likelihood that no targets are present

fk|k({x1, …, xn}|Z(k)) = posterior likelihood that n targets with states x1, …, xn are present

From these distributions, simultaneous, provably optimal estimates of target number, kinematics, and identity can be computed without resorting to the optimal report-to-track assignment characteristic of multihypothesis approaches. Finally, these fundamentals enable both optimal-Bayes and robust-Bayes multisensor-multitarget data fusion, detection, tracking, and identification.

TABLE 16.1
Parallels between Single-Target and Multitarget Statistics


Table 16.1 summarizes the direct mathematical parallels between the single-sensor, single-target statistics and the multisensor-multitarget statistics. This parallelism is so close that general statistical methodologies can, with a bit of prudence, be directly translated from the single-sensor, single-target to the multisensor-multitarget case. That is, the table can be thought of as a dictionary that establishes a direct correspondence between the words and grammar in the random vector language and cognate words and grammar of the random set language. Consequently, any sentence (any concept or algorithm) phrased in the random vector language can, in principle, be directly translated into a corresponding sentence (corresponding concept or algorithm) in the random set language. The correspondence between dictionaries is, of course, not precisely one-to-one. For example, vectors can be added and subtracted, whereas finite sets cannot. Nevertheless, the parallelism is complete enough that, provided some care is taken, 100 years of accumulated knowledge about single-sensor, single-target statistics can be directly brought to bear on multisensor-multitarget problems. This process can be regarded as a general methodology for attacking multisource-multitarget data fusion problems.

16.1.4   Why Random Sets?

Random set theory was systematically formulated by Matheron in the mid-1970s.30 Its centrality as a unifying foundation for expert-systems theory and ill-characterized evidence has become increasingly apparent since the mid-1980s.1,31,81,82 Its centrality as a unifying foundation for data fusion applications has been promoted since the late 1970s by I.R. Goodman. The basic relationships between random set theory and the Dempster–Shafer theory of evidence were established by Shafer and Logan,32 Nguyen,33 and Hestir et al.34 The basic relationships between random set theory and fuzzy logic can be attributed to Goodman,35 Orlov,36 and Höhle.37 Mahler developed relationships between random set theory and rule-based evidence.38,39

FISST, whose fundamental ideas were codified in 1993 and 1994, builds upon this existing body of research by showing that random set theory provides a unified framework for expert-systems theory and for multisensor-multitarget fusion, detection, tracking, identification, sensor management, and performance estimation. FISST is unique in that it provides, under a single probabilistic paradigm, a unified and relatively simple and familiar statistical calculus for addressing all of the Bayesian Iceberg problems described earlier.

 

 

16.2   Review of Bayes Filtering and Estimation

This section summarizes those aspects of conventional statistics that are most pertinent to tracking, target identification, and information fusion. The foundation of applied tracking and identification—the recursive Bayesian nonlinear filtering equations—are described in Section 16.2.1. The procedure for constructing provably true sensor likelihood functions from sensor models, and provably true Markov transition densities from target motion models, is described in Sections 16.2.2 and 16.2.3, respectively. Bayes-optimal state estimation is reviewed in Section 16.2.4. Data vectors have the form y = (y1, …, yn, w1, …, wn), where y1, …, yn are continuous variables and w1, …, wn are discrete variables.1 Integrals of functions of such variables involve both summations and continuous integrals.

16.2.1   Bayes Recursive Filtering

Most signal-processing practitioners are familiar with the Kalman filtering equations. Less well known is the fact that the Kalman filter is a special case of the Bayesian discrete-time recursive nonlinear filter.27,28,40 This more general filter is nothing more than Equations 16.1, 16.2 and 16.3 applied recursively:

fk+1|k(x|Zk) = ∫ fk+1|k(x|y) fk|k(y|Zk) dy    (16.4)

fk+1|k+1(x|Zk+1) = fk+1(zk+1|x) fk+1|k(x|Zk) / fk+1(zk+1|Zk)    (16.5)

x̂k|kMAP = arg maxx fk|k(x|Zk)        x̂k|kEAP = ∫ x · fk|k(x|Zk) dx    (16.6)

where

fk+1(z|Zk) = ∫ fk+1(z|x) fk+1|k(x|Zk) dx    (16.7)

is the Bayes normalization constant, and where fk+1(z|x) is the sensor likelihood function; fk+1|k(x|y) the target Markov transition density; fk|k(x|Zk) the posterior distribution conditioned on the observation-stream Zk: z1, …, zk; and fk+1|k(x|Zk) the time-prediction of fk|k(x|Zk) to the time-step of the next observation.

The practical success of Equations 16.4 and 16.5 relies on the ability to construct effectively the likelihood function, fk+1(z|x), and the Markov transition density, fk+1|k(x|y). Although likelihood functions sometimes are constructed via direct statistical analysis of data, more typically they are constructed from sensor measurement models. Markov densities typically are constructed from target motion models. In either case, differential and integral calculus must be used, as shown in the following two sections.
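A single predict-correct cycle of Equations 16.4 and 16.5 can be sketched on a discretized one-dimensional state space. The Gaussian Markov density, Gaussian likelihood, and all numerical values below are illustrative assumptions:

```python
import numpy as np

# Grid over a 1-D state space.
x = np.linspace(-10.0, 10.0, 801)
dx = x[1] - x[0]

def gauss(u, mu, sigma):
    return np.exp(-0.5 * ((u - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Assumed models: nearly constant state with additive process noise (Markov
# density) and a direct noisy observation of the state (likelihood).
def markov_density(xn, xo):          # f_{k+1|k}(x|y), evaluated at x = xn, y = xo
    return gauss(xn, xo, 1.0)

def likelihood(z, xs):               # f_{k+1}(z|x)
    return gauss(z, xs, 0.7)

f_post = gauss(x, 0.0, 3.0)          # f_{k|k}: an assumed starting posterior

# Equation 16.4: time prediction, a Riemann sum over the grid in y.
f_pred = np.array([np.sum(markov_density(xi, x) * f_post) * dx for xi in x])

# Equation 16.5: Bayes correction with a new measurement z_{k+1}.
z = 2.0
unnorm = likelihood(z, x) * f_pred
f_post_new = unnorm / (np.sum(unnorm) * dx)   # denominator plays the role of Equation 16.7
```

The prediction step broadens the posterior (process noise adds uncertainty); the correction step then re-concentrates it around the new measurement.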

16.2.2   Constructing Likelihood Functions from Sensor Models

Suppose that a target with randomly varying state X is interrogated by a sensor that generates observations of the form Z = h(x) + W (where W is a zero-mean random noise vector with density fW(w)) but does not generate missed detections or false alarms. The statistical behavior of Z at time-step k + 1 is characterized by its likelihood function, fk+1(z|x), which describes the likelihood that the sensor will collect measurement z given that the target has state x. How is this likelihood function computed?

Begin with the probability-mass function of the sensor model: pk+1(S|x) = Pr(Z ∈ S|x). This is the total probability that the random observation Z will be found in the given region S if the target has state x. The total probability-mass, p(S|x), in region S is the sum of all of the likelihoods in that region:

pk+1(S|x) = ∫S fk+1(y|x) dy

So, if Ez is some very small region surrounding the point z with (hyper) volume V = λ(Ez), then

pk+1(Ez|x) = ∫Ez fk+1(y|x) dy ≈ fk+1(z|x) · λ(Ez)

(For example, Ez = Bε,z could be a hyperball of radius ε centered at z.) In this case,

pk+1(Ez|x) / λ(Ez) ≈ fk+1(z|x)

where the smaller the value of λ(Ez), the more accurate the approximation. Stated otherwise, the likelihood function, fk+1(z|x), can be constructed from the probability measure, pk+1(S|x), via the limiting process

fk+1(z|x) = δpk+1/δz = lim λ(Ez)→0 pk+1(Ez|x) / λ(Ez)    (16.8)

The resulting equations

pk+1(S|x) = ∫S (δpk+1/δz) dz        δ/δz ∫S fk+1(y|x) dy = fk+1(z|x)    (16.9)

are the relationships that show that fk+1(z|x) is the provably true likelihood function—that is, the density function that faithfully describes the measurement model Z = hk+1(x, Wk+1). For this particular problem, the true likelihood is, therefore

fk+1(z|x) = lim ε→0 Pr(Z ∈ Bε,z|x) / λ(Bε,z) = lim ε→0 pWk+1(Bε,z − hk+1(x)) / λ(Bε,z − hk+1(x)) = fWk+1(z − hk+1(x))
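The limiting construction above can be checked numerically: draw samples of Z = h(x) + W, estimate the probability mass in a small ball Bε,z, divide by its volume, and compare against fW(z − h(x)). The scalar sensor model and all numerical values below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed scalar sensor model: Z = h(x) + W, with h(x) = 2x and W ~ N(0, 0.5^2).
h = lambda x: 2.0 * x
sigma = 0.5
x, z, eps = 1.5, 3.2, 0.05           # evaluate f(z|x) at arbitrarily chosen values

# Monte Carlo estimate of Pr(Z in B_eps(z) | x) / lambda(B_eps(z)).
Z = h(x) + rng.normal(0.0, sigma, size=2_000_000)
mass = np.mean(np.abs(Z - z) < eps)  # fraction of samples landing in the interval
density_est = mass / (2.0 * eps)     # divide by the interval's length

# Closed form from the section: f(z|x) = f_W(z - h(x)).
density_true = np.exp(-0.5 * ((z - h(x)) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
```

Shrinking eps (and increasing the sample count accordingly) drives the estimate toward the closed form, which is the content of Equation 16.8.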

16.2.3   Constructing Markov Densities from Motion Models

Suppose that, between the kth and (k + 1)st measurement-collection times, the motion of the target is best modeled by an equation of the form Xk+1 = Φk(x) + Vk, where Vk is a zero-mean random vector with density fVk(v). That is, if the target had state x at time-step k, then it will have state Φk(x) at time-step k + 1, except that possible error in this belief is accounted for by adding the random perturbation Vk. How is fk+1|k(y|x) constructed?

This situation parallels that of Section 16.2.2. The probability-mass function pk+1|k(S|x) = Pr(Xk+1 ∈ S|x) is the total probability that the target will be found in region S at time-step k + 1, given that it had state x at time-step k. So,

fk+1|k(y|x) = lim ε→0 pk+1|k(Bε,y|x) / λ(Bε,y) = fVk(y − Φk(x))

is the true Markov density associated with the motion model Xk+1 = Φk(Xk) + Vk. More generally, the equations

pk+1|k(S|x) = ∫S (δpk+1|k/δy) dy        δ/δy ∫S fk+1|k(w|x) dw = fk+1|k(y|x)

are the relationships showing that fk+1|k(y|x) is the provably true Markov density—that is, the density function that faithfully describes the motion model Xk+1 = Φk(x) + Vk.

16.2.4   Optimal State Estimators

An estimator of the state x is any family x̂(z1,...,zm) of state-valued functions of the (static) measurements z1, …, zm. Good state estimators x̂ should be Bayes-optimal in the sense that, in comparison to all other possible estimators, they minimize the Bayes risk

R_C(\hat{x},m) = \int C(x,\hat{x}(z_1,\ldots,z_m))\,f(z_1,\ldots,z_m|x)\,f(x)\,dx\,dz_1\cdots dz_m

for some specified cost (i.e., objective) function C(x, y) defined on states x, y.41 Second, they should be statistically consistent in the sense that x̂(z1,...,zm) converges to the actual target state as m → ∞. Other properties (e.g., asymptotically unbiased, rapidly convergent, stably convergent, etc.) are desirable as well. The most common good Bayes state estimators are the MAP and EAP estimators described earlier.
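As a concrete illustration, both estimators can be read off a gridded posterior. The bimodal posterior below is an assumed toy example, not taken from the text:

```python
import math

# Sketch: MAP and EAP estimates from a gridded 1-D posterior.
# Assumed posterior: mixture of two equal-width Gaussians at x = 1.0 and x = -2.0.
N = 2001
xs = [-5.0 + 10.0 * i / (N - 1) for i in range(N)]
dx = xs[1] - xs[0]

def unnormalized(x):
    return (0.7 * math.exp(-0.5 * ((x - 1.0) / 0.3) ** 2)
            + 0.3 * math.exp(-0.5 * ((x + 2.0) / 0.3) ** 2))

w = [unnormalized(x) for x in xs]
total = sum(w) * dx
post = [wi / total for wi in w]                    # normalized grid density

x_map = xs[max(range(N), key=lambda i: post[i])]   # MAP: posterior mode
x_eap = sum(x * p for x, p in zip(xs, post)) * dx  # EAP: posterior mean

print(x_map, x_eap)
```

On a multimodal posterior the two estimators disagree: the MAP estimate sits on the dominant mode, while the EAP estimate is pulled between the modes.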

16.3   Extension to Nontraditional Data

One of the most challenging aspects of information fusion has been the highly disparate and ambiguous forms that information can have. Many kinds of data, such as that supplied by tracking radars, can be described in statistical form. However, statistically uncharacterizable real-world variations make the modeling of other kinds of data, such as SAR images, highly problematic.

It has been even more ambiguous how still other forms of data—natural-language statements, features extracted from signatures, rules drawn from knowledge bases—might be mathematically modeled and processed. Numerous expert-system approaches have been proposed to address such problems. But their burgeoning number and variety have led to much confusion and controversy.

This section addresses the question of how to extend the Bayes filter to situations in which likelihood functions and data are imperfectly understood.4,42 It is in this context that we introduce the two types of data and the four types of measurements that will concern us:

  • Two types of data: state-estimates versus measurements

  • Four types of measurements: unambiguously generated unambiguous (UGU), unambiguously generated ambiguous (UGA), ambiguously generated ambiguous (AGA), and ambiguously generated unambiguous (AGU)

The four types of measurements are defined as follows:

  1. UGU measurements are conventional measurements, as described in Section 16.2.

  2. AGU measurements include conventional measurements, such as SAR or HRRR, whose likelihood functions are ambiguously defined because of statistically uncharacterizable real-world variations.

  3. UGA measurements resemble conventional measurements in that their relationship to target state is precisely known, but differ in that there is ambiguity regarding what is actually being observed. Examples include attributes or features extracted by humans or digital signal processors from signatures, natural-language statements, rules, etc.

  4. AGA measurements are the same as UGA measurements, except that not only the measurements themselves but also their relationship to target state is ambiguous. Examples include attributes, features, natural-language statements, and rules, for which sensor models must be constructed from human-mediated expert knowledge.

The basic approach we shall follow is outlined in the following:

  • Represent statistically ill-characterized (ambiguous) UGA, AGA, and AGU data as random closed subsets Θ of (multisource) observation space.

  • Thus, in general, multisensor-multitarget observations will be randomly varying finite sets of the form Z = {z1, …, zm, Θ1, …, Θm′}, where z1, …, zm are conventional data and Θ1, …, Θm′ are ambiguous data.

  • Just as the probability-mass function p(S|x) = Pr(ZS|x) is used to describe the generation of conventional data z, use generalized likelihood functions f(Θ|x) to describe the generation of ambiguous data.

  • Construct single-target posteriors fk|k(x|Zk) conditioned on all data, whether ambiguous or otherwise.

  • Proceed essentially as described in Section 16.2.

Section 16.3.1 discusses the issue of modeling data in general: traditional measurements, ambiguous measurements, or state-estimates. Section 16.3.2 introduces the concept of a generalized measurement, and their representation using random sets. Section 16.3.3 introduces the basic random set uncertainty models: imprecise, fuzzy/vague, Dempster–Shafer, and contingent (rules). The next three sections discuss models and generalized likelihood functions for the three types of generalized measurements: UGA (Section 16.3.4), AGA (Section 16.3.5), and AGU (Section 16.3.6). Section 16.3.7 addresses the problem of ambiguous (specifically, Dempster–Shafer) state-estimates. Finally, the extension of the recursive Bayesian nonlinear filtering equations to nontraditional data is discussed in Section 16.3.8.

16.3.1   General Data Modeling

The modeling of observations as vectors z in some Euclidean space is so ubiquitous that it is commonplace to think of z as data in itself. However, this is not actually the case: z is actually a mathematical abstraction that serves as a representation of some real-world entity called a datum.

The following are examples of actual data that occur in the real world: a voltage, a radio frequency (RF) intensity-signature, an RF I&Q (in-phase and quadrature) signature, a feature extracted from a signature by a digital signal processor, an attribute extracted from an image by a human operator, a natural-language statement supplied by a human observer, a rule drawn from a rule-base, and so on.

All of these measurement types are mathematically meaningless—which is to say, we cannot do anything algorithmic with them—unless we first construct mathematical abstractions that model them. Thus, voltages are commonly modeled as real numbers. Intensity signatures are modeled as real-valued functions or, when discretized into bins, as vectors. I&Q signatures are modeled as complex-valued functions or as complex vectors. Features are commonly modeled using integers, real numbers, feature-vectors, etc.

For these kinds of data, relatively little ambiguity adheres to the representation of a given datum by its associated model. The only uncertainty in such data is that associated with the randomness of the generation of measurements by targets. This uncertainty is modeled by the likelihood function fk+1(z|x). Thus, it is conventional to think of the z in fk+1(z|x) as a datum and of fk+1(z|x) as the full encapsulation of its uncertainty model.

This reasoning will not suffice for data types such as rules, natural-language statements, human-extracted attributes, or more generally, any information involving some kind of human mediation. In reality, z is a model zD of some real-world datum D, and the likelihood actually has the form fk+1(D|x) = fk+1(zD|x).

Consider, for example, a natural-language report supplied by a human observer:

D = The target is near sector five.

Besides randomness, two more kinds of ambiguity impede the modeling of this datum. The first kind is due to ignorance. The observer will make random errors because of factors such as fatigue, excessively high data rates, deficiencies in training, and deficiencies in ability. In principle, one could conduct a statistical analysis of the observer to determine a likelihood function that models his/her data generation process. In practice, such an analysis is rarely feasible. This fact introduces a nonstatistical component to the uncertainty associated with D, and we must find some way to model that uncertainty to mitigate its contaminating effects.

The second kind of uncertainty is caused by the ambiguities associated with constructing an actionable mathematical model of D. How do we model fuzzy and context-dependent concepts such as near, for example? Thus, a complete data model for D must have the form fk+1(ΘD|x), where ΘD is a mathematical model of both D and the uncertainties associated with constructing ΘD; and where fk+1(ΘD|x) is a mathematical model of both the process by which ΘD is generated given x, and the uncertainties associated with constructing fk+1(ΘD|x).

To summarize, comprehensive data modeling requires a unified, systematic, and theoretically defensible procedure for accomplishing the following four steps:

  1. Creation of the mathematized abstractions that represent individual physical-world observations, including

  2. Some approach for modeling any ambiguities that may be inherent in this act of abstraction.

  3. Creation of the random variable that, by selecting among the possible mathematized abstractions, models data generation, including

  4. Some approach for modeling any ambiguities caused by gaps in our understanding of how data generation occurs.

In conventional applications, steps 1, 2, and 4 are usually taken for granted, so that only the third step remains and ends up being described as the complete data model. If we are to process general and not just conventional information sources, however, we must address the other three steps.

16.3.2   Generalized Measurements

The FISST approach to data that is difficult to statistically characterize is based on the key notion that ambiguous data can be probabilistically represented as random closed subsets of (multisource) measurement space.1

Suppose that a data-collection source observes a scene. It does not attempt to arrive at an a posteriori determination about the meaning (i.e., the state) of what has been observed. Rather, it attempts only to construct an interpretation of, or opinion about, what it has or has not observed. Any uncertainties due to ignorance are therefore associated with the data-collection process alone and not with the target state.

The simplest instance of nonstatistical ambiguity is an imprecise measurement. That is, the data-collection source cannot determine the value of a measurement, z, precisely but, rather, only to within containment in some measurement-set S. Thus, S is the actual measurement. If one randomizes S, including randomization of position, size, etc., one gets a random imprecise measurement.

Another kind of deterministic measurement is said to be fuzzy or vague. Because any single constraint, S, could be erroneous, the data-collection source specifies a nested sequence S1 ⊆ … ⊆ Se of alternative constraints, with each constraint Si assigned a belief si ≥ 0 that it is the correct one, where s1 + … + se = 1. If Se is the entire measurement space, then the data-collection source is stipulating that there is some possibility that it may know nothing whatsoever about the value of z. The nested constraint S1 ⊆ … ⊆ Se, taken together with its associated weights, is the actual measurement. It can be represented as a random subset Θ of measurement space by defining Pr(Θ = Si) = si.

If one randomizes all parameters of a vague measurement (centroid, size, shape, and number of its nested component subsets), one gets a random vague measurement. A (deterministic) vague measurement is just an instantiation of a random vague measurement.

Uncertainty, in the Dempster–Shafer sense, generalizes vagueness in that the component hypotheses no longer need to be nested. In this case Θ is discrete but otherwise arbitrary: Pr(Θ = Si) = si. By randomizing all parameters, one gets a random uncertain measurement.

All such measurement types have one thing in common: they can be represented mathematically by a single probabilistic concept: a random subset of measurement space. Thus expressed with the greatest mathematical generality, a generalized measurement can be represented as an arbitrary random closed subset Θ of measurement space.

(Caution: The random closed subset Θ is a model of a single observation collected by a single source. This subset should not be confused with a multisensor, multitarget observation-set, ∑, whose instantiations ∑ = Z are finite sets of the form Z = {z1, …, zm, Θ1, …, Θm′}, where z1, …, zm are individual conventional observations and Θ1, …, Θm′ are random set models of individual ambiguous observations.)

16.3.3   Random Set Uncertainty Models

Recognizing that random sets provide a common probabilistic foundation for various kinds of statistically ill-characterized data is not enough. We must also be able to construct practical random set representations of such data. This section shows how three kinds of ambiguous data can be modeled using random sets: vague, uncertain, and contingent.

16.3.3.1   Vague Measurements: Fuzzy Logic

A fuzzy measurement, g, is a fuzzy membership function on measurement space. That is, it is a function that assigns a number g(z) between zero and one to each measurement z in the measurement space. Fuzzy measurements g(z) and g′(z) can be fused using Zadeh’s conjunction defined by (gg′)(z) = min{g(z), g′(z)}.

The random subset ∑A(g), called the synchronous random set representation of the fuzzy subset g, is defined by

\Sigma_A(g) = \{\, z \mid A \le g(z) \,\} \qquad (16.10)

where A is a uniformly distributed random number on [0, 1].
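A quick Monte Carlo sketch confirms the defining property of this representation, Pr(z ∈ ∑A(g)) = g(z); the fuzzy membership values below are assumed for illustration:

```python
import random

# Monte Carlo check of Eq. 16.10: for Sigma_A(g) = {z : A <= g(z)},
# Pr(z in Sigma_A(g)) = g(z).
# Assumed fuzzy measurement on a small discrete space {0,...,4}.
g = {0: 0.1, 1: 0.4, 2: 1.0, 3: 0.4, 4: 0.1}

random.seed(0)
n = 100_000
counts = {z: 0 for z in g}
for _ in range(n):
    a = random.random()               # A ~ Uniform[0, 1]
    for z, gz in g.items():
        if a <= gz:                   # z lies in the instantiation Sigma_A(g)
            counts[z] += 1

for z in g:
    print(z, round(counts[z] / n, 2)) # each estimate should be close to g(z)
```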

16.3.3.2   Uncertain Measurements: Dempster–Shafer Evidence

A Dempster–Shafer measurement, B, is a Dempster–Shafer body of evidence on measurement space. That is, it consists of nonempty focal subsets B1, …, Bb of measurement space and nonnegative weights b1, …, bb that sum to 1. Dempster–Shafer measurements can be fused using Dempster’s combination “*”.

Define the random subset Θ of U by Pr(Θ = Bi) = bi for i = 1, …, b. Then Θ is the random set representation of B.1,33,34,43

Dempster–Shafer measurements can be generalized to fuzzy Dempster–Shafer measurements. A fuzzy Dempster–Shafer measurement is one in which the focal subsets Bi are fuzzy membership functions on measurement space.44

Fuzzy Dempster–Shafer measurements can also be represented in random set form, though this representation is too complicated to discuss in the present context.

16.3.3.3   Contingent Measurements: Rules

Knowledge-based rules have the form X ⇒ S (“if X then S”), where S, X are subsets of measurement space. The rule X ⇒ S is said to have been fired if its antecedent X is observed, in which case we infer that the consequent S—and thus also X ∩ S—is true.

There is at least one way to represent knowledge-based rules in random set form.38,39 Specifically, let Φ be a uniformly distributed random subset of U—that is, one whose probability distribution is Pr(Φ = S) = 2^−|U| for all S ⊆ U. A random set representation ∑Φ(X ⇒ S) of the rule X ⇒ S is

\Sigma_\Phi(X \Rightarrow S) = (S \cap X) \cup (X^c \cap \Phi) \qquad (16.11)

Similar random set representations can be devised for fuzzy rules, composite fuzzy rules, and second-order rules.
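The membership probabilities implied by Equation 16.11 can be checked by simulation: z belongs to the random set with probability 1 on S ∩ X, probability 1/2 outside X, and probability 0 on X \ S. The sets U, X, S below are assumed examples:

```python
import random

# Monte Carlo sketch of Eq. 16.11 on a small finite space U.
# Phi is uniform over all subsets of U, i.e., each element of U is
# included independently with probability 1/2.
U = set(range(6))
X = {0, 1, 2}          # antecedent (assumed example set)
S = {1, 2, 3}          # consequent (assumed example set)

random.seed(0)
n = 100_000
counts = {z: 0 for z in U}
for _ in range(n):
    phi = {z for z in U if random.random() < 0.5}
    sigma = (S & X) | ((U - X) & phi)   # instantiation of Sigma_Phi(X => S)
    for z in sigma:
        counts[z] += 1

# Pr(z in Sigma): 1 on S∩X, 1/2 on the complement of X, 0 on X \ S.
for z in sorted(U):
    print(z, round(counts[z] / n, 2))
```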

16.3.4   Unambiguously Generated Ambiguous Measurements

The simplest kind of generalized measurement is the UGA measurement, which is characterized by the following two points:

  1. Modeling the measurement itself involves ambiguity; but

  2. The relationship between measurements and target states can be described by a precise state-to-measurement transform model of the familiar form z = h(x).

Typical applications that require UGA measurement modeling are those that involve human-mediated feature extraction, but in which the features are distinctly associated with target types.

Consider, for example, a feature extracted by a human operator from a SAR image: the number n of hubs/tires. In this case, it is known a priori that a target of type ν will have a given number n = h(ν) of hubs (if a treaded vehicle) or n = h(ν) of tires (if otherwise). Suppose that the possible values of n are n = 1, …, 8. The generalized observation Θ is a random subset of the measurement space {1, …, 8}.

16.3.4.1   Generalized Likelihood Functions for UGA Measurements

As described in Section 16.2.2, the statistical measurement model typically employed in target detection and tracking has the form Z = h(x) + W. Likewise, the corresponding likelihood function has the form fk+1(z|x) = fW(z − h(x)). That is, it is the probability (density) that h(x) = z − W or, expressed otherwise, that h(x) ∈ Θz, where Θz is the random singleton subset defined by Θz = {z − W}.

It can be shown that, for an arbitrary UGA measurement represented as a random subset, Θ, of measurement space, the corresponding measurement model is h(x) ∈ Θ and the corresponding likelihood function is

f_{k+1}(\Theta|x) = \Pr(h(x) \in \Theta) \qquad (16.12)

This actually defines a generalized likelihood function in the sense that, in general, it integrates to infinity rather than to unity:

\int f_{k+1}(z|x)\,dz = \infty

Joint generalized likelihood functions can be defined in the obvious manner. Let Θ1, …, Θm be generalized measurements. Then their joint likelihood is

f_{k+1}(\Theta_1,\ldots,\Theta_m|x) = \Pr(h(x)\in\Theta_1,\ldots,h(x)\in\Theta_m) = \Pr(h(x)\in\Theta_1\cap\cdots\cap\Theta_m)

For example, consider the hub/tire feature. Then the generalized measurement model h(ν) ∈ Θ indicates the degree of matching between the data model, Θ, and the known feature, h(ν), associated with a target of type ν. The generalized likelihood

f_{k+1}(\Theta|\nu) = \Pr(h(\nu)\in\Theta)

is a measure of the degree of this matching.

Because of Equation 16.12, specific formulas can be derived for different kinds of UGA measurements. First, suppose that Θ = ∑A(g), where g(z) is a fuzzy measurement. That is, it is a fuzzy membership function on measurement space. Then it can be shown that the corresponding likelihood function is

f_{k+1}(g|x) = \Pr(h(x)\in\Sigma_A(g)) = g(h(x)) \qquad (16.13)

Next, suppose that B is a Dempster–Shafer measurement. That is, it is a list B1, …, Bm of subsets of measurement space with corresponding weights b1, …, bm. If ΘB is the random set representation of B then it can be shown that the corresponding generalized likelihood function is

f_{k+1}(B|x) = \Pr(h(x)\in\Theta_B) = \sum_{i=1}^{m} b_i\,\mathbf{1}_{B_i}(h(x)) \qquad (16.14)

Finally, suppose that Θ = ∑Φ(gg′) where gg′ is a fuzzy rule on measurement space. That is, the antecedent g(z) and consequent g′(z) are fuzzy membership functions on measurement space. Then it can be shown that the corresponding generalized likelihood function is

f_{k+1}(g\Rightarrow g'|x) = \Pr(h(x)\in\Sigma_\Phi(g\Rightarrow g')) = (g\wedge g')(h(x)) + \frac{1}{2}\bigl(1 - g(h(x))\bigr) \qquad (16.15)
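Equations 16.13 through 16.15 can be exercised on the hub/tire feature of Section 16.3.4. The membership values, focal sets, and weights below are assumed purely for illustration:

```python
# Sketch of Eqs. 16.13-16.15 for the hub/tire feature, with measurement
# space {1,...,8} and an assumed target type nu with h(nu) = 4.
h_nu = 4

# Eq. 16.13: fuzzy measurement g -> likelihood g(h(nu)).
g = {n: 0.0 for n in range(1, 9)}
g.update({4: 0.5, 5: 1.0, 6: 0.5})
lik_fuzzy = g[h_nu]

# Eq. 16.14: Dempster-Shafer measurement (focal sets B_i, weights b_i).
focal = [({5}, 0.6), ({4, 5, 6}, 0.3), (set(range(1, 9)), 0.1)]
lik_ds = sum(b for (B, b) in focal if h_nu in B)

# Eq. 16.15: fuzzy rule g => gp with antecedent g and consequent gp.
gp = {n: 0.0 for n in range(1, 9)}
gp.update({5: 1.0, 6: 1.0})
lik_rule = min(g[h_nu], gp[h_nu]) + 0.5 * (1.0 - g[h_nu])

print(lik_fuzzy, lik_ds, lik_rule)
```

Note that none of the three values need sum or integrate to one over all possible generalized measurements; they are generalized, not conventional, likelihoods.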

16.3.4.2   Bayesian Unification of UGA Measurement Fusion

A major feature of UGA measurements is that many familiar expert-system techniques for fusing evidence can be rigorously unified within a Bayesian paradigm. For this unification, we assume that the primary goal of level 1 information fusion is to determine state-estimates and estimates of the uncertainty in those estimates. No matter how complex or disparate various kinds of information might be, ultimately they must be reduced to summary information—state-estimates and, typically, covariances. This reduction itself constitutes a very lossy form of data compression.

Consequently, the unification problem can be restated as follows: what fusion techniques lose no estimation-relevant information? In a Bayesian formalism, this can be restated as: what fusion techniques are Bayes-invariant? That is, what fusion techniques leave posterior distributions unchanged?

Consider fuzzy measurements first. Let g(z) and g′(z) be fuzzy membership functions on measurement space. Then, in general, their joint likelihood is

f_{k+1}(g,g'|x) = \Pr(h(x)\in\Sigma_A(g)\cap\Sigma_A(g')) = (g\wedge g')(h(x)) = f_{k+1}(g\wedge g'|x) \qquad (16.16)

Consequently, the posterior distribution jointly conditioned on g and g′ is

f_{k+1}(x|g,g') = f_{k+1}(x|g\wedge g') \qquad (16.17)

That is, fusion of the fuzzy measurements g and g′ using Zadeh conjunction yields the same posterior distribution—and thus the same estimates of target state and error—as is obtained using Bayes’ rule alone. (This result can be generalized to more general copula fuzzy logics.)

Similar results can be obtained for Dempster–Shafer measurements and for fuzzy rules. On the one hand, let o and o′ be Dempster–Shafer measurements. Then it can be shown that

f_{k+1}(x|o,o') = f_{k+1}(x|o * o') \qquad (16.18)

where “*” denotes Dempster’s combination.

On the other hand, let gg′ be a fuzzy rule on measurement space. Then it can be shown that

f_{k+1}(x|g,\,g\Rightarrow g') = f_{k+1}(x|g\wedge g') \qquad (16.19)

That is, firing the rule using its antecedent (left-hand side) yields the same result that would be expected from the inference due to rule-firing (right-hand side). Thus rule-firing is equivalent to a form of Bayes’ rule.

16.3.4.3   Bayes-Invariant Transformations of UGA Measurements

Much effort has been expended in the expert systems literature on devising conversions of one uncertainty representation scheme to another: fuzzy to probabilistic, Dempster–Shafer to probabilistic or to fuzzy, and so on. Such efforts have been hindered by the fact that uncertainty representation formalisms vary considerably in the degree of complexity of the information that they encode.

Most obviously, 2^M − 2 numbers are required to specify a (crisp) basic mass assignment on a finite measurement space with M elements, whereas only M − 1 numbers are required to specify a probability distribution p(z). Consequently, any conversion of Dempster–Shafer measurements to probability distributions will result in loss of information.

A second issue is the fact that conversion from one uncertainty representation to another should be consistent with the data fusion methodologies intrinsic to these formalisms. For example, fusion of Dempster–Shafer measurements is commonly accomplished using Dempster’s combination B * B′. Fusion of fuzzy measurements g, g′, however, is usually accomplished using Zadeh’s fuzzy conjunction g ∧ g′. For Dempster–Shafer measurements to be consistently converted to fuzzy measurements, B ↦ gB, one should therefore have gB*B′ = gB ∧ gB′ in some sense.

As before, the path out of such quandaries is to assume that the primary goal of level 1 information fusion is to determine state-estimates and estimates of the uncertainty in those estimates. In a Bayesian formalism, the conversion problem can be restated as: what conversions between uncertainty representations are Bayes-invariant—that is, leave posterior distributions unchanged?

Given this, it is possible to determine Bayes-invariant transformations between various kinds of measurements. We cannot delve into this issue in any depth in this context. For example, however, it can be shown that a Bayes-invariant conversion B ↦ gB of Dempster–Shafer measurements to fuzzy measurements is defined by

g_B(z) = \sum_{i=1}^{m} b_i\,\mathbf{1}_{B_i}(z) \qquad (16.20)

where, now, fuzzy conjunction must be defined as (g ∧ g′)(z) = g(z) ⋅ g′(z). More details can be found in Ref. 4.
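The Bayes-invariance claim can be verified numerically on a small example: fusing two Dempster–Shafer measurements with Dempster's combination and then converting via Equation 16.20 yields the same posterior as converting first and fusing with product conjunction. The measurement space, bodies of evidence, identity sensor map h(x) = x, and uniform prior below are all assumed:

```python
from itertools import product

# Assumed example on measurement space {1,...,4} with h(x) = x and a
# uniform prior over four target states.
space = [1, 2, 3, 4]
B1 = [({1, 2}, 0.7), ({1, 2, 3, 4}, 0.3)]   # focal sets and weights
B2 = [({2, 3}, 0.5), ({2}, 0.5)]

def g(B, z):
    # Eq. 16.20 conversion of a body of evidence B to a fuzzy measurement.
    return sum(w for (F, w) in B if z in F)

def dempster(Ba, Bb):
    # Dempster's combination: intersect focal sets, discard empty
    # intersections, renormalize.
    masses = {}
    for (F1, w1), (F2, w2) in product(Ba, Bb):
        F = frozenset(F1 & F2)
        if F:
            masses[F] = masses.get(F, 0.0) + w1 * w2
    N = sum(masses.values())
    return [(set(F), w / N) for F, w in masses.items()]

def normalize(v):
    s = sum(v)
    return [vi / s for vi in v]

# Posterior from fusing g_B1 and g_B2 by product conjunction...
post_joint = normalize([g(B1, x) * g(B2, x) for x in space])
# ...versus posterior from the single converted measurement g_{B1*B2}.
post_comb = normalize([g(dempster(B1, B2), x) for x in space])

print(post_joint)
print(post_comb)
```

The two posteriors agree because Dempster's renormalization constant cancels in Bayes' rule.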

16.3.5   Ambiguously Generated Ambiguous Measurements

An AGA measurement is characterized by two things:

  1. Modeling of the measurement itself involves ambiguity (because, for example, of human operator interpretation processes); and

  2. The association of measurements with target states cannot be precisely specified by a state-to-measurement transform of the form z = h(x).

Typical applications that require AGA measurement modeling are those that involve not only human-mediated observations, as with UGA measurements, but also the construction of human-mediated model bases. Consider, for example, the hub/tire feature in the previous section. Suppose that it is not possible to define a function that assigns a hub/tire number h(ν) to every target type ν. Rather, our understanding of at least some targets may be incomplete, so that we cannot say for sure that type ν has a specific number of hubs/tires. In this case, it is not only the human-mediated observation that must be modeled; one must also rely on expert knowledge to construct a model of the feature itself. This introduces a human interpretation process, and the ambiguities associated with it necessitate more complicated models in place of h(ν). One must construct a random set model ∑ν from the expert information that has been supplied.

For example, suppose that target type T1 is believed to have n = 5 hubs/tires, but we are not quite sure about this fact. So, on the basis of expert knowledge we construct a random subset ∑ν of the measurement space {1, …, 8} as follows: ∑ν = {5} with probability .6; ∑ν = {4, 5, 6} with probability .3; and ∑ν = {1, 2, 3, 4, 5, 6, 7, 8} with probability .1.

16.3.5.1   Generalized Likelihood Functions for AGA Measurements

The sensor-transform model h(x) for UGA measurements is unambiguous in the sense that the value of h(x) is known precisely—there is no ambiguity in associating the measurement value h(x) with the state value x. However, this assumption is not valid in general.

In mathematical terms, h(x) may be known only up to containment within some constraint: h(x) ∈ H0,x. Or, we may need to specify a nested sequence of constraints H0,xH1,x ⊆ … ⊆ He,x with associated probabilities hi,x that Hi,x is the correct constraint. If we define the random subset ∑x of measurement space by Pr(∑x = Hi,x) = hi,x, then h(x) is random set valued: h(x) = ∑x. In general, ∑x can be any random closed subset of measurement space such that Pr(∑x = ∅) = 0.

The measurement model h(x) ∈ Θ for UGA measurements is generalized to: ∑x ∩ Θ ≠ ∅. That is, the generalized measurement Θ matches the generalized model ∑x if it does not directly contradict it. The generalized likelihood function

f_{k+1}(\Theta|x) = \Pr(\Sigma_x \cap \Theta \neq \emptyset) \qquad (16.21)

is a measure of the degree of matching between Θ and ∑x. Specific formulas can be computed for specific situations. Suppose, for example, that both the generalized measurement and the generalized models are fuzzy: Θ = ∑A(g) and ∑x = ∑A(hx) where g(z) and hx(z) are fuzzy membership functions on measurement space. Then it can be shown that the corresponding generalized likelihood function is

f_{k+1}(g|x) = \max_{z}\,\min\{g(z),\,h_x(z)\} \qquad (16.22)

Specific formulas can be found for the generalized likelihood functions for other AGA measurement types, for example, Dempster–Shafer measurements.
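For the fuzzy-fuzzy case, Equation 16.22 can be checked by Monte Carlo: with the synchronous representations ∑A(g) and ∑A(hx) driven by the same uniform A, the intersection is nonempty exactly when A falls below max-min. The membership values below are assumed for illustration:

```python
import random

# Monte Carlo sketch of Eq. 16.22 on a discrete measurement space {1,...,8}.
# Assumed fuzzy measurement g and assumed fuzzy model hx.
zs = range(1, 9)
g = {z: 0.0 for z in zs}
g.update({3: 0.4, 4: 0.9, 5: 0.3})
hx = {z: 0.0 for z in zs}
hx.update({4: 0.6, 5: 1.0, 6: 0.5})

exact = max(min(g[z], hx[z]) for z in zs)   # max-min matching value

random.seed(0)
n = 100_000
hits = 0
for _ in range(n):
    a = random.random()                     # same A drives both random sets
    s_g = {z for z in zs if a <= g[z]}
    s_hx = {z for z in zs if a <= hx[z]}
    if s_g & s_hx:                          # Sigma_x and Theta overlap
        hits += 1

print(exact, round(hits / n, 2))
```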

16.3.6   Ambiguously Generated Unambiguous Measurements

AGU measurements differ from UGA and AGA measurements in that:

  1. The measurement z itself can be precisely specified; but

  2. Its corresponding likelihood function fk+1(z|x) cannot.

Applications that require AGU techniques are those in which statistically uncharacterizable real-world variations make the specification of conventional likelihoods, fk+1(z|x), difficult or impossible. Typical applications include ground-target identification using SAR intensity-pixel images. SAR images of ground targets, for example, can vary greatly because of the following phenomenologies: dents; wet mud; irregular placement of standard equipment (e.g., grenade launchers); placement of nonstandard equipment; turret articulation for a tank; and so on. One has no choice but to develop techniques that allow one to hedge against unknowable real-world uncertainties.

If the uncertainty in fk+1(z|x) is due to uncertainty in the sensor state-to-measurement transform model h(x), then AGA techniques could be directly applied. In this case, h(x) is replaced by a random set model ∑x and we just set Θ = {z}. Applying Equation 16.21 (the definition of an AGA generalized likelihood) we get

f_{k+1}(\Theta|x) = \Pr(\Sigma_x\cap\Theta\neq\emptyset) = \Pr(\Sigma_x\cap\{z\}\neq\emptyset) = \Pr(z\in\Sigma_x)

Unfortunately, it is often difficult to devise meaningful, practical models of the form ∑x. In such cases, a different random set technique must be used: the random error bar. We cannot delve into this further in this context. More details can be found in Ref. 4.

16.3.7   Generalized State-Estimates

In the most general sense, measurements are opinions about, or interpretations of, what is being observed. Some data sources do not supply measurements. Instead, they supply a posteriori opinions about target state, based on measurements which they do not pass on to us. The most familiar example is a radar which feeds its measurements into a Kalman tracker and then passes on only the time-evolving track data—or, equivalently, the time-evolving posterior distribution.

Such information is far more complex and difficult to process than measurements. We have shown how to deal with state-estimates that have a specific form: they are Dempster–Shafer. The simplest kind of nonstatistical ambiguity in a state-estimate is imprecision. The data source cannot determine the value of a state, x, precisely but, rather, only to within containment in some state-set S. The state-estimate could also be fuzzy. That is, the data source specifies a nested sequence S1 ⊆ … ⊆ Se of alternative constraints with associated weights s1, …, se. Finally, the data source may not supply nested hypotheses—that is, the state-estimate is Dempster–Shafer. Such data can be represented in random set form: Pr(Γ = Si) = si.

Using our UGA measurement theory of Section 16.3.4, we have shown how to represent such state-estimates as generalized likelihood functions. We have also shown that the modified Dempster’s combination of Fixsen and Mahler45 is equivalent to Bayes’ rule, in the sense that using it to fuse Dempster–Shafer state-estimates produces the same posterior distributions as produced by fusing these state-estimates using Bayes’ rule alone. More details can be found in Ref. 4.

16.3.8   Unified Single-Target Multisource Integration

We have shown how to model nontraditional forms of data using generalized measurements, generalized measurement models, and generalized likelihood functions. As an immediate consequence, it follows that the recursive Bayes filter of Section 16.2 can be used to fuse multisource measurements and accumulate them over time.

That is, suppose that the measurement stream consists of a time sequence of generalized measurements: Zk: Θ1, …, Θk. Then Equations 16.4 and 16.5 can be used to fuse and accumulate this data in the usual manner:

f_{k+1|k}(x|Z^k) = \int f_{k+1|k}(x|y)\,f_{k|k}(y|Z^k)\,dy \qquad (16.23)

f_{k+1|k+1}(x|Z^{k+1}) = \frac{f_{k+1}(\Theta_{k+1}|x)\,f_{k+1|k}(x|Z^k)}{f_{k+1}(\Theta_{k+1}|Z^k)} \qquad (16.24)

where

f_{k+1}(\Theta|Z^k) = \int f_{k+1}(\Theta|x)\,f_{k+1|k}(x|Z^k)\,dx
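A minimal sketch of this generalized filter can be run on a small discrete state space, using fuzzy generalized measurements with the UGA likelihood f(g|x) = g(h(x)) of Equation 16.13. The state space, motion model, sensor map h, and measurements below are all invented for illustration:

```python
# Discrete-state generalized Bayes filter (Eqs. 16.23-16.24 on a grid).
states = [0, 1, 2]                  # three assumed target states
h = {0: 'A', 1: 'B', 2: 'B'}        # assumed state-to-measurement map

def trans(x, y):
    # Markov transition f(x|y): mostly stay put, sometimes move right.
    if x == y:
        return 0.8
    return 0.2 if x == (y + 1) % 3 else 0.0

# Fuzzy generalized measurements: membership over the space {'A', 'B'}.
measurements = [{'A': 1.0, 'B': 0.2}, {'A': 0.1, 'B': 0.9}]

belief = [1.0 / 3] * 3              # uniform initial posterior
for g in measurements:
    # Predictor (16.23): sum over prior states.
    pred = [sum(trans(x, y) * belief[y] for y in states) for x in states]
    # Corrector (16.24): multiply by the generalized likelihood, normalize.
    upd = [g[h[x]] * pred[x] for x in states]
    norm = sum(upd)                 # Bayes normalization factor f(g|Z^k)
    belief = [u / norm for u in upd]

print([round(b, 3) for b in belief])
```

The structure is identical to the conventional recursive Bayes filter; only the likelihood has been replaced by a generalized one.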

16.4   Multisource-Multitarget Calculus

This section introduces the mathematical core of FISST—the FISST multitarget integral and differential calculus. In particular, it shows that the belief-mass function of a multitarget sensor model or a multitarget motion model plays the same role in multisensor-multitarget statistics that the probability-mass function plays in single-target statistics.

The integral and derivative are the mathematical basis of conventional single-sensor, single-target statistics. We will show that the basis of multisensor-multitarget statistics is a multitarget integral and a multitarget derivative. We will show that, using the FISST calculus,

  1. True multisensor-multitarget likelihood functions can be constructed from the measurement models of the individual sensors, and

  2. True multitarget Markov transition densities can be constructed from the motion models of the individual targets.

Section 16.4.1 introduces the concept of a random finite set. The next three sections introduce the fundamental statistical descriptors of a multitarget system: the multitarget probability density function (Section 16.4.2), the belief-mass function (Section 16.4.3), and the probability generating functional (p.g.fl.; Section 16.4.4). The multitarget set integral is also introduced in Section 16.4.2. The foundations of multitarget calculus—the functional derivative of a p.g.fl. and its special case, the set derivative of a belief-mass function—are described in Section 16.4.5. Section 16.4.6 lists some of the basic theorems of the multitarget calculus: the fundamental theorem of multitarget calculus; the Radon-Nikodým theorem for multitarget calculus; and the fundamental convolution formula for multitarget probability density functions. Section 16.4.7 lists a few basic rules for the set derivative.

16.4.1   Random Finite Sets

Most readers may already be familiar with the following types of random variables:

  • Random integer, J. A random variable that draws its instantiations J = j from some subset of integers (all integers, nonnegative integers, and so on).

  • Random number, A. A random variable that draws its instantiations A = a from some subset of real numbers (all reals, nonnegative reals, the numbers in [0, 1], and so on).

  • Random vector, Y. A random variable that draws its instantiations Y = y from some subset of a Euclidean vector space.

It is less likely that the reader is familiar with one of the fundamental statistical concepts of this chapter:

  • Random finite subset, Ψ: A random variable that draws its instantiations Ψ = Y from the hyperspace of all finite subsets of some underlying space (e.g., finite subsets of single-target state space or of single-sensor measurement space).

16.4.2   Multiobject Density Functions and Set Integrals

Let f(Y) be a function of a finite-set variable Y. That is, f(Y) has the form

f(∅) = probability that no targets are present

f({y1, …,yn}) = probability density that n objects y1, …, yn are present

Also, it is assumed that the units of measurement of f(Y) are u^{−|Y|} if u is the unit of measurement of an individual state y. Then the set integral of f(Y) in a region S is13

\[
\int_S f(Y)\,\delta Y = f(\emptyset) + \sum_{j=1}^{\infty}\frac{1}{j!}\int_{S\times\cdots\times S} f(\{y_1,\ldots,y_j\})\,dy_1\cdots dy_j \tag{16.25}
\]
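As a concrete numeric illustration (a hypothetical sketch, not an example from the chapter), consider a Poisson multiobject density f({y1, …, yj}) = e^{−μ} D(y1)⋯D(yj) with intensity function D on S. Its product form lets each j-fold integral in Equation 16.25 factor into (∫_S D)^j, so the set integral can be evaluated directly and checked against the normalization requirement:

```python
import math

# Hypothetical sketch: evaluate the set integral (16.25) for a Poisson
# multiobject density f({y1,...,yj}) = exp(-mu) * D(y1) * ... * D(yj).
# The product form lets each j-fold integral over S x ... x S factor
# into (integral_S D)^j, so the sum over cardinalities is tractable.

def set_integral_poisson(D, S=(0.0, 1.0), n_max=25, grid=1000):
    a, b = S
    dy = (b - a) / grid
    mu = sum(D(a + (i + 0.5) * dy) for i in range(grid)) * dy  # mu = int_S D
    total = math.exp(-mu)  # j = 0 term: f(empty set)
    for j in range(1, n_max + 1):
        total += (1.0 / math.factorial(j)) * math.exp(-mu) * mu ** j
    return total

# A multiobject density must have set integral 1; here D(y) = 2.5 on [0, 1]:
print(set_integral_poisson(lambda y: 2.5))  # -> approximately 1.0
```

Truncating the cardinality sum at n_max = 25 is safe here because the Poisson tail beyond that is negligible for μ = 2.5.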

16.4.3   Belief-Mass Functions

Just as the statistical behavior of a random vector Y is characterized by its probability-mass function pY(S) = Pr(Y∈S), so the statistical behavior of a random finite set Ψ is characterized by its belief-mass function:1

\[
\beta_\Psi(S) = \Pr(\Psi \subseteq S)
\]

For example, if Ψ is a random observation-set, then the belief mass β(S|X) = Pr(Ψ ⊆ S|X) is the total probability that all observations in a sensor (or multisensor) scan will be found in any given region S, if targets have multitarget state X.

As a specific example, suppose that Ψ = {Y}, where Y is a random vector. Then the belief-mass function of Ψ is just the probability-mass function of Y:

\[
\beta_\Psi(S) = \Pr(\Psi \subseteq S) = \Pr(Y \in S) = p_Y(S)
\]

16.4.4   Probability Generating Functionals

Let Ψ be a random finite set with density function fΨ(Y). For any finite subset Y and any real-valued function h(y), define h^Y = 1 if Y = ∅ and, if Y = {y1, …, yn} with y1, …, yn distinct,

\[
h^Y = h(y_1)\cdots h(y_n) \tag{16.26}
\]

Then the p.g.fl.4 of Ψ is defined as the set integral:

\[
G_\Psi[h] = \int h^Y f_\Psi(Y)\,\delta Y \tag{16.27}
\]

In particular, if h(y) = 1S(y) is the indicator function of the set S, then GΨ[1S] = βΨ(S).

The intuitive meaning of the p.g.fl. is as follows. If 0 ≤ h(y) ≤ 1, then h can be interpreted as the fuzzy membership function of a fuzzy set. The p.g.fl. GΨ[h] can be shown to be the probability that Ψ is contained in the fuzzy set represented by h(y). That is, GΨ[h] generalizes the concept of a belief-mass function βΨ(S) from crisp subsets to fuzzy subsets.4
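This interpretation can be made concrete with a small numeric sketch (hypothetical, with an assumed Bernoulli random set): a random finite set that is empty with probability 1 − q and otherwise contains one Gaussian-distributed point has p.g.fl. G[h] = (1 − q) + q∫h(y)f(y)dy, and substituting a crisp indicator h = 1_S recovers the belief mass βΨ(S), while a fuzzy membership function gives the containment probability in the fuzzy set:

```python
import math

# Hypothetical sketch: p.g.fl. of a Bernoulli random finite set that is empty
# with probability 1 - q and otherwise a single point with Gaussian density f.
# Its p.g.fl. is G[h] = (1 - q) + q * integral(h(y) * f(y) dy); with h = 1_S
# this reduces to the belief-mass function beta(S) = 1 - q + q * p(S).

def G(h, q=0.7, m=0.0, s=1.0, lo=-10.0, hi=10.0, n=4000):
    dx = (hi - lo) / n
    acc = 0.0
    for i in range(n):  # midpoint-rule quadrature of h(y) * f(y)
        y = lo + (i + 0.5) * dx
        f = math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        acc += h(y) * f * dx
    return (1.0 - q) + q * acc

crisp = lambda y: 1.0 if y <= 0.5 else 0.0                  # h = 1_S, S = (-inf, 0.5]
fuzzy = lambda y: 1.0 / (1.0 + math.exp(4.0 * (y - 0.5)))   # a fuzzy version of S

beta_S = G(crisp)   # belief mass of the crisp set S
G_fuzzy = G(fuzzy)  # probability of containment in the fuzzy set
```

Here β(S) equals 1 − q + qΦ(0.5) for the standard normal CDF Φ, and G[h] for the fuzzy h lies nearby, reflecting its interpretation as a containment probability.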

16.4.5   Functional Derivatives and Set Derivatives

The gradient derivative (a.k.a. directional or Fréchet derivative) of a real-valued function G(x) in the direction of a vector w is

\[
\frac{\partial G}{\partial w}(x) = \lim_{\varepsilon\to 0}\frac{G(x+\varepsilon w)-G(x)}{\varepsilon} \tag{16.28}
\]

where for each x the function w → (∂G/∂w)(x) is linear and continuous. Thus,

\[
\frac{\partial G}{\partial w}(x) = w_1\frac{\partial G}{\partial x_1}(x)+\cdots+w_N\frac{\partial G}{\partial x_N}(x)
\]

for all w = (w1, …, wN), where the derivatives on the right are ordinary partial derivatives. Likewise, the gradient derivative of a p.g.fl. G[h] in the direction of the function g is

\[
\frac{\partial G}{\partial g}[h] = \lim_{\varepsilon\to 0}\frac{G[h+\varepsilon g]-G[h]}{\varepsilon} \tag{16.29}
\]

where for each h the functional g → (∂G/∂g)[h] is linear and continuous. Gradient derivatives obey the usual "turn-the-crank" rules of undergraduate calculus.

In physics, gradient derivatives with g = δx are called functional derivatives.46 Using the simplified version of this physics notation employed in FISST, we define the functional derivatives of a p.g.fl. G[h] to be4,47, 48 and 49

\[
\frac{\delta G}{\delta\emptyset}[h] = \frac{\delta^0 G}{\delta x^0}[h] = G[h], \qquad
\frac{\delta G}{\delta x}[h] = \frac{\partial G}{\partial \delta_x}[h] \tag{16.30}
\]

\[
\frac{\delta G}{\delta X}[h] = \frac{\delta^n G}{\delta x_1\cdots\delta x_n}[h]
= \frac{\partial^n G}{\partial\delta_{x_1}\cdots\partial\delta_{x_n}}[h] \tag{16.31}
\]

where X = {x1, …, xn} with x1, …, xn distinct.

The set derivative of a belief-mass function β(S) of a finite random set Ξ is a functional derivative of G[h] with h = 1S:

\[
\frac{\delta\beta_\Xi}{\delta\emptyset}(S) = \frac{\delta G_\Xi}{\delta\emptyset}[1_S] = G_\Xi[1_S], \qquad
\frac{\delta\beta_\Xi}{\delta x}(S) = \frac{\delta G_\Xi}{\delta x}[1_S] \tag{16.32}
\]

\[
\frac{\delta\beta_\Xi}{\delta X}(S) = \frac{\delta^n G_\Xi}{\delta x_1\cdots\delta x_n}[1_S] \tag{16.33}
\]

for X = {x1, …, xn} with x1, …, xn distinct.

An alternative way of defining the set derivative is as follows. Let Ex be a very small neighborhood of the point x with (hyper)volume ε = λ(Ex). Then

\[
\delta_x(y) \cong \varepsilon^{-1}\,1_{E_x}(y)
\]

and so,

\[
G_\Xi[1_S+\varepsilon\delta_x] \cong G_\Xi[1_S+1_{E_x}] = G_\Xi[1_{S\cup E_x}] = \beta_\Xi(S\cup E_x)
\]

where the second of these equations results from assuming that S and Ex are disjoint. Consequently, it follows that the set derivative can be defined directly from belief-mass functions as follows:

\[
\frac{\delta\beta_\Xi}{\delta x}(S) = \lim_{\lambda(E_x)\to 0}\frac{\beta_\Xi(S\cup E_x)-\beta_\Xi(S)}{\lambda(E_x)} \tag{16.34}
\]
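This limit definition can be checked numerically in a toy case (a hypothetical sketch with an assumed scalar Gaussian sensor): for the single-measurement model Σ = {Z}, β(S) = Pr(Z ∈ S), and probing β with a small interval E_x disjoint from S recovers the probability density at x:

```python
import math

# Hypothetical sketch of Eq. 16.34: approximate the set derivative of the
# belief-mass function beta(S) = Pr(Z in S) of a scalar standard-Gaussian
# measurement by probing with a small interval E_x = [x, x + eps] disjoint
# from S. The limit recovers the probability density at x.

def gauss_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def beta(intervals):
    # belief mass of a finite union of disjoint intervals
    return sum(gauss_cdf(b) - gauss_cdf(a) for a, b in intervals)

def set_derivative(x, S, eps=1e-6):
    # finite-difference version of (16.34): [beta(S u E_x) - beta(S)] / vol(E_x)
    return (beta(S + [(x, x + eps)]) - beta(S)) / eps

approx = set_derivative(1.0, S=[(-10.0, 0.5)])
exact = math.exp(-0.5) / math.sqrt(2.0 * math.pi)  # Gaussian pdf at x = 1
print(abs(approx - exact) < 1e-5)  # -> True
```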

16.4.6   Key Theorems of Multitarget Calculus

The fundamental theorem of undergraduate calculus states that the integral and derivative are essentially inverse operations:

\[
\int_a^b \frac{df}{dx}(y)\,dy = f(b)-f(a), \qquad
\frac{d}{dx}\int_a^x f(y)\,dy = f(x)
\]

Another basic formula of calculus, the Radon-Nikodým theorem, states that the probability-mass function of a random vector X can be written as an integral

\[
\Pr(X\in S) = \int_S f_X(x)\,dx
\]

where fX(x) is the probability density function of X. This section presents analogous theorems for the multitarget calculus.

16.4.6.1   Fundamental Theorem of Multitarget Calculus

The set integral and derivative are inverse to each other:

\[
\Pr(\Xi\subseteq S) = \beta_\Xi(S) = \int_S \frac{\delta\beta_\Xi}{\delta X}(\emptyset)\,\delta X, \qquad
f(X) = \left[\frac{\delta}{\delta X}\int_S f(Y)\,\delta Y\right]_{S=\emptyset}
\]

16.4.6.2   Radon-Nikodým Theorem for Multitarget Calculus

The functional derivative and set integral are related by the formula:

\[
\frac{\delta G}{\delta X}[h] = \int h^Y f(X\cup Y)\,\delta Y
\]

16.4.6.3   Fundamental Convolution Formula for Multitarget Calculus

Let Ξ1, …, Ξn be statistically independent random finite subsets and let Ξ = Ξ1 ∪ … ∪ Ξn be their union. Then the probability density function of Ξ is

\[
f_\Xi(X) = \sum_{W_1\cup\cdots\cup W_n = X} f_{\Xi_1}(W_1)\cdots f_{\Xi_n}(W_n)
\]

where the summation is taken over all mutually disjoint subsets W1, …, Wn of X such that W1 ∪ … ∪ Wn = X.
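One consequence of the convolution formula (shown here as a hypothetical numeric check, not an example from the chapter) is that the union of independent Poisson random finite sets is again Poisson, with summed intensities. At the level of cardinality distributions, the formula becomes an ordinary discrete convolution:

```python
from math import exp, factorial

# Hypothetical check of the convolution formula at the cardinality level:
# if Xi1, Xi2 are independent Poisson RFSs with expected cardinalities 1.2
# and 0.8, the cardinality of their union is the discrete convolution of the
# two Poisson laws -- i.e., Poisson with mean 2.0.

def poisson_pmf(n, mu):
    return exp(-mu) * mu ** n / factorial(n)

convolved = [sum(poisson_pmf(k, 1.2) * poisson_pmf(n - k, 0.8) for k in range(n + 1))
             for n in range(12)]
direct = [poisson_pmf(n, 2.0) for n in range(12)]

print(max(abs(a - b) for a, b in zip(convolved, direct)) < 1e-12)  # -> True
```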

16.4.7   Basic Differentiation Rules

Practitioners usually find it possible to apply ordinary Newtonian differential and integral calculus by applying the turn-the-crank rules they learned as undergraduates. Similar turn-the-crank rules exist for the FISST calculus. The simplest of these are

\[
\frac{\delta}{\delta Y}\big[a_1\beta_1(S)+a_2\beta_2(S)\big]
= a_1\frac{\delta\beta_1}{\delta Y}(S)+a_2\frac{\delta\beta_2}{\delta Y}(S)
\qquad \text{(sum rule)}
\]

\[
\frac{\delta}{\delta y}\big[\beta_1(S)\,\beta_2(S)\big]
= \frac{\delta\beta_1}{\delta y}(S)\,\beta_2(S)+\beta_1(S)\,\frac{\delta\beta_2}{\delta y}(S)
\]

\[
\frac{\delta}{\delta Z}\big[\beta_1(S)\,\beta_2(S)\big]
= \sum_{W\subseteq Z}\frac{\delta\beta_1}{\delta W}(S)\,\frac{\delta\beta_2}{\delta(Z\setminus W)}(S)
\qquad \text{(product rules)}
\]

\[
\frac{\delta}{\delta y}\,f\big(\beta_1(S),\ldots,\beta_n(S)\big)
= \sum_{i=1}^n \frac{\partial f}{\partial x_i}\big(\beta_1(S),\ldots,\beta_n(S)\big)\,\frac{\delta\beta_i}{\delta y}(S)
\qquad \text{(chain rule)}
\]

 

 

16.5   Multitarget Likelihood Functions

In the single-target case, probabilistic approaches to tracking and identification (and Bayesian approaches in particular) depend on the ability to construct sensor models together with the likelihood functions for those models. This section shows how to construct multisource-multitarget measurement models and their corresponding true multitarget likelihood functions. The crucial result is as follows:

  • The provably true likelihood function fk+1(Z|X) of a multisensor-multitarget problem is a set derivative of the belief-mass function βk+1(S|X) = Pr(∑ ⊆ S|X) of the corresponding sensor (or multisensor) model

\[
f_{k+1}(Z|X) = \frac{\delta\beta_{k+1}}{\delta Z}(\emptyset|X) \tag{16.35}
\]

16.5.1   Multitarget Measurement Models

The following sections illustrate the process of constructing multitarget measurement models for the following successively more realistic situations: (1) no missed detections and no false alarms; (2) missed detections and no false alarms; (3) missed detections and false alarms; and (4) the multisensor case.

16.5.1.1   Case I: No Missed Detections, No False Alarms

Suppose that two targets with states x1 and x2 are interrogated by a single sensor that generates observations of the form Z = h(x) + W, where W is a random noise vector with density fw(w). Assume also that there are no missed detections or false alarms, and that observations within a scan are independent. Then, the multitarget measurement is the randomly varying two-element observation-set

\[
\Sigma = \{Z_1, Z_2\} = \{h(x_1)+W_1,\; h(x_2)+W_2\} \tag{16.36}
\]

where W1, W2 are independent random vectors with density fw(w).

Note: We assume that individual targets produce unique observations only for the sake of clarity. Clearly, we could just as easily produce models for other kinds of sensors—for example, sensors that detect only superpositions of the signals produced by multiple targets. One such measurement model is

\[
\Sigma = \{Z_1\} = \{h_1(x_1)+h_2(x_2)+W\}
\]

16.5.1.2   Case II: Missed Detections

Suppose that the sensor has a probability of detection pD < 1. In this case, observations can have not only the form Z = {z1, z2} (two detections), but also Z = {z} (single detection) or Z = ∅ (missed detection). The more complex observation model ∑ = T1 ∪ T2 is needed. It has observation sets T1, T2 with the following properties: (a) Ti = ∅ with probability 1 – pD and (b) Ti is nonempty with probability pD, in which case, Ti = {Zi}. If the sensor has a specific field of view, then pD will be a function of both the target state and the state of the sensor.

16.5.1.3   Case III: Missed Detections and False Alarms

Suppose the sensor has probability of false alarm pFA ≠ 0. Then we need an observation model of the form

\[
\Sigma = \underbrace{T_1\cup T_2}_{\text{targets}}\;\cup\;\underbrace{C}_{\text{clutter}}
\]

where C models false alarms and (possibly state-dependent) point clutter. As a simple example, C could have the form C = C1 ∪ … ∪ Cm, where each Cj is a clutter generator: there is a probability, pFA, that Cj will be nonempty (i.e., will generate a clutter observation), in which case Cj = {Cj}, where the clutter observation Cj is some random noise vector with density fCj(z). (Notice that ∑ = T1 ∪ C models the single-target case with false alarms or clutter.)

16.5.1.4   Case IV: Multiple Sensors

In this case, observations will have the form z[s] = (z, s) where the integer tag s identifies which sensor originated the measurement. A two-sensor multitarget measurement will have the form ∑ = ∑[1] ∪ ∑[2] where ∑[s] for s = 1, 2 is the random multitarget measurement-set collected by the sensor with tag s and can have any of the forms previously described.

16.5.2   Belief-Mass Functions of Multitarget Sensor Models

In this section, we illustrate how multitarget measurement models are transformed into belief-mass functions. The simplest single-sensor measurement model has the form ∑ = {Z} where Z is a random measurement-vector. In Section 16.4.3, we have shown that the belief-mass function of ∑ is identical to the probability-mass function of Z.

The next most complicated model is that of a single-target with missed detections, ∑ = T1. Its corresponding belief-mass function is

\[
\begin{aligned}
\beta(S|x) &= \Pr(T_1\subseteq S) = \Pr(T_1=\emptyset) + \Pr(T_1\neq\emptyset,\, Z\in S) \\
&= \Pr(T_1=\emptyset) + \Pr(T_1\neq\emptyset)\,\Pr(Z\in S\,|\,T_1\neq\emptyset) \\
&= 1-p_D+p_D\,p_Z(S|x)
\end{aligned} \tag{16.37}
\]

The two-target missed-detection model has the form ∑ = T1 ∪ T2. Its belief-mass function is

\[
\beta(S|X) = \Pr(T_1\subseteq S)\,\Pr(T_2\subseteq S)
= \big(1-p_D+p_D\,p(S|x_1)\big)\big(1-p_D+p_D\,p(S|x_2)\big)
\]

where p(S|xi) = Pr(Ti ⊆ S | Ti ≠ ∅) and X = {x1, x2}. Setting pD = 1 yields

\[
\beta(S|X) = p(S|x_1)\,p(S|x_2) \tag{16.38}
\]

This is the belief-mass function for the model ∑ = {Z1, Z2}.

Finally, suppose that two sensors with identifying tags s = 1, 2 collect observation sets ∑ = ∑[1] ∪ ∑[2]. The corresponding belief-mass function has the form

\[
\beta(S^{[1]}\cup S^{[2]}\,|\,X) = \Pr(\Sigma^{[1]}\subseteq S^{[1]},\,\Sigma^{[2]}\subseteq S^{[2]})
\]

where S[1], S[2] are subsets of the measurement spaces of the respective sensors. If the two sensors are independent, then the belief-mass function has the form

\[
\beta(S^{[1]}\cup S^{[2]}\,|\,X) = \Pr(\Sigma^{[1]}\subseteq S^{[1]})\,\Pr(\Sigma^{[2]}\subseteq S^{[2]})
\]

16.5.3   Constructing True Multitarget Likelihood Functions

We apply the differentiation formulas of Section 16.4.7 to the belief-mass function β(S|X) = p(S|x1) · p(S|x2) of the measurement model ∑ = {Z1, Z2}, where X = {x1, x2}. Then

\[
\begin{aligned}
\frac{\delta\beta}{\delta z_1}(S|X)
&= \frac{\delta}{\delta z_1}\big[p(S|x_1)\,p(S|x_2)\big]
= \frac{\delta p}{\delta z_1}(S|x_1)\,p(S|x_2)+p(S|x_1)\,\frac{\delta p}{\delta z_1}(S|x_2) \\
&= f(z_1|x_1)\,p(S|x_2)+p(S|x_1)\,f(z_1|x_2)
\end{aligned}
\]

\[
\begin{aligned}
\frac{\delta^2\beta}{\delta z_2\,\delta z_1}(S|X)
&= f(z_1|x_1)\,\frac{\delta p}{\delta z_2}(S|x_2)+\frac{\delta p}{\delta z_2}(S|x_1)\,f(z_1|x_2) \\
&= f(z_1|x_1)\,f(z_2|x_2)+f(z_2|x_1)\,f(z_1|x_2)
\end{aligned}
\]

\[
\frac{\delta^3\beta}{\delta z_3\,\delta z_2\,\delta z_1}(S|X) = 0
\]

and the higher-order derivatives vanish identically. The multitarget likelihood is, therefore,

\[
f(\emptyset|X) = \frac{\delta\beta}{\delta\emptyset}(\emptyset|X) = 0, \qquad
f(\{z\}|X) = \frac{\delta\beta}{\delta z}(\emptyset|X) = 0 \tag{16.39}
\]

\[
f(\{z_1,z_2\}|X) = \frac{\delta^2\beta}{\delta z_2\,\delta z_1}(\emptyset|X)
= f(z_1|x_1)\,f(z_2|x_2)+f(z_2|x_1)\,f(z_1|x_2)
\]

where f(Z|X) = 0 identically if Z contains more than two elements. More general multitarget likelihoods can be computed similarly.2,4
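The two-target likelihood above is a sum over the two possible measurement-to-target associations, which makes it symmetric in the measurements, as any multiobject density must be. A small numeric sketch (hypothetical 1D Gaussian sensor, assumed parameters) makes this concrete:

```python
import math

# Hypothetical sketch of Eq. 16.39: the two-target, no-clutter, no-missed-
# detection multitarget likelihood, with 1D Gaussian single-target
# likelihoods f(z|x) for the model z = x + W, W ~ N(0, s^2).

def f_single(z, x, s=1.0):
    return math.exp(-0.5 * ((z - x) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def f_multi(Z, X):
    (z1, z2), (x1, x2) = Z, X
    # sum over the two possible measurement-to-target associations
    return f_single(z1, x1) * f_single(z2, x2) + f_single(z2, x1) * f_single(z1, x2)

X = (0.0, 5.0)
# the density does not depend on the order in which measurements are listed
print(abs(f_multi((0.1, 5.2), X) - f_multi((5.2, 0.1), X)) < 1e-15)  # -> True
```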

 

 

16.6   Multitarget Markov Densities

This section illustrates the process of constructing multitarget motion models and their corresponding true multitarget Markov densities. The construction of multitarget Markov densities strongly resembles that of multisensor-multitarget likelihood functions. The crucial result is as follows:

  • The provably true Markov transition density fk+1|k(Y|X) of a multitarget problem is a set derivative of the belief-mass function βk+1|k(S|X) of the corresponding multi-target motion model:

\[
f_{k+1|k}(Y|X) = \frac{\delta\beta_{k+1|k}}{\delta Y}(\emptyset|X) \tag{16.40}
\]

16.6.1   Multitarget Motion Models

This section considers the following increasingly more realistic situations: (1) multitarget motion models assuming that target number does not change; (2) multitarget motion models assuming that target number can decrease; and (3) multitarget motion models assuming that target number can decrease or increase.

16.6.1.1   Case I: Target Number Is Constant

Assume that the states of individual targets have the form x = (y, c), where y is the kinematic state and c the target type. Assume that each target type has an associated motion model Xc,k+1|k = Φc,k(x) + Wc,k. Let

\[
X_{k+1|k} = \Phi_k(x, W_k) = \big(\Phi_{c,k}(x)+W_{c,k},\; c\big)
\]

To model a multitarget system in which targets never enter or leave the scenario, the obvious multitarget extension of the single-target motion model would be Ξk+1|k = Φk(X, Wk), where Ξk+1|k is the randomly varying state-set at time-step k + 1. That is, for the cases X = ∅, X = {x}, or X = {x1, x2}, respectively, the multitarget state transitions are

\[
\begin{aligned}
\Xi_{k+1|k} &= \emptyset \\
\Xi_{k+1|k} &= \{X_{k+1|k}(x)\} = \{\Phi_k(x, W_k)\} \\
\Xi_{k+1|k} &= \{X_{k+1|k}(x_1),\, X_{k+1|k}(x_2)\} = \{\Phi_k(x_1, W_{k,1}),\, \Phi_k(x_2, W_{k,2})\}
\end{aligned}
\]

16.6.1.2   Case II: Target Number Can Decrease

Modeling scenarios in which target number can decrease but not increase is analogous to modeling multitarget observations with missed detections. Suppose that no more than two targets are possible, but that one or more of them can vanish from the scene. One possible motion model would be Ξk+1|k = Φk(X) where, for the cases X = ∅, X = {x}, or X = {x1, x2}, respectively,

\[
\Xi_{k+1|k} = \emptyset, \qquad
\Xi_{k+1|k} = T_k(x), \qquad
\Xi_{k+1|k} = T_k(x_1)\cup T_k(x_2)
\]

Here Tk(x) is a state-set with the following properties: (a) Tk(x) ≠ ∅ with probability pv, in which case Tk(x) = {Xk+1|k(x)}, and (b) Tk(x) = ∅ (i.e., target disappearance), with probability 1 − pv. In other words, if no targets are present in the scene, this will continue to be the case. If, however, there is one target in the scene, then either this target will persist (with probability pv) or it will vanish (with probability 1 − pv). If there are two targets in the scene, then each will either persist or vanish in the same manner. In general, when n targets are present, one would model Ξk+1|k = Tk(x1) ∪ … ∪ Tk(xn).
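Because each of the n current targets survives independently with probability pv under this model, the predicted number of targets is binomially distributed. A small numeric sketch (hypothetical, with an assumed value of pv) illustrates the cardinality behavior of one prediction step:

```python
from math import comb

# Hypothetical sketch of the target-death model: each of n current targets
# survives independently with probability pv, so the predicted number of
# targets after one time step is Binomial(n, pv).

def predicted_cardinality_dist(n, pv):
    return [comb(n, k) * pv ** k * (1.0 - pv) ** (n - k) for k in range(n + 1)]

dist = predicted_cardinality_dist(2, 0.9)
# with two targets and pv = 0.9: P(0 survive) = 0.01, P(1) = 0.18, P(2) = 0.81
print([round(p, 2) for p in dist])  # -> [0.01, 0.18, 0.81]
```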

16.6.1.3   Case III: Target Number Can Increase and Decrease

Modeling scenarios in which target number can decrease or increase is analogous to modeling multitarget observations with missed detections and clutter. In this case, the general form of the model is

\[
\Xi_{k+1|k} = T_k(x_1)\cup\cdots\cup T_k(x_n)\cup B_k
\]

where Bk is the set of birth targets (i.e., targets that have entered the scene).

16.6.2   Belief-Mass Functions of Multitarget Motion Models

In single-target problems, the statistics of a motion model Xk+1|k = Φk(x, Wk) are described by the probability-mass function pk+1|k(S|x) = Pr(Xk+1|k ∈ S), which is the probability that the target state will be found in the region S if it previously had state-vector x. Similarly, suppose that Ξk+1|k = Φk(X) is a multitarget motion model. The statistics of the finitely varying random state-set Ξk+1|k can be described by its belief-mass function:

\[
\beta_{k+1|k}(S|X) = \Pr(\Xi_{k+1|k}\subseteq S)
\]

This is the total probability of finding all targets in region S at time-step k + 1 if, in time-step k, they had multitarget state X = {x1, …, xn}.

For example, the belief-mass function for independent target motion with no appearing or disappearing targets is, for n = 2,

\[
\beta_{k+1|k}(S|X) = p_{k+1|k}(S|x_1)\,p_{k+1|k}(S|x_2)
\]

16.6.3   Constructing True Multitarget Markov Densities

Multitarget Markov densities1,4,9,41 are constructed from multitarget motion models in much the same way that multisensor-multitarget likelihood functions are constructed from multisensor-multitarget measurement models. First, construct a multitarget motion model Ξk+1|k = Φk(X) from the underlying motion models of the individual targets. Second, build the corresponding belief-mass function βk+1|k(S|X) = Pr(Ξk+1|kS|X). Finally, construct the multitarget Markov density fk+1|k(Y|X) from the belief-mass function using the turn-the-crank formulas of the FISST calculus (Section 16.4.7).

For example, consider the case of independent two-target motion with no target appearance or disappearance. This has the same form as the multitarget measurement model in Section 16.5.3. Consequently, its multitarget Markov density is30

fk+1|k(Y|X)=fk+1|k(y1|x1)fk+1|k(y2|x2)+fk+1|k(y2|x1)fk+1|k(y1|x2)

 

 

16.7   Multisource-Multitarget Bayes Filter

Thus far in this chapter we have described the multisensor-multitarget analogs of measurement and motion models, likelihood functions, Markov transition densities, probability-mass functions, and the integral and differential calculus. This section shows how these concepts combine to produce a direct generalization of ordinary statistics to multitarget statistics, using the multitarget Bayes filter.

The multitarget Bayes filter is introduced in Section 16.7.1, and the issue of its initialization in Section 16.7.2. The failure of the classical Bayes-optimal state estimators in multitarget situations is described in Sections 16.7.3 and 16.7.4. The solution of this problem—the proper definition and verification of Bayes-optimal multitarget state estimators—is described in Section 16.7.5. The remaining two subsections summarize the concept of multitarget miss distance (Section 16.7.6) and of unified multitarget multisource integration (Section 16.7.7).

16.7.1   Multisensor-Multitarget Filter Equations

Bayesian multitarget filtering is inherently nonlinear because multitarget likelihoods fk+1(Z|X) are, in general, highly non-Gaussian even for a Gaussian sensor.2 Therefore, multitarget nonlinear filtering is unavoidable if the goal is optimal-Bayes detection, tracking, localization, identification, and information fusion. The multitarget Bayes filter equations are

\[
f_{k+1|k}(X|Z^{(k)}) = \int f_{k+1|k}(X|Y)\,f_{k|k}(Y|Z^{(k)})\,\delta Y \tag{16.41}
\]

\[
f_{k+1|k+1}(X|Z^{(k+1)}) = \frac{f_{k+1}(Z_{k+1}|X)\,f_{k+1|k}(X|Z^{(k)})}{f_{k+1}(Z_{k+1}|Z^{(k)})} \tag{16.42}
\]

where

\[
f_{k+1}(Z|Z^{(k)}) = \int f_{k+1}(Z|X)\,f_{k+1|k}(X|Z^{(k)})\,\delta X \tag{16.43}
\]

is the Bayes normalization constant, and where fk+1(Z|X) is the multisensor-multitarget likelihood function, fk+1|k(Y|X) the multitarget Markov transition density, fk|k (X|Z(k)) the posterior distribution conditioned on the data-stream Z(k): Z1, …, Zk, fk+1|k(X|Z(k)) the time-prediction of the posterior fk|k(X|Z(k)) to time-step k + 1, and where the two integrals are set integrals.

16.7.2   Initialization

The initial states of the targets in a multitarget system are specified by a multitarget initial distribution of the form f0|0(X|Z(0)) = f0(X),1,6 where ∫f0(X)δX = 1 and where the integral is a set integral. Suppose that states have the form x = (y, c) where y is the kinematic state variable restricted to some bounded region D of (hyper)volume λ(D) and c the discrete state variable(s), drawn from a universe C with N possible members. In conventional statistics, the uniform distribution u(x) = λ(D)–1N–1 is the most common way of initializing a Bayesian algorithm when nothing is known about the initial state of the target.

The concepts of prior and uniform distributions carry over to multitarget problems, but in this case there is an additional dimension that must be taken into account—target number. For example, suppose that there can be no more than M possible targets in a scene.1,6 If X = {x1, …, xn}, then the multitarget uniform distribution is defined by

\[
u(X)=\begin{cases}
n!\cdot N^{-n}\cdot\lambda(D)^{-n}\cdot(M+1)^{-1} & \text{if } X\subseteq D\times C \text{ and } |X|=n\le M \\
0 & \text{otherwise}
\end{cases}
\]
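A quick check (hypothetical numbers, assumed values of M, N, and λ(D)) that this multitarget uniform distribution is properly normalized under the set integral: the cardinality-n term of the set integral contributes (1/n!) times the constant density value times (Nλ(D))^n, which collapses to (M+1)^{−1} for each n, and the M + 1 terms sum to 1:

```python
from math import factorial

# Hypothetical check: set integral of the multitarget uniform distribution.
# The j-fold integral of u over (D x C)^j contributes (N * lam)**j times the
# constant density value; each cardinality then carries weight 1/(M + 1).

def total_mass(M, N, lam):
    total = 0.0
    for n in range(M + 1):
        density = factorial(n) * N ** (-n) * lam ** (-n) / (M + 1)
        total += (1.0 / factorial(n)) * density * (N * lam) ** n
    return total

print(total_mass(M=5, N=3, lam=10.0))  # -> 1.0 (up to float rounding)
```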

16.7.3   Multitarget Distributions and Units of Measurement

Multitarget posterior and prior distributions, like multitarget density functions in general, have one peculiarity that sets them apart from conventional density functions: their behavior with respect to units of measurement.1,2,8 In particular, the units of measurement of a multitarget prior or posterior f(X) vary with the cardinality of X. This has important consequences for multitarget state estimation, as described in the next section.

16.7.4   Failure of the Classical State Estimators

In general, the classical Bayes-optimal estimators cannot be extended to the multitarget case. This can be explained using a simple one-dimensional example.2,4 Let

\[
f(X)=\begin{cases}
\tfrac12 & \text{if } X=\emptyset \\
\tfrac12\cdot N_{\sigma^2}(x-1) & \text{if } X=\{x\} \\
0 & \text{if } |X|\ge 2
\end{cases}
\]

where the variance σ2 has units km2. To compute the classical MAP estimate, find the state X = ∅ or X = {x} that maximizes f(X). Because f(∅) = 1/2 is a unitless probability and f({1}) = 1/2 · (2π)−1/2 · σ−1 has units of km−1, the classical MAP would compare the values of two quantities that are incommensurable because of mismatch of units. As a result, the numerical value of f({1}) can be arbitrarily increased or decreased—thereby getting XMAP = ∅ (no target in the scene) or XMAP ≠ ∅ (target in the scene)—simply by changing units of measurement.

The posterior expectation also fails. If it existed, it would be

\[
\int X\cdot f(X)\,\delta X = \emptyset\cdot f(\emptyset) + \int x\cdot f(\{x\})\,dx = \tfrac12\big(\emptyset + 1\ \text{km}\big)
\]

Notice that, once again, there is the problem of mismatched units—the unitless quantity ∅ must be added to the quantity 1 km. Even if the variable x has no units of measurement, the quantity ∅ still must be added to the quantity 1. If ∅ + 1 = ∅, then 1 = 0, which is impossible. If ∅ + 1 = 1, then ∅ = 0, in which case the same mathematical symbol represents two different states: the no-target state ∅ and the single-target state x = 0. The same problem occurs if ∅ + a = b is defined for any real numbers a, b since then ∅ = b − a.

16.7.5   Optimal Multitarget State Estimators

We have just seen that the classical Bayes-optimal state estimators do not exist in general multitarget situations. Therefore, new multitarget state estimators must be defined and demonstrated to be statistically well behaved.

In conventional statistics, the maximum likelihood estimator (MLE) is a special case of the MAP estimator, assuming that the prior is uniform. As such, it is optimal and convergent. In the multitarget case, this does not hold true. If fk+1(Z|X) is the multitarget likelihood function, then the units of measurement for fk+1(Z|X) are determined by the observation-set Z (which is fixed) and not by the multitarget state X. Consequently, in multitarget situations the classical MLE is defined6 even though the classical MAP is not:

\[
X_{\mathrm{MLE}} = \operatorname*{arg\,max}_{n,\,x_1,\ldots,x_n} f(x_1,\ldots,x_n), \qquad
X_{\mathrm{MLE}} = \operatorname*{arg\,max}_{X} f(X)
\]

where the second equation is the same as the first, but written in condensed notation. The multitarget MLE will converge to the correct answer if given enough data.1

Because the multitarget MLE is not a Bayes estimator, new multitarget Bayes state estimators must be defined and their optimality demonstrated. In 1995, two such estimators were introduced, the Marginal Multitarget Estimator (MaME) and the Joint Multitarget Estimator (JoME).1,2 The JoME is defined as

\[
X_{\mathrm{JoME}} = \operatorname*{arg\,max}_{n,\,x_1,\ldots,x_n} f(x_1,\ldots,x_n)\cdot\frac{c^n}{n!}, \qquad
X_{\mathrm{JoME}} = \operatorname*{arg\,max}_{X} f(X)\cdot\frac{c^{|X|}}{|X|!}
\]

where c is a fixed constant, the units of which are the same as those for x. One of the consequences of this is that both the multitarget MLE and the JoME estimate the number n̂ and the identities/kinematics x̂1,...,x̂n̂ of targets optimally and simultaneously without resorting to optimal report-to-track association. In other words, they optimally resolve the conflicting objectives of detection, tracking, and identification.
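The toy density of Section 16.7.4 illustrates the difference numerically (a hypothetical sketch with assumed numbers): the classical MAP verdict flips when σ is re-expressed in different units, while the JoME comparison, with c carried in the same units as x, is invariant under the same rescaling:

```python
import math

# Hypothetical sketch using the toy density of Section 16.7.4:
# f(empty) = 1/2 and f({x}) = (1/2) * N_sigma2(x - 1). The classical MAP
# compares 1/2 with the peak value 1/(2 * sigma * sqrt(2 * pi)); JoME
# multiplies the n = 1 alternative by c/1!, with c in the units of x.

def map_says_target(sigma):
    return 0.5 / (sigma * math.sqrt(2.0 * math.pi)) > 0.5

def jome_says_target(sigma, c):
    return 0.5 * c / (sigma * math.sqrt(2.0 * math.pi)) > 0.5

# sigma = 0.1 km; the same sigma expressed in meters is 100.0
print(map_says_target(0.1), map_says_target(100.0))                   # -> True False
# rescaling sigma and c together (km -> m) leaves the JoME verdict unchanged
print(jome_says_target(0.1, 1.0) == jome_says_target(100.0, 1000.0))  # -> True
```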

16.7.6   Multitarget Miss Distance

FISST provides natural generalizations of the concept of miss distance to multitarget situations. Let X = {x1, …, xn} and Y = {y1, …, ym}. Then the simplest definition of the multitarget miss distance between X and Y is the Hausdorff distance. This is defined by

\[
d_H(X,Y) = \max\{d_0(X,Y),\; d_0(Y,X)\}, \qquad
d_0(X,Y) = \max_{x\in X}\,\min_{y\in Y}\,\|x-y\|
\]

The Hausdorff distance tends to be relatively insensitive to differences in cardinality between X and Y. Consequently, Hoffman and Mahler50 have introduced the family of Wasserstein distances. These are defined by

\[
d_p^W(X,Y) = \min_{C}\ \sqrt[p]{\sum_{i=1}^n\sum_{j=1}^m C_{i,j}\cdot d(x_i,y_j)^p}
\]

where the minimum is taken over all so-called transportation matrices C. If n = m then this reduces to an objective function for an optimal assignment problem:

\[
d_p^W(X,Y) = \min_{\sigma}\ \sqrt[p]{\sum_{i=1}^n d(x_i, y_{\sigma_i})^p}
\]

where the minimum is taken over all permutations σ on the numbers 1, …, n.
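Both distances are easy to compute for small point sets. The sketch below (hypothetical 1D data, brute-force minimization over permutations rather than an optimal-assignment solver) shows the Hausdorff distance and the permutation form of the Wasserstein distance for the equal-cardinality case:

```python
from itertools import permutations

# Hypothetical sketch: Hausdorff distance between finite point sets, and the
# permutation (optimal-assignment) form of the p-th order Wasserstein
# distance for the equal-cardinality case, in one dimension.

def d0(X, Y):
    return max(min(abs(x - y) for y in Y) for x in X)

def hausdorff(X, Y):
    return max(d0(X, Y), d0(Y, X))

def wasserstein(X, Y, p=2):
    assert len(X) == len(Y)  # n = m case only
    best = min(sum(abs(x - y) ** p for x, y in zip(X, perm))
               for perm in permutations(Y))
    return best ** (1.0 / p)

X = [0.0, 1.0]
print(hausdorff(X, [0.1, 1.1, 0.5]))   # extra nearby point barely registers
print(wasserstein(X, [0.1, 1.1]))      # optimal pairing: 0.0-0.1 and 1.0-1.1
```

The first call illustrates the cardinality insensitivity noted above: Y has an extra point, yet the Hausdorff distance stays small because that point lies near X.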

16.7.7   Unified Multitarget Multisource Integration

Suppose that there are a number of independent sources, some of which supply conventional data and others ambiguous data. As in Section 16.5.2, a multisource-multitarget joint generalized likelihood can be constructed of the form

\[
f_{k+1}(Z|X) = f(Z^{[1]},\ldots,Z^{[e]},\,\Theta^{[e+1]},\ldots,\Theta^{[e+e']}\,|\,X)
\]

where

\[
Z = Z^{[1]}\cup\cdots\cup Z^{[e]}\cup\Theta^{[e+1]}\cup\cdots\cup\Theta^{[e+e']}
\]

and where Z[s] denotes a multitarget observation collected by a conventional sensor with identifier s = 1, …, e, and where Θ[s] denotes a multitarget observation supplied by a source with identifier s = e + 1, …, e + e′ that collects ambiguous data. Given this, the data can be fused using Bayes’ rule. Robust multisource-multitarget detection, tracking, and identification can be accomplished by using the joint generalized likelihood function with the multitarget recursive Bayesian nonlinear filtering equations.

 

 

16.8   PHD and CPHD Filters

The single-sensor, single-target Bayesian nonlinear filtering Equations 16.4 and 16.5 are already computationally demanding. Computational difficulties only worsen when attempting to implement the multitarget nonlinear filter (Equations 16.41 and 16.42). This section summarizes two radically new approaches for approximate multitarget nonlinear filtering: the PHD filter4,48 and its generalization, the cardinalized PHD (CPHD) filter.4,49

The PHD and CPHD filters are based on an analogy with single-target tracking. The constant-gain Kalman filter (CGKF)—of which the alpha-beta filter is the most familiar example—is the computationally fastest single-target tracking filter. Whereas the single-target Bayes filter of Equations 16.4 and 16.5 propagates the full Bayes posterior distribution fk|k(x|Zk), the CGKF propagates only the first-order statistical moment of the posterior—that is, its expected value xk|k.

In like fashion, whereas the multitarget Bayes filter of Equations 16.41 and 16.42 propagates the full multitarget Bayes posterior fk|k(X|Z(k)), the PHD filter propagates only the first-order multitarget moment of the multitarget posterior—that is, its PHD Dk|k(x|Z(k)). The CPHD filter generalizes the PHD filter in that it also propagates the entire probability distribution pk|k(n|Z(k)) on target number. As a consequence, the CPHD filter exhibits better performance than the PHD filter, but with increased computational loading.

The derivation of the predictor and corrector equations for both filters requires the p.g.fl. and multitarget calculus techniques of Section 16.4.

Subsequently, we introduce the concept of the PHD (Section 16.8.1), the PHD filter (Section 16.8.2), and the CPHD filter (Section 16.8.3). We conclude, in Section 16.8.4, with a summary of recently developed implementations of the PHD and CPHD filters.

16.8.1   Probability Hypothesis Density

The first question that confronts us is: what is the multitarget counterpart of an expected value? Let fΞ(X) be the multitarget probability distribution of a random finite state-set Ξ. Then a naïve definition of its expected value would be

\[
\bar\Xi_{\mathrm{naive}} = \int X\cdot f_\Xi(X)\,\delta X
\]

However, this integral is mathematically undefined since addition X + X′ of finite subsets X, X′ cannot be usefully defined. Consequently, one must instead resort to a different strategy. Select a transformation X → TX that converts finite subsets X into vectors TX in some vector space. This transformation should preserve basic set-theoretic structure by transforming unions into sums: TX∪X′ = TX + TX′ whenever X ∩ X′ = ∅. Given this, one can define an indirect expected value as47,48

\[
\bar\Xi = \int T_X\cdot f_\Xi(X)\,\delta X
\]

The common practice is to select TX = δX where δX(x) = 0 if X = ∅ and, otherwise,

\[
\delta_X(x) = \sum_{y\in X}\delta_y(x)
\]

where δy(x) is the Dirac delta density concentrated at y. Given this,

\[
D_\Xi(x) = \int \delta_X(x)\cdot f_\Xi(X)\,\delta X
\]

is a multitarget analog of the concept of expected value. It is a density function on single-target state space called the PHD, intensity density, or first-moment density of Ξ or fΞ(X).

16.8.2   PHD Filter

We can attempt only a very brief summary of the PHD filter. Additional details can be found in Ref. 48. The PHD filter consists of two equations. The first, or predictor equation, allows the current PHD Dk|k(x|Z(k)) to be extrapolated to the predicted PHD: the PHD Dk+1|k(x|Z(k)) at the time of the next observation-set Zk+1. The second, or corrector equation, allows the predicted PHD to be data-updated to Dk+1|k+1(x|Z(k+1)).

The PHD filter predictor equation has the form

\[
D_{k+1|k}(x|Z^{(k)}) = b_{k+1|k}(x) + \int F_{k+1|k}(x|y)\cdot D_{k|k}(y|Z^{(k)})\,dy \tag{16.44}
\]

where the PHD filter pseudo-Markov transition is

\[
F_{k+1|k}(x|y) = p_S(y)\cdot f_{k+1|k}(x|y) + b_{k+1|k}(x|y) \tag{16.45}
\]

and where pS(y) is the probability that a target with state y at time-step k will survive into time-step k + 1, fk+1|k(x|y) the single-target Markov transition density, bk+1|k(x) the PHD of the multitarget distribution bk+1|k(X) of appearing targets, and bk+1|k(x|y) the PHD of the distribution bk+1|k(X|y) of targets spawned by targets with state y.

The PHD filter corrector equation, however, has the form

\[
D_{k+1|k+1}(x|Z^{(k+1)}) = L_{Z_{k+1}}(x)\cdot D_{k+1|k}(x|Z^{(k)}) \tag{16.46}
\]

where the PHD filter pseudo-likelihood is

\[
L_Z(x) = 1 - p_D(x) + \sum_{z\in Z}\frac{p_D(x)\cdot L_z(x)}{\lambda c(z) + D_{k+1|k}[p_D L_z]} \tag{16.47}
\]

and where pD(x) is the probability of detection of a target with state x at time-step k + 1, fk+1(z|x) = Lz(x) the sensor likelihood function, c(z) the spatial distribution of the Poisson false alarms, λ the expected number of false alarms, and where for any function h(x),

\[
D_{k+1|k}[h] = \int h(x)\cdot D_{k+1|k}(x|Z^{(k)})\,dx \tag{16.48}
\]
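As an illustration, the corrector step can be exercised on a discretized 1D state space. The following is a hypothetical gridded sketch with assumed parameters (constant pD, Gaussian Lz, Poisson clutter uniform over the surveillance region), not an implementation from the literature:

```python
import math

# Hypothetical 1D gridded sketch of the PHD corrector (16.46)-(16.48):
# constant detection probability pD, Gaussian likelihood Lz(x), and Poisson
# clutter with rate lam and uniform spatial density c(z) = 1/vol on [0, vol].

def phd_corrector(grid, D_pred, Z, pD=0.9, lam=2.0, vol=10.0, s=0.5):
    dx = grid[1] - grid[0]
    c = 1.0 / vol

    def Lz(z, x):
        return math.exp(-0.5 * ((z - x) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

    D_upd = [(1.0 - pD) * d for d in D_pred]  # missed-detection term of (16.47)
    for z in Z:
        # denominator: lam * c(z) + D_pred[pD * Lz], computed as in (16.48)
        denom = lam * c + dx * sum(pD * Lz(z, x) * d for x, d in zip(grid, D_pred))
        for i, x in enumerate(grid):
            D_upd[i] += pD * Lz(z, x) * D_pred[i] / denom
    return D_upd

grid = [i * 0.05 for i in range(200)]       # state space [0, 10), step 0.05
D_pred = [0.2] * len(grid)                  # predicted PHD: 2 expected targets
D_upd = phd_corrector(grid, D_pred, Z=[3.0, 7.0])
N_upd = 0.05 * sum(D_upd)                   # expected target count after update
```

Integrating the updated PHD gives the expected number of targets; with two well-separated measurements and these assumed parameters it lies between the clutter-dominated and detection-dominated extremes, and the PHD itself peaks near the measurements.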

16.8.3   Cardinalized PHD Filter

The predictor and corrector equations for the CPHD filter are too complex to describe in the present context. Further details can be found in Refs 4 and 49. Two things should be pointed out, however. First, and unlike the PHD filter, spawning of targets cannot be explicitly modeled in the CPHD predictor equations. Second, and unlike the PHD filter, the false alarm process need not necessarily be Poisson. It can be a more general false alarm process known as an i.i.d. cluster process. Again, details can be found in Refs 4 and 49.

As already noted, the CPHD filter generalizes the PHD filter, but at the price of greater computational complexity. Let n be the current number of targets and m the current number of observations. Then the PHD filter has computational complexity of order O(mn). The CPHD filter, however, has complexity O(m3n).

16.8.4   Survey of PHD/CPHD Filter Research

In this section, we briefly summarize current PHD filter research. The PHD filter has usually been implemented using sequential Monte Carlo (a.k.a. particle-system) methods, as proposed by Kjellström (née Sidenbladh)51 and by Zajic and Mahler.52 Instances are provided by Erdinc et al.53 and Vo et al.54 Vo, Singh, Doucet, and Clark have established convergence results for the particle-PHD filter.54,55

Vo and Ma56 have devised a closed-form Gaussian-mixture implementation that greatly improves the computational efficiency of the PHD filter. This approach is inherently capable of maintaining track labels.55,57 Clark and Bell55 have proved a strong L2 uniform convergence property for the Gaussian-mixture PHD filter. Erdinc et al.58 have proposed a purely physical interpretation of both the PHD and the CPHD filters.

Since the core PHD filter does not maintain labels for tracks from time-step to time-step, researchers have proposed peak-to-track association techniques for maintaining track labels with particle-PHD filters.59,60 These authors have demonstrated in 1D and 2D simulations that their track-valued PHD filters can outperform conventional MHT-type techniques (i.e., significantly fewer false and dropped tracks).

Punithakumar et al.61 have implemented a multiple motion model version of the PHD filter, as have Pasha et al.62

Punithakumar et al.44 have devised and implemented a distributed PHD filter that addresses the problem of communicating and fusing multitarget track information from a distributed network of sensor-carrying platforms.

Balakumar et al.63 have applied a PHD filter to the problem of tracking an unknown and time-varying number of narrowband, far-field signal sources, using a uniform linear array of passive sensors, in a highly nonstationary sensing environment.

Ahlberg et al.64 have employed PHD filters for group-target tracking in an ambitious situation assessment simulator system called IFD03.

Tobias and Lanterman65,66 have applied the PHD filter to target detection and tracking using bistatic radio-frequency observations.

Clark et al.67–70 have applied the PHD filter to 2D and 3D active-sonar problems.

Ikoma et al.71 have applied a PHD filter to the problem of tracking the trajectories of feature points in time-varying optical images. Wang et al.72 have employed such methods to tracking groups of humans in digital video.

Zajic et al.73 report an algorithm in which a PHD filter is integrated with a robust classifier algorithm that identifies airborne targets from HRRR signatures.

El-Fallah et al.74–76 have demonstrated implementations of a PHD filter–based sensor management approach.10

The CPHD filter was first introduced in 2006. As a result, only a few implementations have appeared in the literature. Vo et al.77 have devised a Gaussian-mixture implementation of the CPHD filter, along with extended Kalman filter (EKF) and unscented Kalman filter (UKF) versions. They have also described detailed performance comparisons of the GM-PHD and GM-CPHD filters.78,79

16.9   Summary and Conclusions

FISST was created, in part, to address the issues in probabilistic inference that conventional approaches overlook. These issues include

  • Dealing with poorly characterized sensor likelihoods

  • Modeling ambiguous nontraditional data

  • Constructing likelihoods for nontraditional data

  • Constructing true likelihoods and Markov densities for multitarget problems

  • Developing principled new approximate techniques for multitarget filtering

  • Providing a single, fully probabilistic, systematic, and seamlessly unified foundation for multisource-multitarget detection, tracking, identification, data fusion, sensor management, performance estimation, and threat estimation and prediction

  • Accomplishing all of these objectives within the framework of a direct and relatively simple and practitioner-friendly generalization of Statistics 101

In the past two years, FISST has begun to emerge from the realm of basic research and is being applied, with preliminary success, to a range of practical applications. This chapter has described the difficulties associated with nontraditional data and multitarget problems, and has summarized how and why FISST resolves them.

For a practitioner-level, textbook-style treatment of the concepts and techniques described in this chapter, see Statistical Multisource-Multitarget Information Fusion.4

Acknowledgments

The core concepts underlying the work reported in this chapter were developed under internal research and development funding in 1993 and 1994 at the Eagan, MN, division of Lockheed Martin Corporation. This work has been supported at the basic research level since 1994 by the U.S. Army Research Office and the Air Force Office of Scientific Research. Various aspects have been supported at the applied research level by the U.S. Air Force Research Laboratory, SPAWAR Systems Center, DARPA, MDA, and the U.S. Army MRDEC. The content does not necessarily reflect the position or the policy of the government or of Lockheed Martin. No official endorsement should be inferred.

References

1. Goodman, I.R., Mahler, R.P.S., and Nguyen, H.T., Mathematics of Data Fusion, Kluwer Academic Publishers, Dordrecht, Holland, 1997.

2. Mahler, R., An introduction to multisource-multitarget statistics and its applications, Lockheed Martin Technical Monograph, March 15, 2000.

3. Mahler, R., ‘Statistics 101’ for multisensor, multitarget data fusion, IEEE Aerosp. Electron. Syst. Mag., Part 2: Tutorials, 19 (1), 53–64, 2004.

4. Mahler, R.P.S., Statistical Multisource-Multitarget Information Fusion, Artech House, Norwood, MA, 2007.

5. Mahler, R., Global integrated data fusion, Proc. 7th Natl. Symp. Sensor Fusion, I (Unclass), ERIM, Ann Arbor, MI, 187–199, 1994.

6. Mahler, R., A unified approach to data fusion, Proc. 7th Joint Data Fusion Symp., 1994, 154, and Selected Papers on Sensor and Data Fusion, Sadjadi, P.A., Ed., SPIE, MS-124, 1996, 325.

7. Mahler, R., Global optimal sensor allocation, Proc. 1996 Natl. Symp. Sensor Fusion, I (Unclass.), 347–366, 1996.

8. Mahler, R., Multisource-multitarget filtering: A unified approach, SPIE Proc., 3373, 296, 1998.

9. Mahler, R., Multitarget Markov motion models, SPIE Proc., 3720, 47, 1999.

10. Mahler, R., Multitarget sensor management of dispersed mobile sensors, Grundel, D., Murphey, R., and Paralos, P., Eds., Theory and Algorithms for Cooperative Systems, World Scientific, Singapore, 14–21, 2005.

11. Mahler, R., Information theory and data fusion, Proc. 8th Natl. Symp. Sensor Fusion, I (Unclass), ERIM, Ann Arbor, MI, 279, 1995.

12. Mahler, R., Unified nonparametric data fusion, SPIE Proc., 2484, 66–74, 1995.

13. Mahler, R., Information for fusion management and performance estimation, SPIE Proc., 3374, 64–74, 1998.

14. Zajic, T. and Mahler, R., Practical information-based data fusion performance estimation, SPIE Proc., 3720, 92, 1999.

15. Mahler, R., Measurement models for ambiguous evidence using conditional random sets, SPIE Proc., 3068, 40–51, 1997.

16. Mahler, R., Unified data fusion: Fuzzy logic, evidence, and rules, SPIE Proc., 2755, 226, 1996.

17. Mahler, R. et al., Nonlinear filtering with really bad data, SPIE Proc., 3720, 59, 1999.

18. Mahler, R., Optimal/robust distributed data fusion: A unified approach, SPIE Proc., 4052, 128–138, 2000.

19. Mahler, R., Decisions and data fusion, Proc. 1997 IRIS Natl. Symp. Sensor Data Fusion, I (Unclass), M.I.T. Lincoln Laboratories, 71, 1997.

20. El-Fallah, A. et al., Adaptive data fusion using finite-set statistics, SPIE Proc., 3720, 80–91, 1999.

21. Allen, R. et al., Passive-acoustic classification system (PACS) for ASW, Proc 1998 IRIS Natl. Symp. Sensor Data Fusion, 179, 1998.

22. Mahler, R. et al., Application of unified evidence accrual methods to robust SAR ATR, SPIE Proc., 3720, 71, 1999.

23. Zajic, T., Hoffman, J.L., and Mahler, R., Scientific performance metrics for data fusion: New results, SPIE Proc., 4052, 172–182, 2000.

24. El-Fallah, A. et al., Scientific performance evaluation for sensor management, SPIE Proc., 4052, 183–194, 2000.

25. Mahler, R., Multisource-multitarget detection and acquisition: A unified approach, SPIE Proc., 3809, 218, 1999.

26. Mahler, R. et al., Joint tracking, pose estimation, and identification using HRRR data, SPIE Proc., 4052, 195, 2000.

27. Bar-Shalom, Y. and Li, X.-R., Estimation and Tracking: Principles, Techniques, and Software, Artech House, Ann Arbor, MI, 1993.

28. Jazwinski, A.H., Stochastic Processes and Filtering Theory, Academic Press, New York, NY, 1970.

29. Sorenson, H.W., Recursive estimation for nonlinear dynamic systems, Bayesian Analysis of Statistical Time Series and Dynamic Models, Spall, J.C., Ed., Marcel Dekker, New York, NY, 1988.

30. Matheron, G., Random Sets and Integral Geometry, Wiley, New York, NY, 1975.

31. Grabisch, M., Nguyen, H.T., and Walker, E.A., Fundamentals of Uncertainty Calculus with Applications to Fuzzy Inference, Kluwer Academic Publishers, Dordrecht, Holland, 1995.

32. Shafer, G. and Logan, R., Implementing Dempster’s rule for hierarchical evidence, Artif. Intell., 33, 271, 1987.

33. Nguyen, H.T., On random sets and belief functions, J. Math. Anal. Appl., 65, 531–542, 1978.

34. Hestir, K., Nguyen, H.T., and Rogers, G.S. A random set formalism for evidential reasoning, Conditional Logic in Expert Systems, Goodman, I.R., Gupta, M.M., Nguyen, H.T., and Rogers, G.S., Eds., North-Holland, 1991, 309.

35. Goodman, I.R., Fuzzy sets as equivalence classes of random sets, Fuzzy Sets and Possibility Theory, Yager, R., Ed., Pergamon, Oxford, U.K., 1982, 327.

36. Orlov, A.L., Relationships between fuzzy and random sets: Fuzzy tolerances, Issledovania po Veroyatnostnostatishesk, Medelironvaniu Realnikh System, Moscow, 1977.

37. Hohle, U., A mathematical theory of uncertainty: Fuzzy experiments and their realizations, Recent Developments in Fuzzy Set and Possibility Theory, Yager, R.R., Ed., Pergamon Press, Oxford, U.K., 1981, 344.

38. Mahler, R., Representing rules as random sets, I: Statistical correlations between rules, Inf. Sci., 88, 47, 1996.

39. Mahler, R., Representing rules as random sets, II: Iterated rules, Int. J. Intell. Syst., 11, 583, 1996.

40. Ho, Y.C. and Lee, R.C.K., A Bayesian approach to problems in stochastic estimation and control, IEEE Trans. AC, AC-9, 333–339, 1964.

41. Van Trees, H.L., Detection, Estimation, and Modulation Theory, Part I: Detection, Estimation, and Linear Modulation Theory, Wiley, New York, NY, 1968.

42. Mahler, R., Unified Bayes multitarget fusion of ambiguous data sources, IEEE Int. Conf. Integration Knowl. Intensive Multi-Agent Syst. (KIMAS), Cambridge, MA, September 30–October 4, 343–348, 2003.

43. Quinio, P. and Matsuyama, T., Random closed sets: A unified approach to the representation of imprecision and uncertainty, Symbolic and Quantitative Approaches to Uncertainty, Kruse, R. and Siegel, P., Eds., Springer-Verlag, New York, NY, 1991, 282.

44. Punithakumar, K., Kirubarajan, T., and Sinha, A., A distributed implementation of a sequential Monte Carlo probability hypothesis density filter for sensor networks, Kadar, I., Ed., Signal Proc., Sensor Fusion Targ. Recognit. XV, SPIE, Vol. 6235, Bellingham, WA, 2006.

45. Fixsen, D. and Mahler, R., The modified Dempster-Shafer approach to classification, IEEE Trans. Syst. Man Cybern., Part A, 27, 96, 1997.

46. Ryder, L.H., Quantum Field Theory, Second Edition, Cambridge University Press, Cambridge, UK, 1996.

47. Mahler, R., Multitarget moments and their application to multitarget tracking, Proc. Workshop Estimation, Track. Fusion: A Tribute to Y. Bar-Shalom, Naval Postgraduate School, Monterey, CA, May 17, 2001, 134.

48. Mahler, R. Multitarget Bayes filtering via first-order multitarget moments, IEEE Trans. AES, 39 (4), 1152, 2003.

49. Mahler, R., PHD filters of higher order in target number, IEEE Trans. Aerosp. Electron. Syst., 43 (3), 2005.

50. Hoffman, J. and Mahler, R., Multitarget miss distance via optimal assignment, IEEE Trans. Syst. Man Cybern. Part A, 34 (3), 327–336, 2004.

51. Sidenbladh, H., Multi-target particle filtering for the probability hypothesis density, Proc. 6th Int. Conf. Inf. Fusion, Cairns, Australia, 2003, International Society of Information Fusion, Sunnyvale, CA, 2003, 800.

52. Zajic, T. and Mahler, R., A Particle-Systems Implementation of the PHD Multitarget Tracking Filter, Kadar, I., Ed., Signal Proc. Sensor Fusion Targ. Recognit. XII, SPIE Proc. Vol. 5096, 291, 2003.

53. Erdinc, O., Willett, P., and Bar-Shalom, Y., Probability Hypothesis Density Filter for Multitarget Multisensor Tracking, Proc. 8th Int. Conf. Inf. Fusion, Vol. 1, Philadelphia, PA, July 25–28, 2005.

54. Vo, B.-N., Singh, S., and Doucet, A., Sequential Monte Carlo methods for multi-target filtering with random finite sets, IEEE Trans. AES, 41 (4), 1224–1245, 2005.

55. Clark, D.E. and Bell, J., Convergence results for the particle PHD filter, IEEE Trans. Signal Proc., 54 (7), 2652–2661, 2006.

56. Vo, B.-N. and Ma, W.-K., A Closed-Form Solution for the Probability Hypothesis Density Filter, Proc. 8th Int. Conf. Inf. Fusion, Philadelphia, PA, July 25–29, 2005, 856–863, International Society of Information Fusion, Sunnyvale, CA, 2005.

57. Clark, D.E., Panta, K., and Vo, B.-N., The GM-PHD filter multiple target tracker, Proc. 9th Int. Symp. Inf. Fusion, Florence, Italy, 1–8, July 2006, International Society of Information Fusion, Sunnyvale, CA, 2006.

58. Erdinc, O., Willett, P., and Bar-Shalom, Y., A physical-space approach for the probability hypothesis density and cardinalized probability hypothesis density filters, Drummond, O., Ed., Signal Processing of Small Targets 2006, SPIE Proc. Vol. 6236, Bellingham, WA, 2006.

59. Panta, K., Vo, B.-N., Doucet, A., and Singh, S., Probability hypothesis density filter versus multiple hypothesis testing, Kadar, I., Ed., Signal Proc. Sensor Fusion Targ. Recognit. XIII, SPIE Vol. 5429, Bellingham, WA, 2004.

60. Lin, L., Kirubarajan, T., and Bar-Shalom, Y., Data association combined with the probability hypothesis density filter for multitarget tracking, Drummond, O., Ed., Signal and Data Proc. of Small Targets 2004, SPIE Vol. 5428, 464–475, Bellingham, WA, 2004.

61. Punithakumar, K., Kirubarajan, T., and Sinha, A., A multiple model probability hypothesis density filter for tracking maneuvering targets, Drummond, O., Ed., Signal Data Proc. Small Targets 2004, SPIE Vol. 5428, 113–121, Bellingham, WA, 2004.

62. Pasha, A., Vo, B.-N., Tuan, H.D., and Ma, W.-K., Closed-form PHD filtering for linear jump Markov models, Proc. 9th Int. Symp. Inf. Fusion, Florence, Italy, July 10–13, 2006, International Society of Information Fusion, Sunnyvale, CA, 2006.

63. Balakumar, B., Sinha, A., Kirubarajan, T., and Reilly, J.P., PHD filtering for tracking an unknown number of sources using an array of sensors, Proc. 13th IEEE Workshop Stat. Signal Proc., Bordeaux, France, 43–48, July 17–20, 2005.

64. Ahlberg, A., Hörling, P., Kjellström, H., Jöred, H.K., Mårtenson, C., Neider, C.G., Schubert, J., Svenson, P., Svensson, P., Undén, P.K., and Walter, J., The IFD03 information fusion demonstrator, Proc. 7th Int. Conf. Inf. Fusion, Stockholm, Sweden, June 28–July 1, 2004, International Society of Information Fusion, Sunnyvale, CA, 2004, 936.

65. Tobias, M. and Lanterman, A.D. Probability hypothesis density-based multitarget tracking with bistatic range and Doppler observations, IEE Proc. Radar Sonar Navig., 152 (3), 195–205, 2005.

66. Tobias, M. and Lanterman, A., Multitarget tracking using multiple bistatic range measurements with probability hypothesis densities, Kadar, I., Ed., Signal Proc. Sensor Fusion Targ. Recognit. XIII, SPIE Vol. 5429, 296–305, Bellingham, WA, 2004.

67. Clark, D.E. and Bell, J., Bayesian multiple target tracking in forward scan sonar images using the PHD filter, IEE Radar Sonar Nav., 152 (5), 327–334, 2005.

68. Clark, D.E. and Bell, J., Data association for the PHD filter, Proc. Conf. Intell. Sensors Sensor Netw. Info. Process., Melbourne, Australia, 217–222, December 5–8, 2005.

69. Clark, D.E. and Bell, J., GM-PHD filter multitarget tracking in sonar images, Kadar, I., Ed., Signal Proc. Sensor Fusion Targ. Recognit. XV, SPIE Vol. 6235, Bellingham, WA, 2006.

70. Clark, D.E., Bell, J., de Saint-Pern, Y., and Petillot, Y., PHD filter multi-target tracking in 3D sonar, Proc. IEEE OCEANS05-Europe, Brest, France, 265–270, June 20–23, 2005.

71. Ikoma, N., Uchino, T., and Maeda, H., Tracking of feature points in image sequence by SMC implementation of PHD filter, Proc. Soc. Instrum. Contr. Eng. (SICE) Annu. Conf., Hokkaido, Japan, Vol. 2, 1696–1701, August 4–6, 2004.

72. Wang, Y.-D., Wu, J.-K., Kassim, A.A., and Huang, W.-M., Tracking a variable number of human groups in video using probability hypothesis density, Proc. 18th Int. Conf. Pattern Recognit., Hong Kong, Vol. 3, 1127–1130, August 20–24, 2006.

73. Zajic, T., Ravichandran, B., Mahler, R., Mehra, R., and Noviskey, M., Joint tracking and identification with robustness against unmodeled targets, Kadar, I., Ed., Signal Proc. Sensor Fusion Targ. Recognit. XII, SPIE Vol. 5096, Bellingham, WA, 279–290, 2003.

74. El-Fallah, A., Zatezalo, A., Mahler, R., Mehra, R., and Alford, M., Advancements in situation assessment sensor management, Kadar, I., Ed., Signal Proc. Sensor Fusion Targ. Recognit. XV, SPIE Proc. Vol. 6235, 62350M, Bellingham, WA, 2006.

75. El-Fallah, A., Zatezalo, A., Mahler, R., Mehra, R., and Alford, A., Regularized multi-target particle filter for sensor management, Kadar, I., Ed., Signal Proc. Sensor Fusion Targ. Recognit. XV, SPIE Proc. Vol. 6235, Bellingham, WA, 2006.

76. El-Fallah, A., Zatezalo, A., Mahler, R., Mehra, K.R., and Alford, M., Unified Bayesian situation assessment sensor management, Kadar, I., Ed., Signal Proc. Sensor Fusion Targ. Recognit. XIV, SPIE Proc. Vol. 5809, Bellingham, WA, 253–264, 2005.

77. Vo, B.-T., Vo, B.-N., and Cantoni, A., Analytic implementations of the cardinalized probability hypothesis density filter, IEEE Trans. Signal Proc., 55 (7), 3553–3567, 2006.

78. Vo, B.-N., Vo, B.-T., and Singh, S., Sequential Monte Carlo methods for static parameter estimation in random set models, Proc. 2nd Int. Conf. Intell. Sensors Sensor Netw. Inf. Process., Melbourne, Australia, 313–318, December 14–17, 2004.

79. Vo, B.-T. and Vo, B.-N., Performance of PHD based multi-target filters, Proc. 9th Int. Conf. Inf. Fusion, Florence, Italy, 1–8, July 10–13, 2006, International Society of Information Fusion, Sunnyvale, CA.

80. Mahler, R., Random sets: Unification and computation for information fusion—a retrospective assessment, Proc. 7th Int. Conf. Inf. Fusion, Stockholm, Vol. 1, 1–20, 2004.

81. Kruse, R., Schwencke, E., and Heinsohn, J., Uncertainty and Vagueness in Knowledge-Based Systems, Springer-Verlag, New York, 1991.

82. Goutsias, J., Mahler, R., and Nguyen, H.T. Random Sets: Theory and Application, Springer-Verlag, New York, 1997.
