15
Data Fusion in Nonlinear Systems

Simon Julier and Jeffrey K. Uhlmann

CONTENTS

15.1   Introduction

15.2   Estimation in Nonlinear Systems

15.2.1   Problem Statement

15.2.2   Transformation of Uncertainty

15.3   Unscented Transformation

15.3.1   Basic Idea

15.3.2   Example Set of Sigma Points

15.3.3   Properties of the Unscented Transform

15.4   Uses of the Transformation

15.4.1   Polar to Cartesian Coordinates

15.4.2   Discontinuous Transformation

15.5   Unscented Filter

15.6   Case Study: Using the UF with Linearization Errors

15.7   Case Study: Using the UF with a High-Order Nonlinear System

15.8   Multilevel Sensor Fusion

15.9   Conclusions

Acknowledgments

References

 

 

15.1   Introduction

The extended Kalman filter (EKF) has been one of the most widely used methods for tracking and estimation because of its apparent simplicity, optimality, tractability, and robustness. However, after more than 30 years of experience with it, the tracking and control community has concluded that the EKF is difficult to implement, difficult to tune, and only reliable for systems that are almost linear on the timescale of the update intervals. This chapter reviews the unscented transformation (UT), a mechanism for propagating mean and covariance information through nonlinear transformations, and describes its implications for data fusion.1 This method is more accurate, is easier to implement, and uses the same order of calculations as the EKF. Furthermore, the UT permits the use of Kalman-type filters in applications where, traditionally, their use was not possible. For example, the UT can be used to rigorously integrate artificial intelligence (AI)-based systems with Kalman-based systems.

Performing data fusion requires estimates of the state of a system to be converted to a common representation. The mean and covariance representation is the lingua franca of modern systems engineering. In particular, the covariance intersection (CI) 2 and Kalman filter (KF)3 algorithms provide mechanisms for fusing state estimates defined in terms of means and covariances, where each mean vector defines the nominal state of the system and its associated error covariance matrix defines a lower bound on the squared error. However, most data fusion applications require the fusion of mean and covariance estimates defining the state of a system in different coordinate frames. For example, a tracking system might maintain estimates in a global Cartesian coordinate frame, whereas observations of the tracked objects are generated in the local coordinate frames of various sensors. Therefore, a transformation must be applied to convert between the global coordinate frame and each local coordinate frame.

If the transformation between coordinate frames is linear, the linearity properties of the mean and covariance make the application of the transformation trivial. Unfortunately, most tracking sensors take measurements in a local polar or spherical coordinate frame (i.e., they measure range and bearings) that is not linearly transformable to a Cartesian coordinate frame. Rarely are the natural coordinate frames of two sensors linearly related. This fact constitutes a fundamental problem that arises in virtually all practical data fusion systems.

The UT, a mechanism that addresses the difficulties associated with converting mean and covariance estimates from one coordinate frame to another, can be applied to obtain mean and covariance estimates from systems that do not inherently produce estimates in that form. For example, this chapter describes how the UT can allow high-level AI and fuzzy control systems to be integrated seamlessly with low-level KF and CI systems.

The structure of this chapter is as follows. Section 15.2 describes the nonlinear transformation problem within the KF framework and analyzes the KF prediction problem in detail. The UT is introduced and its performance is analyzed in Section 15.3. Section 15.4 demonstrates the effectiveness of the UT with respect to a simple nonlinear transformation (polar to Cartesian coordinates with large bearing uncertainty) and a simple discontinuous system. Section 15.5 examines how the transformation can be embedded into a fully recursive estimator that incorporates process and observation noise. Section 15.6 discusses the use of the UT in a tracking example, and Section 15.7 describes its use with a complex process and observation model. Finally, Section 15.8 shows how the UT ties multiple levels of data fusion together into a single, consistent framework.

 

 

15.2   Estimation in Nonlinear Systems

15.2.1   Problem Statement

Minimum mean squared error (MMSE) estimators can be broadly classified into linear and nonlinear estimators. Of the linear estimators, by far the most widely used is the KF.3* Many researchers have attempted to develop suitable nonlinear MMSE estimators. However, the optimal solution requires that a complete description of the conditional probability density be maintained,4 and this exact description requires a potentially unbounded number of parameters. As a consequence, many suboptimal approximations have been proposed in the literature. Traditional methods are reviewed by Jazwinski5 and Maybeck.6 Recent algorithms have been proposed by Daum,7 Gordon et al.,8 and Kouritzin.9 Despite the sophistication of these and other approaches, the EKF remains the most widely used estimator for nonlinear systems.10,11 The EKF applies the KF to nonlinear systems by simply linearizing all the nonlinear models so that the traditional linear KF equations can be applied. However, in practice, the EKF has three well-known drawbacks:

  1. Linearization can produce highly unstable filters if the assumption of local linearity is violated. Examples include estimating the ballistic parameters of missiles12–15 and some applications of computer vision.16 As demonstrated later in this chapter, some extremely common transformations used in target tracking systems are susceptible to these problems.

  2. Linearization can be applied only if the Jacobian matrix exists, and the Jacobian matrix exists only if the system is differentiable at the estimate. Although this constraint is satisfied by the dynamics of continuous physical systems, some systems do not satisfy this property. Examples include jump-linear systems, systems whose sensors are quantized, and expert systems that yield a finite set of discrete solutions.

  3. Finally, the derivation of the Jacobian matrices is nontrivial in most applications and can often lead to significant implementation difficulties. In Dulimov,17 for example, the derivation of a Jacobian requires six pages of dense algebra. Arguably, this has become less of a problem, given the widespread use of symbolic packages such as Mathematica18 and Maple.19 Nonetheless, the computational expense of calculating a Jacobian can be extremely high if the expressions for the terms are nontrivial.

Appreciating how the UT addresses these three problems requires an understanding of some of the mechanics of the KF and EKF.

Let the state of the system at time step k be the state vector x(k). The KF propagates the first two moments of the distribution of x(k) recursively and has a distinctive predictor–corrector structure. Let x̂(i|j) be the estimate of x(i) using the observation information up to and including time j, Z^j = [z(1), …, z(j)]. The covariance of this estimate is P(i|j). Given an estimate x̂(k|k), the filter first predicts what the future state of the system will be using the process model. Ideally, the predicted quantities are given by the expectations

$$\hat{x}(k+1|k) = E\big[\,f[x(k), u(k), v(k), k] \mid Z^k\,\big] \tag{15.1}$$

$$P(k+1|k) = E\big[\{x(k+1) - \hat{x}(k+1|k)\}\{x(k+1) - \hat{x}(k+1|k)\}^T \mid Z^k\big] \tag{15.2}$$

When the process and observation models f[⋅] and h[⋅] are nonlinear, the precise values of these statistics can be calculated only if the distribution of x(k) is perfectly known. However, this distribution has no general form, and a potentially unbounded number of parameters are required. Therefore, in most practical algorithms these expected values must be approximated.

The estimate x̂(k+1|k+1) is found by updating the prediction with the current sensor measurement. In the KF, a linear update rule is specified and the weights are chosen to minimize the mean squared error of the estimate.

$$
\begin{aligned}
\hat{x}(k+1|k+1) &= \hat{x}(k+1|k) + W(k+1)\,\nu(k+1)\\
P(k+1|k+1) &= P(k+1|k) - W(k+1)\,P_{\nu\nu}(k+1|k)\,W^T(k+1)\\
\nu(k+1) &= z(k+1) - \hat{z}(k+1|k)\\
W(k+1) &= P_{x\nu}(k+1|k)\,P_{\nu\nu}^{-1}(k+1|k)
\end{aligned} \tag{15.3}
$$

Note that these equations are only a function of the predicted values of the first two moments of x(k) and z(k). Therefore, applying the KF to a nonlinear system reduces to the problem of predicting the first two moments of x(k) and z(k).
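In code, the update of Equation 15.3 is a few lines of linear algebra. The sketch below is our own illustration (variable names are ours; a production filter would also guard against an ill-conditioned innovation covariance):

```python
import numpy as np

def kf_update(x_pred, P_pred, z, z_pred, P_xv, P_vv):
    """Linear update rule of Equation 15.3.

    x_pred, P_pred -- predicted state and covariance
    z, z_pred      -- actual and predicted observation
    P_xv, P_vv     -- predicted cross and innovation covariances
    """
    W = P_xv @ np.linalg.inv(P_vv)   # gain W(k+1)
    nu = z - z_pred                  # innovation nu(k+1)
    x_upd = x_pred + W @ nu
    P_upd = P_pred - W @ P_vv @ W.T
    return x_upd, P_upd
```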

15.2.2   Transformation of Uncertainty

The problem of predicting the future state or observation of the system can be expressed in the following form. Suppose that x is a random variable with mean x̄ and covariance P_xx. A second random variable, y, is related to x through the nonlinear function

$$y = f[x] \tag{15.4}$$

The mean ȳ and covariance P_yy of y must be calculated.

The statistics of y are calculated by (1) determining the density function of the transformed distribution and (2) evaluating the statistics from that distribution. In some special cases, exact, closed-form solutions exist (e.g., when f[⋅] is linear or is one of the forms identified in Daum7). However, as explained in the preceding text, most data fusion problems do not possess closed-form solutions and some kind of approximation must be used. A common approach is to develop a transformation procedure from the Taylor series expansion of Equation 15.4 about x̄. This series can be expressed as

$$f[x] = f[\bar{x} + \delta x] = f[\bar{x}] + \nabla f\,\delta x + \tfrac{1}{2}\nabla^2 f\,\delta x^2 + \tfrac{1}{3!}\nabla^3 f\,\delta x^3 + \tfrac{1}{4!}\nabla^4 f\,\delta x^4 + \cdots \tag{15.5}$$

where δx is a zero-mean Gaussian variable with covariance P_xx, and ∇ⁿf δxⁿ is the appropriate nth-order term in the multidimensional Taylor series. The transformed mean and covariance are

$$\bar{y} = f[\bar{x}] + \tfrac{1}{2}\nabla^2 f\,P_{xx} + \tfrac{1}{4!}\nabla^4 f\,E[\delta x^4] + \cdots \tag{15.6}$$

$$
\begin{aligned}
P_{yy} = {}& \nabla f\,P_{xx}(\nabla f)^T \\
&+ \tfrac{1}{2\times 4!}\,\nabla^2 f\left(E[\delta x^4] - E[\delta x^2 P_{yy}] - E[P_{yy}\,\delta x^2] + P_{yy}^2\right)(\nabla^2 f)^T \\
&+ \tfrac{1}{3!}\,\nabla^3 f\,E[\delta x^4](\nabla f)^T + \cdots
\end{aligned} \tag{15.7}
$$

In other words, the nth-order term in the series for ȳ is a function of the nth-order moments of x multiplied by the nth-order derivatives of f[⋅] evaluated at x = x̄. If the moments and derivatives can be evaluated correctly up to the nth order, the mean is correct up to the nth order as well. Similar comments hold for the covariance equation, although the structure of each term is more complicated. Because each term in the series is scaled by a progressively smaller factor, the lowest-order terms are likely to have the greatest impact. Therefore, the prediction procedure should concentrate on evaluating the lower-order terms.

The EKF exploits linearization, which assumes that the second- and higher-order terms of δx in Equation 15.5 can be neglected. Under this assumption,

$$\bar{y} = f[\bar{x}] \tag{15.8}$$

$$P_{yy} = \nabla f\,P_{xx}(\nabla f)^T \tag{15.9}$$

However, in many practical situations, linearization introduces significant biases or errors. These cases require more accurate prediction techniques.
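To make Equations 15.8 and 15.9 concrete, the sketch below propagates a mean and covariance by linearization, approximating the Jacobian ∇f with central differences (our own illustration; an EKF would normally use an analytically derived Jacobian):

```python
import numpy as np

def linearized_transform(f, x_mean, P_xx, eps=1e-6):
    """Propagate (x_mean, P_xx) through f by linearization:
    y = f[x_mean]  (Equation 15.8),  P_yy = J P_xx J^T  (Equation 15.9),
    with the Jacobian J approximated by central differences."""
    y_mean = np.atleast_1d(f(x_mean))
    n, m = x_mean.size, y_mean.size
    J = np.zeros((m, n))
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = eps
        J[:, i] = (np.atleast_1d(f(x_mean + dx)) -
                   np.atleast_1d(f(x_mean - dx))) / (2 * eps)
    return y_mean, J @ P_xx @ J.T
```

For a linear f this is exact; for a nonlinear f it reproduces the truncation behavior discussed above.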

 

 

15.3   Unscented Transformation

15.3.1   Basic Idea

The UT is a method for calculating the statistics of a random variable that undergoes a nonlinear transformation. This method is founded on the intuition that it is easier to approximate a probability distribution than it is to approximate an arbitrary nonlinear function or transformation.20 The approach is illustrated in Figure 15.1: a set of points (sigma points) is chosen whose sample mean and sample covariance are x̄ and P_xx. The nonlinear function is applied to each point, in turn, to yield a cloud of transformed points; ȳ and P_yy are the statistics of the transformed points.

Although this method bears a superficial resemblance to Monte Carlo–type methods, there is an extremely important and fundamental difference. The samples are not drawn at random; they are drawn according to a specific, deterministic algorithm. Since the problems of statistical convergence are not relevant, high-order information about the distribution can be captured using only a very small number of points. For an n-dimensional space, only n + 1 points are needed to capture any given mean and covariance. If the distribution is known to be symmetric, 2n points are sufficient to capture the fact that the third-order and all higher-order odd moments are zero for any symmetric distribution.20


FIGURE 15.1
The principle of the unscented transformation.

The set of sigma points, S, consists of l vectors and their appropriate weights, S = {(X_i, W_i) : i = 0, 1, …, l − 1}. The weights W_i can be positive or negative but must obey the normalization condition

$$\sum_{i=0}^{l-1} W_i = 1 \tag{15.10}$$

Given these points, ȳ and P_yy are calculated using the following procedure:

  1. Instantiate each point through the function to yield the set of transformed sigma points,

    $$y_i = f[\mathcal{X}_i]$$

  2. The mean is given by the weighted average of the transformed points,

    $$\bar{y} = \sum_{i=0}^{l-1} W_i\, y_i \tag{15.11}$$

  3. The covariance is the weighted outer product of the transformed points,

    $$P_{yy} = \sum_{i=0}^{l-1} W_i\,\{y_i - \bar{y}\}\{y_i - \bar{y}\}^T \tag{15.12}$$
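In code, the three steps reduce to a weighted mean and a weighted outer product. This is our own sketch, assuming the sigma points X and weights W have already been chosen:

```python
import numpy as np

def unscented_transform(X, W, f):
    """Steps 1-3: propagate sigma points X with weights W through f and
    recover the transformed statistics (Equations 15.11 and 15.12)."""
    Y = np.array([np.atleast_1d(f(x)) for x in X])   # step 1: y_i = f[X_i]
    y_mean = W @ Y                                   # step 2: weighted mean
    D = Y - y_mean
    # step 3: weighted outer products of the deviations
    P_yy = (W[:, None, None] * D[:, :, None] * D[:, None, :]).sum(axis=0)
    return y_mean, P_yy
```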

The crucial issue is to decide how many sigma points should be used, where they should be located, and what weights they should be assigned. The points should be chosen so that they capture the most important properties of x. This can be formalized as follows. Let p_x(x) be the density function of x. The sigma points capture the necessary properties by obeying the condition

$$g[S, p_x(x)] = 0$$

The decision about which properties of x are to be captured precisely and which are to be approximated is determined by the demands of the particular application in question. Here, the moments of the distribution of the sigma points are matched with those of x. This is motivated by the Taylor series expansion, given in Section 15.2.2, which shows that matching the moments of x up to the nth order means that Equations 15.11 and 15.12 capture ȳ and P_yy up to the nth order as well.21

Note that the UT is distinct from other efforts published in the literature. First, some authors have considered the related problem of assuming that the distribution takes on a particular parameterized form, rather than treating an entire, arbitrary distribution. Kushner, for example, describes an approach whereby a distribution is approximated at each time step by a Gaussian.22 However, this approach does not address the fundamental problem of calculating the mean and covariance of the nonlinearly transformed distribution. Second, the UT bears some relationship to quadrature, which has been used to approximate the integrations implicit in statistical expectations. However, the UT avoids some of the difficulties associated with quadrature methods by approximating the unknown distribution. In fact, the UT is most closely related to perturbation analysis. In a 1989 article, Holtzmann introduced a noninfinitesimal perturbation for a scalar system.23 Holtzmann's solution corresponds to that of the symmetric UT in the scalar case, but their respective generalizations (e.g., to higher dimensions) are not equivalent.

15.3.2   Example Set of Sigma Points

A set of sigma points can be constructed using the constraints that they capture the first three moments of a symmetric distribution: $g[S, p_x(x)] = \big[g_1[S, p_x(x)],\; g_2[S, p_x(x)],\; g_3[S, p_x(x)]\big]^T$ where

$$g_1[S, p_x(x)] = \sum_{i=0}^{p} W_i\,\mathcal{X}_i - \bar{x} \tag{15.13}$$

$$g_2[S, p_x(x)] = \sum_{i=0}^{p} W_i\,(\mathcal{X}_i - \bar{x})^2 - P_{xx} \tag{15.14}$$

$$g_3[S, p_x(x)] = \sum_{i=0}^{p} W_i\,(\mathcal{X}_i - \bar{x})^3 \tag{15.15}$$

The set is21

$$
\begin{aligned}
\mathcal{X}_0(k|k) &= \hat{x}(k|k) & W_0 &= \frac{\kappa}{n+\kappa}\\[2pt]
\mathcal{X}_i(k|k) &= \hat{x}(k|k) + \Big(\sqrt{(n+\kappa)P(k|k)}\Big)_i & W_i &= \frac{1}{2(n+\kappa)}\\[2pt]
\mathcal{X}_{i+n}(k|k) &= \hat{x}(k|k) - \Big(\sqrt{(n+\kappa)P(k|k)}\Big)_i & W_{i+n} &= \frac{1}{2(n+\kappa)}
\end{aligned} \tag{15.16}
$$

where κ is a real number, $\big(\sqrt{(n+\kappa)P(k|k)}\big)_i$ is the ith row or column* of the matrix square root of (n + κ)P(k|k), and W_i is the weight associated with the ith point.
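Assuming the matrix square root is taken to be a Cholesky factor (one admissible choice, since any matrix square root satisfies the moment conditions), the set of Equation 15.16 can be generated as follows (a sketch; names are ours):

```python
import numpy as np

def symmetric_sigma_points(x, P, kappa):
    """Sigma point set of Equation 15.16: 2n + 1 points whose sample mean
    and covariance are x and P, and whose odd central moments vanish."""
    n = x.size
    S = np.linalg.cholesky((n + kappa) * P)   # columns are columns of the matrix square root
    X = [x] + [x + S[:, i] for i in range(n)] + [x - S[:, i] for i in range(n)]
    W = np.r_[kappa / (n + kappa), np.full(2 * n, 1.0 / (2 * (n + kappa)))]
    return np.array(X), W
```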

15.3.3   Properties of the Unscented Transform

Despite its apparent similarity to other efforts described in the data fusion literature, the UT has a number of features that make it well suited for the problem of data fusion in practical problems:

  • The UT can predict with the same accuracy as the second-order Gauss filter, but without the need to calculate Jacobians or Hessians. The reason is that the mean and covariance of x are captured precisely up to the second order, and the calculated values of the mean and covariance of y also are correct to the second order. This indicates that the mean is calculated to a higher order of accuracy than the EKF, whereas the covariance is calculated to the same order of accuracy.

  • The computational cost of the algorithm is the same order of magnitude as the EKF. The most expensive operations are calculating the matrix square root and determining the outer product of the sigma points to calculate the predicted covariance. However, both operations are O(n3), which is the same cost as evaluating the n × n matrix multiplies needed to calculate the predicted covariance.*

  • The algorithm naturally lends itself to a black box filtering library. The UT calculates the mean and covariance using standard vector and matrix operations and does not exploit details about the specific structure of the model.

  • The algorithm can be used with distributions that are not continuous. Sigma points can straddle a discontinuity. Although this does not precisely capture the effect of the discontinuity, its effect is to spread the sigma points out such that the mean and covariance reflect the presence of the discontinuity.

  • The UT can be readily extended to capture more information about the distribution. Because the UT captures the properties of the distribution, a number of refinements can be applied to improve greatly the performance of the algorithm. If only the first two moments are required, then n + 1 sigma points are sufficient. If the distribution is assumed or is known to be symmetric, then n + 2 sigma points are sufficient.25 Therefore, the total number of calculations required for calculating the new covariance is O(n3), which is the same order as that required by the EKF. The transform has also been demonstrated to propagate successfully the fourth-order moment (or kurtosis) of a Gaussian distribution26 and the third-order moments (or skew) of an arbitrary distribution.27 In one dimension, Tenne and Singh developed the higher-order UT which can capture the first 12 moments of a Gaussian using only seven points.28

 

 

15.4   Uses of the Transformation

This section demonstrates the effectiveness of the UT with respect to two nonlinear systems that represent important classes of problems encountered in the data fusion literature—coordinate conversions and discontinuous systems.

15.4.1   Polar to Cartesian Coordinates

One of the most important transformations in target tracking is the conversion from polar to Cartesian coordinates. This transformation is known to be highly susceptible to linearization errors. Lerro and Bar-Shalom, for example, show that the linearized conversion can become inconsistent when the standard deviation in the bearing estimate is less than a degree.29 This subsection illustrates the use of the UT on a coordinate conversion problem with extremely high angular uncertainty.

Suppose a mobile autonomous vehicle detects targets in its environment using a range-optimized sonar sensor. The sensor returns polar information (range, r, and bearing, θ), which is converted into an estimate of the target's Cartesian coordinates. The transformation is

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix} \quad\text{with}\quad \nabla F = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}$$

The real location of the target is (0, 1). The difficulty with this transformation arises from the physical properties of the sonar. Fairly good range accuracy (with 2 cm standard deviation) is traded off to give a very poor bearing measurement (standard deviation of 15°).30 The large bearing uncertainty causes the assumption of local linearity to be violated.

To appreciate the errors that can be caused by linearization, compare its values for the statistics of (x, y) with the true statistics calculated by Monte Carlo simulation. Owing to the slow convergence of random sampling methods, an extremely large number of samples (3.5 × 10⁶) were used to ensure that accurate estimates of the true statistics were obtained. The results are shown in Figure 15.2a. This figure shows the mean and 1σ contours calculated by each method. The 1σ contour is the locus of points $\{y : (y - \bar{y})^T P_{yy}^{-1} (y - \bar{y}) = 1\}$ and is a graphical representation of the size and orientation of P_yy. The figure demonstrates that the linearized transformation is biased and inconsistent. This is most pronounced along the y-axis, where linearization estimates the position to be 1 m, whereas in reality it is 96.7 cm. In this example, linearization effectively introduces an error that is over 1.5 times the standard deviation of the range measurement. Because it is a bias arising from the transformation process itself, the same error with the same sign is committed each time a coordinate transformation takes place. Even if there were no bias, the transformation would still be inconsistent because its ellipse is not sufficiently extended along the y-axis.
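The bias along the y-axis is easy to reproduce numerically. The sketch below (our own, using the symmetric sigma point set of Equation 15.16 with κ = 1) compares the linearized, exact, and unscented estimates of the mean y coordinate; the exact value follows from the identity E[sin θ] = sin θ̄ · exp(−σθ²/2) for Gaussian θ.

```python
import math

r_mean, th_mean = 1.0, math.pi / 2          # target at (0, 1)
sig_r, sig_th = 0.02, math.radians(15.0)    # 2 cm range, 15 degree bearing

# linearization: the mean simply passes through the transformation
y_lin = r_mean * math.sin(th_mean)          # exactly 1.0, biased

# exact mean of y = r sin(theta) for independent Gaussian r and theta
y_true = r_mean * math.sin(th_mean) * math.exp(-sig_th ** 2 / 2)

# unscented estimate with the symmetric set of Equation 15.16, kappa = 1
n, kappa = 2, 1.0
s_r = math.sqrt(n + kappa) * sig_r
s_th = math.sqrt(n + kappa) * sig_th
pts = [(r_mean, th_mean),
       (r_mean + s_r, th_mean), (r_mean - s_r, th_mean),
       (r_mean, th_mean + s_th), (r_mean, th_mean - s_th)]
W = [kappa / (n + kappa)] + [1.0 / (2 * (n + kappa))] * 4
y_ut = sum(w * r * math.sin(th) for w, (r, th) in zip(W, pts))
```

Running this gives a linearized mean of exactly 1 m, while both the exact and unscented means are approximately 0.9663 m, reproducing the 96.7 cm figure quoted above.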

In practice, this inconsistency can be resolved by introducing additional stabilizing noise that increases the size of the transformed covariance. This is one possible explanation of why EKFs are difficult to tune—sufficient noise must be introduced to offset the defects of linearization. However, introducing stabilizing noise is an undesirable solution because the estimate remains biased and there is no general guarantee that the transformed estimate remains consistent or efficient.

The performance benefits of using the UT can be seen in Figure 15.2b, which shows the means and 1σ contours determined by the different methods. The mismatch between the UT mean and the true mean is extremely small (6 × 10–4). The transformation is consistent, ensuring that the filter does not diverge. As a result, there is no need to introduce artificial noise terms that would degrade performance even when the angular uncertainty is extremely high.


FIGURE 15.2
The mean and standard deviation ellipses for the true statistics, those calculated through linearization and those calculated by the unscented transformation. (a) Results from linearization. The true mean is at ×, and the uncertainty ellipse is solid. Linearization calculates the mean at o, and the uncertainty ellipse is dashed. (b) Results from the UT. The true mean is at ×, and the uncertainty ellipse is dotted. The UT mean is at + (overlapping the position of the true mean) and is the solid ellipse. The linearized mean is at o, and its ellipse is also dotted.

15.4.2   Discontinuous Transformation

Consider the behavior of a two-dimensional particle whose state consists of its position x(k) = [x(k), y(k)]T. The particle is initially released at time t1 and travels at a constant and known speed, vx, in the x direction. The objective is to estimate the mean position and covariance of the position at time t2, [x(2), y(2)]T, where ∆T = t2 − t1. The problem is made difficult by the fact that the path of the particle is obstructed by a wall that lies in the bottom right quarter-plane (x ≥ 0, y ≤ 0). If the particle hits the wall, a perfectly elastic collision occurs, and the particle is reflected back at the same speed as it traveled forward. This situation is illustrated in Figure 15.3a, which also shows the covariance ellipse of the initial distribution.


FIGURE 15.3
A discontinuous system example: a particle can either strike a wall and rebound, or continue to move in a straight line. The experimental results show the effect of using different start values for y.

The process model for this system is

$$x(2) = \begin{cases} x(1) + \Delta T\,v_x & y(1) \ge 0 \\ -x(1) - \Delta T\,v_x & y(1) < 0 \end{cases} \tag{15.17}$$

$$y(2) = y(1) \tag{15.18}$$

At time 1, the particle starts in the left half-plane (x ≤ 0) with position [x(1), y(1)]T. The error in this estimate is Gaussian, has zero mean, and has covariance P(1|1). Linearized about this start condition, the system appears to be a simple constant velocity linear model.

The true conditional mean and covariance were determined using Monte Carlo simulation for different choices of the initial mean of y. The mean squared error calculated by the EKF and by the UT for different values is shown in Figure 15.3b. The UT estimates the mean very closely, suffering only small spikes as the translated sigma points successively pass the wall. Further analysis shows that the covariance for the filter is only slightly larger than the true covariance, but conservative enough to account for the deviation of its estimated mean from the true mean. The EKF, however, bases its entire estimate of the conditional mean on the projection of the prior mean; therefore, its estimates bear no resemblance to the true mean, except when most of the distribution either hits or misses the wall and the effect of the discontinuity is minimized.
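The gap between the two estimators on this example can be sketched numerically. The code below is our own illustration (not from the chapter): it applies the process model of Equations 15.17 and 15.18 to the prior mean (as the EKF effectively does), to a symmetric sigma point set (Equation 15.16, κ = 1), and compares both against the exact conditional mean, which is available in closed form because only the sign of y(1) matters.

```python
import math

def process_x(x, y, dT=1.0, vx=1.0):
    """x component of Equations 15.17 and 15.18: free flight or elastic rebound."""
    return x + dT * vx if y >= 0 else -x - dT * vx

def ut_mean_x(x0, y0, var_x, var_y, kappa=1.0, dT=1.0, vx=1.0):
    """UT estimate of E[x(2)] using the symmetric set of Equation 15.16."""
    n = 2
    sx = math.sqrt((n + kappa) * var_x)
    sy = math.sqrt((n + kappa) * var_y)
    pts = [(x0, y0), (x0 + sx, y0), (x0 - sx, y0), (x0, y0 + sy), (x0, y0 - sy)]
    W = [kappa / (n + kappa)] + [1.0 / (2 * (n + kappa))] * 4
    return sum(w * process_x(px, py, dT, vx) for w, (px, py) in zip(W, pts))

def true_mean_x(x0, y0, var_y, dT=1.0, vx=1.0):
    """Exact E[x(2)] for Gaussian y(1): only P(y(1) >= 0) matters."""
    p_free = 0.5 * (1.0 + math.erf(y0 / math.sqrt(2.0 * var_y)))
    return (x0 + dT * vx) * (2.0 * p_free - 1.0)
```

With the prior mean of y sitting exactly on the edge of the wall, the EKF projects the prior mean and misses the rebound entirely, whereas the UT mean lies between the two outcomes because its sigma points straddle the discontinuity.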

 

 

15.5   Unscented Filter

The UT can be used as the cornerstone of a recursive Kalman-type estimator. The transformation processes that occur in a KF (Equation 15.3) consist of the following steps:

  1. Predict the new state of the system, x̂(k+1|k), and its associated covariance, P(k + 1|k). This prediction must take into account the effects of process noise.

  2. Predict the expected observation, ẑ(k+1|k), and the innovation covariance, Pvv(k + 1|k). This prediction should include the effects of observation noise.

  3. Finally, predict the cross-correlation matrix, Pxz(k + 1|k).

These steps can be easily accommodated by slightly restructuring the state vector and the process and observation models. The most general formulation augments the state vector with the process and observation noise terms to give an na = n + q + r dimensional vector,

$$x^a(k) = \begin{bmatrix} x(k) \\ v(k) \\ w(k) \end{bmatrix} \tag{15.19}$$

The process and observation models are rewritten as a function of xa(k),

$$\begin{aligned} x(k+1) &= f^a[x^a(k), u(k), k] \\ z(k+1) &= h^a[x^a(k+1), u(k), k] \end{aligned} \tag{15.20}$$

and the UT uses sigma points that are drawn from

$$\hat{x}^a(k|k) = \begin{pmatrix} \hat{x}(k|k) \\ 0_{q\times 1} \\ 0_{r\times 1} \end{pmatrix} \quad\text{and}\quad P^a(k|k) = \begin{bmatrix} P(k|k) & 0 & 0 \\ 0 & Q(k) & 0 \\ 0 & 0 & R(k) \end{bmatrix} \tag{15.21}$$

The matrices on the leading diagonal are the covariances, and the off-diagonal subblocks are the correlations between the state errors and the process noises.* Although this method requires the use of additional sigma points, it incorporates the noises into the predicted state with the same level of accuracy as the propagated estimation errors. In other words, the estimate is correct to the second order and no Jacobians, Hessians, or other numerical approximations must be calculated.

The full unscented filter is summarized in Table 15.1. However, recall that this is the most general form of the UF and many optimizations can be made. For example, if the process model is linear, but the observation model is not, the normal linear KF prediction equations can be used to calculate x̂(k+1|k) and P(k + 1|k). The sigma points would be drawn from the prediction distribution and would only be used to calculate ẑ(k+1|k), Pxv(k + 1|k), and Pvv(k + 1|k).

The following two sections describe the application of the unscented filter to two case studies. The first demonstrates the accuracy of the recursive filter, and the second considers the problem of an extremely involved process model.

TABLE 15.1
A General Formulation of The Kalman Filter Using the Unscented Transformation


 

 

15.6   Case Study: Using the UF with Linearization Errors

This section considers the problem illustrated in Figure 15.4: a vehicle entering the atmosphere at high altitude and at very high speed. The position of the body is to be tracked by a radar that accurately measures range and bearing. This type of problem has been identified by a number of authors12–15 as being particularly stressful for filters and trackers because of the strongly nonlinear nature of three types of forces that act on the vehicle. The most dominant is aerodynamic drag, which is a function of vehicle speed and has a substantial nonlinear variation with altitude. The second type of force is gravity, which accelerates the vehicle toward the center of the earth. The final type of force is random buffeting. The effect of these forces gives a trajectory of the form shown in Figure 15.4. Initially the trajectory is almost ballistic; however, as the density of the atmosphere increases, drag effects become important and the vehicle rapidly decelerates until its motion is almost vertical. The tracking problem is made more difficult by the fact that the drag properties of the vehicle can be only very crudely known.

In summary, the tracking system should be able to track an object that experiences a set of complicated, highly nonlinear forces. These depend on the current position and velocity of the vehicle, as well as on certain characteristics that are not precisely known. The filter’s state space consists of the position of the body (x1 and x2), its velocity (x3 and x4), and a parameter of its aerodynamic properties (x5). The vehicle state dynamics are

$$
\begin{aligned}
\dot{x}_1(k) &= x_3(k)\\
\dot{x}_2(k) &= x_4(k)\\
\dot{x}_3(k) &= D(k)\,x_3(k) + G(k)\,x_1(k) + v_1(k)\\
\dot{x}_4(k) &= D(k)\,x_4(k) + G(k)\,x_2(k) + v_2(k)\\
\dot{x}_5(k) &= v_3(k)
\end{aligned} \tag{15.22}
$$


FIGURE 15.4
The reentry problem. The dashed line is the sample vehicle trajectory, and the solid line is a portion of the Earth’s surface. The position of the radar is marked by an o.

where D(k) is the drag-related force term, G(k) is the gravity-related force term, and v1(k), v2(k), and v3(k) are the process noise terms. Defining $R(k) = \sqrt{x_1^2(k) + x_2^2(k)}$ as the distance from the center of the Earth and $V(k) = \sqrt{x_3^2(k) + x_4^2(k)}$ as the absolute vehicle speed, the drag and gravitational terms are

$$D(k) = \beta(k)\exp\left\{\frac{R_0 - R(k)}{H_0}\right\} V(k) \quad\text{and}\quad G(k) = -\frac{Gm_0}{R^3(k)}$$

where β(k) = β0 exp x5(k).

For this example, the parameter values are β0 = –0.59783, H0 = 13.406, Gm0 = 3.9860 × 105, and R0 = 6374, and they reflect typical environmental and vehicle characteristics.14 The parameterization of the ballistic coefficient, β(k), reflects the uncertainty in vehicle characteristics.13 β0 is the ballistic coefficient of a typical vehicle, and it is scaled by exp x5(k) to ensure that its value is always positive. This is vital for filter stability.
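The dynamics above can be coded directly. The following is a sketch under our own naming conventions (units of kilometers and seconds, as in the parameter values quoted above); a single Euler integration step is then x ← x + Δt · reentry_deriv(x):

```python
import math

# parameter values quoted above (km, s units)
BETA0, H0, GM0, R0 = -0.59783, 13.406, 3.9860e5, 6374.0

def reentry_deriv(x, v=(0.0, 0.0, 0.0)):
    """Right-hand side of Equation 15.22 for the state x = [x1, x2, x3, x4, x5]."""
    x1, x2, x3, x4, x5 = x
    R = math.hypot(x1, x2)                  # distance from the Earth's center
    V = math.hypot(x3, x4)                  # absolute vehicle speed
    beta = BETA0 * math.exp(x5)             # ballistic coefficient, always negative
    D = beta * math.exp((R0 - R) / H0) * V  # drag-related term D(k)
    G = -GM0 / R ** 3                       # gravity-related term G(k)
    return [x3,
            x4,
            D * x3 + G * x1 + v[0],
            D * x4 + G * x2 + v[1],
            v[2]]
```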

The motion of the vehicle is measured by a radar located at (xr, yr). It can measure range r and bearing θ at a frequency of 10 Hz, where

$$\begin{aligned} r(k) &= \sqrt{(x_1(k) - x_r)^2 + (x_2(k) - y_r)^2} + w_1(k) \\ \theta(k) &= \tan^{-1}\!\left(\frac{x_2(k) - y_r}{x_1(k) - x_r}\right) + w_2(k) \end{aligned}$$

w1(k) and w2(k) are zero-mean uncorrelated noise processes with variances of 1 m and 17 mrad, respectively.32 The high update rate and extreme accuracy of the sensor result in a large quantity of extremely high-quality data for the filter.

The true initial conditions for the vehicle are

$$x(0) = \begin{pmatrix} 6500.4 \\ 349.14 \\ -1.8093 \\ -6.7967 \\ 0.6932 \end{pmatrix} \quad\text{and}\quad P(0) = \begin{bmatrix} 10^{-6} & 0 & 0 & 0 & 0 \\ 0 & 10^{-6} & 0 & 0 & 0 \\ 0 & 0 & 10^{-6} & 0 & 0 \\ 0 & 0 & 0 & 10^{-6} & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

In other words, the vehicle’s coefficient is twice the nominal coefficient.

The vehicle is buffeted by random accelerations,

$$Q(k) = \begin{bmatrix} 2.4064\times 10^{-5} & 0 & 0 \\ 0 & 2.4064\times 10^{-5} & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

The initial conditions assumed by the filter are

$$\hat{x}(0|0) = \begin{pmatrix} 6500.4 \\ 349.14 \\ -1.8093 \\ -6.7967 \\ 0 \end{pmatrix} \quad\text{and}\quad P(0|0) = \begin{bmatrix} 10^{-6} & 0 & 0 & 0 & 0 \\ 0 & 10^{-6} & 0 & 0 & 0 \\ 0 & 0 & 10^{-6} & 0 & 0 \\ 0 & 0 & 0 & 10^{-6} & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$


FIGURE 15.5
The mean squared errors and estimated covariances calculated by an EKF and an unscented filter. (a) Results for x1. (b) Results for x3. (c) Results for x5. In all of the graphs, the solid line is the mean squared error calculated by the EKF and the dotted line is its estimated covariance. The dashed line is the unscented mean squared error, and the dot-dashed line is its estimated covariance. In all diagrams, the EKF estimate is inconsistent but the UT estimate is not.

The filter uses the nominal initial condition and, to offset the uncertainty, the variance of the initial x5 estimate is set to 1.

Both filters were implemented in discrete time, and observations were taken at a frequency of 10 Hz. However, as a result of the intense nonlinearities of the vehicle dynamics equations, the Euler approximation of Equation 15.22 was valid only for small time steps. The integration step was set to 50 ms, which meant that two predictions were made per update. For the unscented filter, each sigma point was propagated through the dynamics equations twice. For the EKF, an initial prediction step and relinearization had to be performed before the second step.

The performance of each filter is shown in Figure 15.5. This figure plots the estimated mean squared estimation error (the diagonal elements of P(k|k)) against the actual mean squared estimation error (evaluated over 100 Monte Carlo simulations). Only x1, x3, and x5 are shown; the results for x2 are similar to those for x1, and the results for x4 are similar to those for x3. In all cases, the unscented filter estimates its mean squared error very accurately, so its estimates can be used with confidence. The EKF, however, is highly inconsistent: the peak mean squared error in x1 is 0.4 km2, whereas its estimated covariance is over 100 times smaller. Similarly, the peak mean squared velocity error is 3.4 × 10–4 km2 s–2, more than five times the covariance estimated by the filter. Finally, x5 is highly biased, and this bias decreases only slowly over time. This poor performance shows that, even with regular and high-quality updates, the linearization errors inherent in the EKF can be sufficient to cause inconsistent estimates.
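The consistency test behind the figure can be reproduced with a short sketch: estimate the actual mean squared error from Monte Carlo runs and compare it, component by component, against the diagonal of the covariance the filter reports for itself. All names here are illustrative.

```python
import numpy as np

def monte_carlo_mse(true_states, estimates):
    """Actual mean squared error per state component over Monte Carlo runs.

    true_states, estimates: arrays of shape (n_runs, n_states).
    """
    err = np.asarray(estimates) - np.asarray(true_states)
    return np.mean(err ** 2, axis=0)

def is_consistent(mse, cov_diag):
    """A filter is (second-moment) consistent when the actual MSE does not
    exceed the variance it reports for each component."""
    return bool(np.all(np.asarray(mse) <= np.asarray(cov_diag)))
```

Applied to the EKF results above, the x1 component fails this test by two orders of magnitude, whereas the unscented filter passes it.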

 

 

15.7   Case Study: Using the UF with a High-Order Nonlinear System

In many tracking applications, obtaining large quantities of accurate sensor data is difficult. For example, an air traffic control system might measure the location of an aircraft only once every few seconds. When information is scarce, the accuracy of the process model becomes extremely important for two reasons. First, if the control system must obtain an estimate of the target state more frequently than the tracker updates, a predicted tracker position must be used. Different models can have a significant impact on the quality of that prediction.33 Second, to optimize the performance of the tracker, the limited data from the sensors must be exploited to the greatest degree possible. Within the KF framework, this can only be achieved by developing the most accurate process model that is practical. However, such models can be high order and nonlinear. The UF greatly simplifies the development and refinement of such models.

This section demonstrates the ease with which the UF can be applied to a prototype KF-based localization system for a conventional road vehicle. The road vehicle, shown in Figure 15.6, undertakes maneuvers at speeds in excess of 15 m s–1 (approximately 34 mph). The position of the vehicle is to be estimated with submeter accuracy. This problem is made difficult by the paucity of sensor information. Only the following sources are available: an inertial navigation system (which is polled only at 10 Hz), a set of encoders (also polled at 10 Hz), and a bearing-only sensor (rotation rate of 4 Hz) that measures bearing to a set of beacons. Because of the low quality of sensor information, the vehicle localization system can meet the performance requirements only through the use of an accurate process model. The model that was developed is nonlinear and incorporates kinematics, dynamics, and slip due to tire deformation. It also contains a large number of process noise terms. This model is extremely cumbersome to work with, but the UF obviates the need to calculate Jacobians, greatly simplifying its use.

The model of vehicle motion is developed from the two-dimensional fundamental bicycle model, which is shown in Figure 15.6.34–36 This approximation, conventional in vehicle ride and handling analysis, assumes that the vehicle consists of front and rear virtual wheels.* The vehicle body is the line FGR, with the front axle fixed at F and the rear axle fixed at R. The center of mass of the vehicle is located at G, a distance a behind the front axle and b in front of the rear axle. The length of the wheelbase is B = a + b. The wheels can slip laterally with slip angles αf and αr, respectively. The control inputs are the steer angle, δ, and the angular speed, ω, of the front virtual wheel.


FIGURE 15.6
The actual experimental vehicle and the fundamental bicycle model representation used in the design of the vehicle process model. (a) The host vehicle at the test site with sensor suite. (b) Vehicle kinematics.

The filter estimates the position of F, (XF, YF), the orientation of FGR, ψ, and the effective radius of the front wheel, R (defined as the ratio of the vehicle velocity to the rotation rate of the front virtual wheel). The speed of the front wheel is VF, and its path curvature is ρF. From the kinematics, the velocity of F is

$$\begin{aligned}
\dot{X}_F &= V_F\cos[\psi+\delta-\alpha_f],\\
\dot{Y}_F &= V_F\sin[\psi+\delta-\alpha_f],\\
\dot{\psi} &= \rho_F V_F-\dot{\delta}-\dot{\alpha}_f,\\
\dot{R} &= 0,
\end{aligned}
\qquad\text{where}\qquad
\begin{aligned}
V_F &= R\,\omega\cos\alpha_f,\\
\rho_F &= \frac{\sin[\delta-\alpha_f]+\cos[\delta-\alpha_f]\tan\alpha_f}{B}.
\end{aligned}$$
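The kinematic equations above translate directly into a state-derivative function. The following sketch uses illustrative names, takes the slip angle αf as an external input (it is supplied by the tire model developed next), and is shaped so it can be dropped into an Euler prediction step.

```python
import numpy as np

def bicycle_kinematics(state, delta, omega, alpha_f, B,
                       delta_dot=0.0, alpha_f_dot=0.0):
    """Time derivative of (X_F, Y_F, psi, R) for the fundamental bicycle model.

    delta: steer angle; omega: front-wheel angular speed; alpha_f: front slip
    angle (from the tire model); B: wheelbase. Illustrative sketch only.
    """
    X_F, Y_F, psi, R = state
    V_F = R * omega * np.cos(alpha_f)                       # front-wheel speed
    rho_F = (np.sin(delta - alpha_f)
             + np.cos(delta - alpha_f) * np.tan(alpha_f)) / B  # path curvature
    return np.array([
        V_F * np.cos(psi + delta - alpha_f),    # X_F dot
        V_F * np.sin(psi + delta - alpha_f),    # Y_F dot
        rho_F * V_F - delta_dot - alpha_f_dot,  # psi dot
        0.0,                                    # R dot (constant effective radius)
    ])
```

With zero steer and zero slip, the model reduces to straight-line motion at speed Rω, which is a convenient sanity check.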

The slip angle (αf) plays an extremely important role in determining the path taken by the vehicle, so a model for determining it is highly desirable. The slip angle is derived from the properties of the tires. Specifically, tires behave as if they are linear torsional springs: the slip angle of each wheel is proportional to the lateral force that acts on that wheel35

$$\alpha_f=\frac{F_{yf}}{C_{\alpha f}},\qquad\alpha_r=\frac{F_{yr}}{C_{\alpha r}}\qquad(15.23)$$

Cαf and Cαr are the front and rear wheel lateral stiffness coefficients (which are imprecisely known). The front and rear lateral forces, Fyf and Fyr, are calculated under the assumption that the vehicle has reached a steady state; at any instant in time the forces are such that the vehicle moves along an arc with constant radius and constant angular speed.37 Resolving moments parallel and perpendicular to OG, and taking moments about G, the following simultaneous nonlinear equations must be solved:

$$\begin{aligned}
mV_G^2\rho_G &= F_{xf}\sin[\delta-\beta]+F_{yf}\cos[\delta-\beta]+F_{yr}\cos\beta, &\quad \beta &= \tan^{-1}\!\left(\frac{b\tan[\delta-\alpha_f]-a\tan\alpha_r}{B}\right),\\
0 &= F_{xf}\cos[\delta-\beta]-F_{yf}\sin[\delta-\beta]+F_{yr}\sin\beta, &\quad \rho_G &= \frac{\cos\beta\,(\tan[\delta-\alpha_f]+\tan\alpha_r)}{B},\\
0 &= aF_{xf}\sin\delta+aF_{yf}\cos\delta-bF_{yr}, &\quad V_G &= V_F\cos[\delta-\alpha_f]\sec\beta.
\end{aligned}$$

m is the mass of the vehicle, VG and ρG are the speed and path curvature of G, respectively, and β is the attitude angle (illustrated in Figure 15.6). These equations are solved using a conventional numerical solver24 to give the tire forces. Through Equation 15.23, these determine the slip angles and, hence, the path of the vehicle. Since Cαf and Cαr must account for modeling errors (such as the inaccuracies of a linear force–slip angle relationship), these were treated as states and their values were estimated.
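A "conventional numerical solver" for such simultaneous equations can be, for example, a Newton–Raphson iteration with a finite-difference Jacobian. The minimal sketch below is generic and makes no claim about the specific solver used in the study; it simply shows the kind of root-finding step involved in recovering the tire forces.

```python
import numpy as np

def newton_solve(f, x0, tol=1e-10, max_iter=50):
    """Minimal damped-free Newton-Raphson solver for f(x) = 0, f: R^n -> R^n.

    Uses a forward-difference Jacobian; illustrative sketch only.
    """
    x = np.asarray(x0, dtype=float)
    n = x.size
    h = 1e-7                      # finite-difference step
    for _ in range(max_iter):
        fx = np.asarray(f(x))
        if np.max(np.abs(fx)) < tol:
            break
        J = np.empty((n, n))
        for j in range(n):        # build the Jacobian column by column
            xp = x.copy()
            xp[j] += h
            J[:, j] = (np.asarray(f(xp)) - fx) / h
        x = x - np.linalg.solve(J, fx)   # Newton update
    return x
```

In the vehicle model, f would encode the three moment/force balance equations above, with the tire forces as unknowns and the current state and controls as parameters.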

As this section has shown, a comprehensive vehicle model is extremely complicated. The state space consists of six highly interconnected states, and the model is made even more complicated by the fact that it possesses 12 process noise terms. Therefore, 18 terms must be propagated through the nonlinear process model. The observation models are also very complex, and the derivation and debugging of the associated Jacobians proved to be extremely difficult. However, the UF greatly simplified the implementation, tuning, and testing of the filter. An example of the performance of the final navigation system is shown in Figure 15.7. Figure 15.7a shows a figure-of-eight route that was planned for the vehicle. This path is highly dynamic (with continuous and rapid changes in both vehicle speed and steer angle) and contains a number of well-defined landmarks (which were used to validate the algorithm). There is extremely good agreement between the estimated and the actual paths, and the covariance estimate (0.25 m2 in position) exceeds the performance requirements.

 

 

15.8   Multilevel Sensor Fusion

This section discusses how the UT can be used in systems that do not inherently use a mean and covariance description to describe their state. Because the UT can be applied to such systems, it can be used as a consistent framework for multilevel data fusion. The problem of data fusion has been decomposed into a set of hierarchical domains.38 The lowest levels, level 0 and level 1 (object refinement), are concerned with quantitative data fusion problems such as the calculation of a target track. Level 2 (situation refinement) and level 3 (threat refinement) apply various high-level data fusion and pattern recognition algorithms to attempt to glean strategic and tactical information from these tracks.

The difficulty lies in the fundamental differences in the representation and use of information. On the one hand, the low-level tracking filter provides only mean and covariance information. It does not specify an exact kinematic state from which an expert system could attempt to infer a tactical state. On the other hand, an expert system may be able to predict accurately the behavior of a pilot under a range of situations. However, the system does not define a rigorous low-level framework for fusing its predictions with raw sensor information to obtain high-precision estimates suitable for reliable tracking. The practical solution to this problem has been to take the output of standard control and estimation routines, discretize them into a more symbolic form (e.g., slow or fast), and process them with an expert/fuzzy rule base. The results of such processing are then converted into forms that can be handled by conventional processing technology.


FIGURE 15.7
The positions of the beacons can be seen in (a) and (b) as the row of ellipses at the top and bottom of the figures.


FIGURE 15.8
A possible framework for multilevel information fusion using the unscented transformation.

One approach for resolving this problem, illustrated in Figure 15.8, is to combine the different data fusion algorithms into a single, composite data fusion algorithm that takes noise-corrupted raw sensor data and provides the inferred high-level state. From the perspective of the track estimator, the higher-level fusion rules are considered to be arbitrary, nonlinear transformations. From the perspective of the higher-level data fusion algorithms, the UT converts the output from the low-level tracker into a set of vectors. Each vector is treated as a possible kinematic state, which is processed by the higher-level fusion algorithms. In other words, the low-level tracking algorithms do not need to understand the concept of higher-level constructs, such as maneuvers, whereas the higher-level algorithms do not need to understand or produce probabilistic information.

Consider the problem of tracking an aircraft. The aircraft model consists of two components—a kinematic model, which describes the trajectory of the aircraft for a given set of pilot inputs, and an expert system, which attempts to infer current pilot intentions and predict future pilot inputs. The location of the aircraft is measured using a tracking system, such as a radar.

Some sigma points might imply that the aircraft is making a rapid acceleration, some might indicate a moderate acceleration, and yet others might imply that there is no discernible acceleration. Each of the state vectors produced from the UT can be processed individually by the expert system to predict various possible future states of the aircraft. For some of the state vectors, the expert system will signal evasive maneuvers and predict the future position of the aircraft accordingly. Other vectors, however, will not signal a change of tactical state, and the expert system will predict that the aircraft will maintain its current speed and bearing. The second step of the UT consists of computing the mean and covariance of the set of predicted state vectors from the expert system. This mean and covariance gives the predicted state of the aircraft in a form that can then be fed back to the low-level filter. The important observation is that this mean and covariance reflect the probability that the aircraft will maneuver, even though the expert system did not produce any probabilistic information and the low-level filter knows nothing about maneuvers.
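The mechanism described above is simply the UT applied to a black-box function. In the sketch below, f stands in for the expert-system prediction step (any deterministic state-in, state-out mapping will do), the symmetric sigma-point set of Section 15.3 is assumed, and κ and all names are illustrative.

```python
import numpy as np

def unscented_transform(mean, cov, f, kappa=1.0):
    """Propagate a mean/covariance through an arbitrary black-box function f.

    Uses the symmetric set of 2n + 1 sigma points; illustrative sketch only.
    """
    n = mean.size
    # Matrix square root of (n + kappa) P; columns give the sigma-point spread
    S = np.linalg.cholesky((n + kappa) * cov)
    sigma = [mean] + [mean + S[:, i] for i in range(n)] \
                   + [mean - S[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    # Push each sigma point through the black box (e.g., the expert system)
    y = np.array([f(s) for s in sigma])
    y_mean = w @ y
    diff = y - y_mean
    y_cov = (w[:, None] * diff).T @ diff
    return y_mean, y_cov
```

For a linear f, the transform recovers the mean and covariance exactly; for the expert system, the recombined moments encode the probability of a maneuver without the expert system ever emitting probabilistic output.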

 

 

15.9   Conclusions

This chapter has described some of the important issues arising from the occurrence of nonlinear transformations in practical data fusion applications. Linearization is the most widely used approach for dealing with nonlinearities, but linearized approximations have been shown to yield relatively poor results. In response to this and other deficiencies of linearization, a new technique based on the UT has been developed for directly applying nonlinear transformations to discrete point distributions having specified statistics (such as mean and covariance). An analysis of the new approach reveals that

  • The UT is demonstrably superior to linearization in terms of expected error for all absolutely continuous nonlinear transformations, and it can even be applied to nondifferentiable functions, for which linearization is not possible.

  • The UT avoids the derivation of Jacobian (and Hessian) matrices for linearizing nonlinear kinematic and observation models. This makes it conducive to the creation of efficient, general-purpose black box code libraries.

  • Empirical results for several nonlinear transformations that typify those arising in practical applications clearly demonstrate that linearization yields very poor approximations compared to those of the UT.

Beyond analytic claims of improved accuracy, the UT offers a black box solution to a wide variety of problems arising in both low- and high-level data fusion applications. In particular, it offers a mechanism for seamlessly integrating the benefits of high-level methodologies, such as AI, fuzzy logic, and neural networks, with the low-level workhorses of modern engineering practice, such as covariance intersection and the KF.

 

 

Acknowledgments

The authors gratefully acknowledge support from IDAK Industries and the University of Oxford.

 

 

References

1. Julier, S.J. and Uhlmann, J.K., Unscented filtering and nonlinear estimation, Proceedings of the IEEE, 92(3), 401–422, 2004.

2. Julier, S.J. and Uhlmann, J.K., A nondivergent estimation algorithm in the presence of unknown correlations, American Control Conference, 4, 2369–2373, 1997.

3. Kalman, R.E., A new approach to linear filtering and prediction problems, Transactions of the ASME, Journal of Basic Engineering, 82, 34–45, 1960.

4. Kushner, H.J., Dynamical equations for optimum nonlinear filtering, Journal of Differential Equations, 3, 179–190, 1967.

5. Jazwinski, A.H., Stochastic Processes and Filtering Theory, Academic Press, New York, NY, 1970.

6. Maybeck, P.S., Stochastic Models, Estimation, and Control, Vol. 2, Academic Press, New York, NY, 1982.

7. Daum, F.E., New exact nonlinear filters, Bayesian Analysis of Time Series and Dynamic Models, J.C. Spall, Ed., Marcel Dekker, New York, NY, 199–226, 1988.

8. Gordon, N.J., Salmond, D.J., and Smith, A.F.M., Novel approach to nonlinear/non-Gaussian Bayesian state estimation, IEEE Proceedings F, 140(2), 107–113, 1993.

9. Kouritzin, M.A., On exact filters for continuous signals with discrete observations, IEEE Transaction on Automatic Control, 43(5), 709–715, 1998.

10. Uhlmann, J.K., Algorithms for multiple target tracking, American Scientist, 80(2), 128–141, 1992.

11. Sorenson, H.W., Ed. Kalman Filtering: Theory and Application, IEEE Press, New York, NY, 1985.

12. Athans, M., Wishner, R.P., and Bertolini, A., Suboptimal state estimation for continuous time nonlinear systems from discrete noisy measurements, IEEE Transactions on Automatic Control, TAC-13(6), 504–518, 1968.

13. Costa, P.J., Adaptive model architecture and extended Kalman–Bucy filters, IEEE Transactions on Aerospace and Electronic Systems, AES-30(2), 525–533, 1994.

14. Austin, J.W. and Leondes, C.T., Statistically linearized estimation of reentry trajectories, IEEE Transactions on Aerospace and Electronic Systems, AES-17(1), 54–61, 1981.

15. Mehra, R.K. A comparison of several nonlinear filters for reentry vehicle tracking, IEEE Transactions on Automatic Control, AC-16(4), 307–319, 1971.

16. Viéville, T. and Sander, P., Using pseudo Kalman filters in the presence of constraints application to sensing behaviours, Technical report, INRIA, April 1992.

17. Dulimov, P.A., Estimation of ground parameters for the control of a wheeled vehicle, Master’s thesis, The University of Sydney, 1997.

18. Wolfram, S., The Mathematica Book, Wolfram Research, 4th Edition, Cambridge, MA, 1999.

19. Redfern, D., Maple V Handbook—Release 4, 2nd Edition, Ann Arbor, MI, 1996.

20. Uhlmann, J.K., Simultaneous map building and localization for real-time applications, Technical Report, Transfer thesis. University of Oxford, 1994.

21. Julier, S.J., Uhlmann, J.K., and Durrant-Whyte, H.F., A new approach for the nonlinear transformation of means and covariances in linear filters, IEEE Transactions on Automatic Control, 45(3), 477–482, 2000.

22. Kushner, H.J., Approximations to optimal nonlinear filters, IEEE Transactions on Automatic Control, AC-12(5), 546–556, 1967.

23. Holtzmann, J., On using perturbation analysis to do sensitivity analysis: Derivatives vs. differences, IEEE Conference on Decision and Control, 37(2), 2018–2023, 1989.

24. Press, W.H., Teukolsky, S.A., Vetterling, W.T., and Flannery, B.P., Numerical Recipes in C: The Art of Scientific Computing, 2nd Edition, Cambridge University Press, New York, NY, 1992.

25. Julier, S.J., The spherical simplex unscented transformation, Proceedings of IEEE American Control Conference, 3, 2430–2434, 2003.

26. Julier, S.J. and Uhlmann, J.K., A consistent, debiased method for converting between polar and Cartesian coordinate systems, Proceedings of AeroSense: Acquisition, Tracking and Pointing XI, SPIE, Vol. 3086, pp. 110–121, 1997.

27. Julier, S.J., A skewed approach to filtering, Proceedings of AeroSense: The 12th International Symp. Aerospace/Defense Sensing, Simulation, and Controls, Vol. 3373, pp. 271–282. SPIE, 1998: Signal and Data Processing of Small Targets.

28. Tenne, D. and Singh, T., The higher order unscented filter, Proceedings of IEEE American Control Conference, 3, 2441–2446, 2003.

29. Lerro, D. and Bar-Shalom, Y.K., Tracking with debiased consistent converted measurements vs. EKF, IEEE Transactions on Aerospace and Electronic Systems, AES-29(3), 1015–1022, 1993.

30. Leonard, J., Directed Sonar Sensing for Mobile Robot Navigation, Kluwer Academic Press, Boston, MA, 1991.

31. Schmidt, S.F., Applications of state space methods to navigation problems, Advanced Control Systems, Leondes, C.T., Ed., Vol. 3, pp. 293–340, Academic Press, New York, NY, 1966.

32. Chang, C.B., Whiting, R.H., and Athans, M. On the state and parameter estimation for maneuvering reentry vehicles, IEEE Transactions on Automatic Control, AC-22(1), 99–105, 1977.

33. Julier, S.J., Comprehensive Process Models for High-Speed Navigation, Ph.D. thesis, University of Oxford, 1997.

34. Ellis, J.R., Vehicle Handling Dynamics, Mechanical Engineering Publications, London, UK, 1994.

35. Wong, J.Y., Theory of Ground Vehicles, 2nd Edition, Wiley, New York, NY, 1993.

36. Dixon, J.C., Tyres, Suspension and Handling, Cambridge University Press, Cambridge, UK, 1991.

37. Julier, S.J. and Durrant-Whyte, H.F., Navigation and parameter estimation of high speed road vehicles, Robotics and Automation Conference, 101–105, 1995.

38. Klein, L., Sensor and Data Fusion Concepts and Applications, 2nd Edition, SPIE, 1999.

* Researchers often (and incorrectly) claim that the KF can be applied only if the following two conditions hold: (a) all probability distributions are Gaussian and (b) the system equations are linear. The KF is, in fact, the minimum mean squared linear estimator that can be applied to any system with any distribution, provided the first two moments are known. However, it is the globally optimal estimator only under the special case that the distributions are all Gaussian.

* If the matrix square root A of P is of the form P = ATA, then the sigma points are formed from the rows of A. However, for a root of the form P = AAT, the columns of A are used.

* The matrix square root should be calculated using numerically efficient and stable methods such as the Cholesky decomposition.24

* If correlations exist between the noise terms, Equation 15.21 can be generalized to draw the sigma points from the covariance matrix

$$P^a(k|k)=\begin{bmatrix}P(k|k)&P_{x\upsilon}(k|k)&P_{xw}(k|k)\\P_{\upsilon x}(k|k)&Q(k)&P_{\upsilon w}(k|k)\\P_{wx}(k|k)&P_{w\upsilon}(k|k)&R(k)\end{bmatrix}$$

Such correlation structures commonly arise in algorithms such as the Schmidt–Kalman filter.31

* Each virtual wheel lumps together the kinematic and dynamic properties of the pairs of wheels at the front and rear axles.
