CHAPTER 6

LEAST-SQUARES ESTIMATION: MODEL ERRORS AND MODEL ORDER

The year of this writing, 2009, is the 200th anniversary of Gauss’s (1809) Theory of the Motion of Heavenly Bodies Moving About the Sun in Conic Sections. One might suspect that after 200 years there is little to be learned about least squares, and that applications of the method are so routine as to be boring. While literature on new theory may be sparse, applications of the theory can be challenging. This author still spends a considerable fraction of his professional time trying to determine why least-squares and Kalman filter applications do not perform as expected. In fact, it is not unusual to spend more time trying to “fix” implementations than designing and developing them. This may suggest that insufficient time was spent in the design phase, but in most cases the data required to develop a better design were simply not available until the system was built and tested. Usually the problems are due to incorrect modeling assumptions, severe nonlinearities, poor system observability, or combinations of these.

This chapter discusses practical usage of least-squares techniques. Of particular interest are methods that allow analysis of solution validity, model errors, estimate confidence bounds, and selection of states to be estimated. Specific questions that are addressed (at least partially) in this chapter include:

1. Is the solution valid?

2. How unique is the solution?

3. What is the expected total error in the state estimate taking into account not only errors in measurements, but also errors in model structure and in un-estimated parameters?

4. What set of estimated states produces results with the smallest total state error?

5. What subset of estimated states (reduced-order model [ROM]) meets accuracy and execution time requirements?

6. What set of estimated states produces a result that is most robust to errors in modeling assumptions?

7. Can the search for the “best” set of estimated states be automated?

8. What fit data span produces the best predictions?

9. Can the variance of the measurement noise be determined from fit residuals?

6.1 ASSESSING THE VALIDITY OF THE SOLUTION

When a least squares, Minimum Mean-Squared Error (MMSE), Maximum Likelihood (ML), or Maximum A Posteriori (MAP) solution has been obtained by some method described in the previous chapters, it is important to know whether the solution is valid and accurate. This is a particularly important step in the estimation process because anomalous measurements and modeling errors are always potential problems.

6.1.1 Residual Sum-Of-Squares (SOS)

For weighted least squares, MMSE/minimum variance, ML, and MAP estimates, the first step in the validation process should involve checking the cost function. As shown in Chapter 4, the expected sum of weighted measurement residuals $E[\hat{r}^T R^{-1} \hat{r}]$ (where $\hat{r} = y - H\hat{x}$) for an optimally weighted, full-rank, non-Bayesian least-squares solution should be equal to the number of scalar measurements (m) minus the number of estimated states (n): m − n. If the measurement errors r are Gaussian-distributed, $\hat{r}^T R^{-1} \hat{r}$ should be χ²-distributed with degrees-of-freedom equal to m − n. Hence the expected value is m − n with variance 2(m − n). When m − n > 20, the distribution is approximately N(m − n, 2(m − n)). Thus when

(6.1-1)  $\hat{r}^T R^{-1} \hat{r} > (m - n) + k\sqrt{2(m - n)}$

where k is the desired level of significance (e.g., 4σ), the residuals are statistically “large,” and either the fitted model is invalid, the measurement noise level is incorrectly specified, or some outlying measurements have not been edited. Since the χ² distribution is not symmetric when m − n is small (e.g., <10), significance tests should use

$\Pr\{\chi^2_{m-n} \ge \hat{r}^T R^{-1} \hat{r}\} = 1 - P\!\left(\frac{m-n}{2},\ \frac{\hat{r}^T R^{-1} \hat{r}}{2}\right)$

where P is the incomplete gamma function. Even when r is not Gaussian-distributed, $E[\hat{r}^T R^{-1} \hat{r}]$ is often close to m − n provided that E[r] = 0 and $E[rr^T] = R$. Hence equation (6.1-1) can still be used as a rough check provided that anomalous measurements (outliers) are eliminated from the solution. More will be said about this topic later.
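As a concrete illustration, the test of equation (6.1-1) can be coded in a few lines. This is a hypothetical sketch assuming a diagonal R; the function and argument names are illustrative, not from the text:

```python
import math

def residual_sos_test(residuals, r_diag, n_states, k=4.0):
    """Chi-square check of the weighted residual sum of squares, test (6.1-1).

    residuals : fit residuals y - H*xhat
    r_diag    : diagonal of the measurement noise covariance R
    n_states  : number of estimated states n
    k         : test significance level in standard deviations
    """
    m = len(residuals)
    sos = sum(r * r / v for r, v in zip(residuals, r_diag))  # r^T R^-1 r
    dof = m - n_states                          # expected value of the SOS
    threshold = dof + k * math.sqrt(2.0 * dof)  # upper bound of (6.1-1)
    return sos, dof, sos <= threshold
```

A failed test flags statistically large residuals; as discussed in the text, this may indicate a model problem or simply mis-scaled noise σ’s.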

Even when $\hat{r}^T R^{-1} \hat{r}$ is larger than $(m - n) + k\sqrt{2(m - n)}$, this does not necessarily mean that the state solution $\hat{x}$ is invalid. For example, consider a case with m = 1000, n = 10, and k = 4, which gives a test threshold of $990 + 4\sqrt{1980} \approx 1168$. If the measurement noise standard deviations (square roots of R diagonals) were incorrectly specified by 8.6%, test equation (6.1-1) would fail. It is often difficult to define the measurement noise σ’s accurately to within 10%, so in many cases failure of equation (6.1-1) simply implies that the noise σ’s are incorrectly scaled. Notice that if all diagonals of R are scaled by the same multiplier, the weighted least-squares solution $\hat{x}$ will not change, although the computed state error covariance $(H^T R^{-1} H)^{-1}$ will be incorrectly scaled by the same factor.

As discussed in Chapter 4, when Bayesian least-squares (MMSE) solutions are used with the a priori state estimate and covariance specified realistically, the weighted sum of measurement and prior state estimate residuals, $(y - H\hat{x})^T R^{-1} (y - H\hat{x}) + (\hat{x} - \bar{x})^T P_a^{-1} (\hat{x} - \bar{x})$, should have an expected value of m, and equation (6.1-1) should be modified accordingly.

6.1.2 Residual Patterns

The next step in the evaluation process should include plotting of measurement residuals ($\hat{r} = y - H\hat{x}$) and/or normalized residuals ($R^{-1/2}\hat{r}$) versus time (or whatever independent variable is used in the model). If the model structure is correct and the number of measurements is large compared with the number of unknowns, the residuals should not have significant patterns. The state estimate $\hat{x}$ that minimizes the least-squares cost function tends to eliminate systematic patterns from the measurement residuals. That happens because patterns within the observable subspace of the model can be eliminated by adjustment of $\hat{x}$, thus decreasing the least-squares cost function. However, $\hat{r}$ will still be somewhat correlated (nonwhite) even for an optimal model. This is explained using equation (4.2-8) and recalling that $E[\hat{r}] = 0$ if E[r] = 0:

(6.1-2)  $E[\hat{r}\hat{r}^T] = R - H(H^T R^{-1} H)^{-1} H^T$

Thus $E[\hat{r}\hat{r}^T]$ will generally not be diagonal. To explore this further, we compute the covariance of $L^{-1}\hat{r}$, where $R = LL^T$. Matrix $L = R^{1/2}$ may be the Cholesky factor of R, but this is not a requirement. (Usually R is diagonal so L is also diagonal. In many least-squares implementations measurements are normalized as $L^{-1}y$ to simplify processing, but this is not required for the following steps.) Thus

$E[(L^{-1}\hat{r})(L^{-1}\hat{r})^T] = I - L^{-1}H(H^T R^{-1} H)^{-1}H^T L^{-T}$

We now use the Singular Value Decomposition (SVD) to express

$L^{-1}H = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} S_1 \\ 0 \end{bmatrix} V^T = U_1 S_1 V^T$

where S1 is an n × n diagonal matrix, U1 is an m × n column-orthogonal matrix, U2 is an m × (mn) column-orthogonal nullspace matrix, and V is an n × n orthogonal matrix. Substituting this in equation (6.1-2) yields

$E[(L^{-1}\hat{r})(L^{-1}\hat{r})^T] = I - U_1 S_1 V^T (V S_1^2 V^T)^{-1} V S_1 U_1^T = I - U_1 U_1^T = U_2 U_2^T$

Notice that $U_1^T U_1 = I_n$, $U_2^T U_2 = I_{m-n}$, $U_1^T U_2 = 0$, $V^T V = I_n$, and $U_1 U_1^T + U_2 U_2^T = I_m$. However, $U_1 U_1^T$ and $U_2 U_2^T$ will generally not be diagonal, so $E[\hat{r}\hat{r}^T]$ will have nonzero off-diagonal elements even when R is diagonal. In other words, the residuals will be correlated. Evidence of correlation may appear as systematic patterns in the plotted residuals. Notice that the correlation is defined by the nullspace of the measurement model. When the number of measurements is equal to the number of unknowns, the nullspace does not exist and the residuals will be zero.
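The identity $I - U_1U_1^T = U_2U_2^T$ and the resulting residual correlation can be checked numerically. The following sketch (not from the text) uses R = I so that L = I, with an arbitrary random measurement matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
H = rng.standard_normal((m, n))        # measurement matrix; R = I, so L = I

# Full SVD: H = [U1 U2] [S1; 0] V^T
U, S, Vt = np.linalg.svd(H, full_matrices=True)
U1, U2 = U[:, :n], U[:, n:]

# Normalized residual covariance, equation (6.1-2) with R = I
cov1 = np.eye(m) - H @ np.linalg.inv(H.T @ H) @ H.T
cov2 = U2 @ U2.T                       # nullspace form derived above
assert np.allclose(cov1, cov2)

# Off-diagonal elements are nonzero: the fit residuals are correlated
assert np.abs(cov1 - np.diag(np.diag(cov1))).max() > 1e-6
```

Both expressions are projections onto the left nullspace of H, so the equality holds regardless of SVD sign conventions.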

It should be noted that for Bayesian estimation (MMSE or MAP) with prior information weighted in the solution, correlations between measurement residuals can be somewhat larger than for the prior-free (nonrandom state) case discussed above.

To summarize, a valid weighted least-squares solution may still produce measurement residuals that are somewhat correlated, but signatures appearing in the residuals are not characteristic of any linear combination of estimated states. In practice, when model structure is correct and the number of measurements is significantly larger than the number of unknowns, it is usually difficult to detect patterns in measurement residuals. If systematic patterns are evident, one should consider the strong possibility that the model is not correct.

6.1.3 Subsets of Residuals

If the solution consists of several different types of measurements, or measurements from different sensors, it is good practice to compute residual statistics (mean, variance, root-mean-squared [RMS], and min/max) for each measurement subset. This can sometimes reveal problems with particular types of measurements or sensors. However, beware: problems with one measurement subset can sometimes be caused by problems with modeling a different measurement subset. Furthermore, while the mean of all residuals must be zero for the solution to minimize the least-squares cost function, the residual means for individual subsets need not be zero, particularly if the sample size is small. Hence a large residual mean for a specific measurement subset does not necessarily imply that there is a modeling problem. Experience with similar problems is very helpful when validating a least-squares (or Kalman filter) solution.
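A per-subset statistics helper of the kind described above might look as follows; the function name and output layout are assumptions, not from the text:

```python
import math
from collections import defaultdict

def subset_stats(residuals, labels):
    """Compute mean, variance, RMS, and min/max per measurement subset.

    residuals : list of scalar residuals
    labels    : matching list of subset tags (e.g., sensor or data type)
    """
    groups = defaultdict(list)
    for r, tag in zip(residuals, labels):
        groups[tag].append(r)
    out = {}
    for tag, vals in groups.items():
        m = len(vals)
        mean = sum(vals) / m
        var = sum((v - mean) ** 2 for v in vals) / m
        rms = math.sqrt(sum(v * v for v in vals) / m)
        out[tag] = {"mean": mean, "var": var, "rms": rms,
                    "min": min(vals), "max": max(vals)}
    return out
```

Comparing each subset's RMS with its assumed noise 1σ (as done in Example 6.2) is often the quickest way to spot a mis-weighted sensor.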

6.1.4 Measurement Prediction

Another useful test for validating least-squares models involves predicting the measurements outside the data span used for fitting the model, or predicting to untested conditions if parameters other than time are more important in defining model limits. Prediction residuals often show model limitations more readily than fit residuals. Solutions for time span t0 to t1 can be used to compute residuals for time span t1 to t2, and vice versa. This concept can be extended to different types of measurements, or measurements from different sensors. For example, if a navigation problem uses both range and angular measurements, and it is possible to compute a solution using only range data, one should use that solution to compute angle residuals. If the residuals appear larger than expected or have significant patterns, then model errors may be suspected. Likewise a solution based on only angle data can be used to compute range residuals. Unfortunately it has been this author’s experience that redundancy of measurement types or sets is rare. If you have such a problem, take advantage of the redundancy for model validation.

6.1.5 Estimate Comparison

Finally, the state estimates should be checked for reasonableness. If the state values are physically impossible or very unlikely, modeling errors should be considered. When Bayesian estimation is used, compare the difference between the a priori and a posteriori state estimates normalized by the a priori standard deviation: $(\hat{x}_j - \bar{x}_j)/\sqrt{(P_a)_{jj}}$ for state j. If the state change is many times larger than the prior uncertainty, either the prior estimate, prior uncertainty, or the model is incorrect. Also notice the ratio of the a priori and a posteriori standard deviations: $\sqrt{(P_a)_{jj}/P_{jj}}$. If the a posteriori standard deviation is much smaller than the a priori, the estimator believes that the information in the measurements is much greater than that in the prior estimate. Problems with the measurements or model are likely when the a posteriori state estimate is suspicious under these conditions. The state correlations computed from the a posteriori error covariance matrix may provide insight when several state estimates appear suspect. Additional insight on expected performance may be obtained by testing the estimator using simulated measurements generated using the same conditions and parameters as for the real data.

We now demonstrate some of these concepts using two examples. The first is a variant on the low-order polynomial model previously used, and the second is an actual orbit and attitude solution for the GOES-13 satellite.

Example 6.1: Fourth-Order Polynomial Model

The polynomial problem of Example 5.9 used 101 measurements modeled as

$y_i = \sum_{j=1}^{n_t} x_j t_i^{\,j-1} + r_i$

with time samples (ti) uniformly spaced from 0 to 1. The samples of random measurement noise ri were generated as N(0,1), and the random state xi values were also N(0,1). However, in this example the standard deviation of the xi samples was increased to 50 so that the signature of the polynomial is much larger than that of the noise, and easily recognized in plots. Figure 6.1 shows the least-squares fit measurement residuals for a case in which the measurements were simulated using a third-order polynomial (nt = 4), and the weighted least-squares estimator used the same polynomial order (n = 4). Hence measurement noise is the only error source in the estimator. The weighted residual SOS for this case is 110.30 with an expected value of 101 − 4 = 97. The difference of 13.3 is within the standard deviation ($\sqrt{2 \times 97} \approx 13.9$) for the χ² distribution. Notice that the residuals have a slight periodicity of about four samples (0.04 s), but otherwise appear zero-mean and random.

FIGURE 6.1: Fit measurement residuals for nt = n = 4 polynomial example.


Figure 6.2 shows the location in the measurement residual covariance $E[\hat{r}\hat{r}^T]$ of all absolute value covariances greater than 0.05 for the nt = n = 4 case. Notice that except at the beginning and end of the data span, the residuals are nearly uncorrelated with adjacent samples: the computed (diagonal) variances are about 0.98 at the middle of the data span (column index 50) and adjacent sample covariances are 0.02. The variances only drop to 0.96 at 10 samples from the span ends. Near the beginning and end of the data span, the adjacent sample covariances are more significant (0.10 to 0.13) and are spread over more samples. This result suggests that patterns in the residuals will be minimal, except near the ends of the data span.

FIGURE 6.2: Measurement residual covariances >0.05 for nt = n = 4 polynomial.


Figure 6.3 shows measurement residuals for a higher order case in which nt = n = 8. The weighted residual SOS for this case is 105.15 with an expected value of 101 − 8 = 93. Again the difference of 12.15 is within the standard deviation ($\sqrt{2 \times 93} \approx 13.6$) for the χ² distribution. The same measurement noise sequence was used for this case as for the n = 4 case, and the residuals look similar to those in Figure 6.1. Figure 6.4 shows the location in the measurement residual covariance matrix of all absolute value covariances greater than 0.05. Now the variances are slightly smaller (about 0.95 at the span middle) but covariances are slightly above 0.05 for several adjacent samples. The residuals for the first and last measurements are somewhat more correlated with residuals of nearby measurements.

FIGURE 6.3: Fit measurement residuals for nt = n = 8 polynomial example.


FIGURE 6.4: Measurement residual covariances >0.05 for nt = n = 8 polynomial.


Extrapolating for larger polynomial orders, we would expect the variances to continue dropping and the adjacent sample covariance magnitudes to increase as polynomial order is increased for a fixed number of measurements. That is the actual behavior: with n = 15, the maximum variances are about 0.91 and adjacent covariances are about −0.09, −0.08, −0.07, −0.07 for measurement offsets of 1 to 4, respectively. However, as the number of model states (n) continues to increase, eventually all covariances tend toward zero as n approaches m.

These two cases demonstrate residual behavior when measurement noise is the only model error. However, the primary reasons for plotting residuals are to verify that the measurement noise variance has been correctly defined (so that measurement weighting is appropriate), and to determine if the estimation model is capturing all important behavior of the actual data. To demonstrate a mis-modeling case, we set nt = 8 and n = 4. Figure 6.5 shows the residuals: the nonrandom pattern is clearly evident. The weighted residual SOS is 166.43, which is significantly different from the expected 97. The difference of 69.43 is 4.99 times as large as the expected standard deviation of $\sqrt{2 \times 97} \approx 13.9$, which also indicates that a modeling problem is likely. To better understand the effects of the modeling error, Figure 6.6 shows the residuals with the measurement noise removed. The pattern will of course vary with different state sample values, but it is often true for a variety of problems that low-frequency residual patterns appear when systems are mis-modeled. One should look for long-period residual patterns when attempting to determine whether or not a model is valid. Alternately, if the residuals appear mostly random with no significant systematic patterns but

$\left| \hat{r}^T R^{-1} \hat{r} - (m - n) \right| > k\sqrt{2(m - n)},$

it may be suspected that the measurement noise variance R has been specified incorrectly, and scaling is necessary.

FIGURE 6.5: Fit measurement residuals for nt = 8, n = 4 polynomial example.

FIGURE 6.6: Fit measurement residuals for nt = 8, n = 4 polynomial example: no measurement noise.

Finally we examine the ability of the models to predict outside the data span. Figure 6.7 shows the fit (0 ≤ t ≤ 1) and prediction (1 < t ≤ 2) residuals for the nt = n = 4 and nt = 8, n = 4 models. Notice that the residuals for both models are approximately the same magnitude during the fit interval, but the prediction residuals are more than two orders of magnitude larger when the model has the wrong state order. This clearly demonstrates the impact of model errors and the importance of calculating prediction residuals when evaluating estimation models. Unfortunately the behavior is not usually as dramatic for many real-world problems because the effective signal-to-noise ratio for many states is not nearly as high. Hence many mis-modeling effects can be partially modeled by a linear combination of estimated states. More will be said about this topic later.
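The mis-modeled polynomial experiment can be roughly re-created as below. The random seed and exact noise sequence are assumptions, so the numbers will not match the figures, but the qualitative behavior (prediction residuals far larger than fit residuals) is reproduced:

```python
import numpy as np

rng = np.random.default_rng(1)
t_fit = np.linspace(0.0, 1.0, 101)          # fit span
t_pred = np.linspace(1.0, 2.0, 101)         # prediction span
x_true = 50.0 * rng.standard_normal(8)      # nt = 8 true polynomial states

def model(t, order):
    # Columns [1, t, t^2, ..., t^(order-1)]
    return np.vander(t, order, increasing=True)

# Simulate with nt = 8, estimate with n = 4 (mis-modeled case)
y_fit = model(t_fit, 8) @ x_true + rng.standard_normal(t_fit.size)
H = model(t_fit, 4)
x_hat, *_ = np.linalg.lstsq(H, y_fit, rcond=None)

fit_rms = np.sqrt(np.mean((y_fit - H @ x_hat) ** 2))
y_pred_true = model(t_pred, 8) @ x_true
pred_rms = np.sqrt(np.mean((y_pred_true - model(t_pred, 4) @ x_hat) ** 2))

# Under mis-modeling, prediction residuals dwarf fit residuals
assert pred_rms > 10.0 * fit_rms
```

Setting the estimator order to 8 instead of 4 makes `pred_rms` collapse to near the noise level, mirroring the nt = n = 4 curve in Figure 6.7.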

FIGURE 6.7: Measurement fit and prediction residuals for nt = 4, n = 4 and nt = 8, n = 4 polynomials.


Example 6.2: GOES-13 Orbit and Attitude Determination (OAD)

We now demonstrate model validation issues using actual GOES-13 measurement data. As explained in Section 3.4.9, the GOES I-P spacecraft are geosynchronous weather satellites located approximately on the equator at a fixed longitude. Measurements of imaging instrument angles to ground landmarks, angles to known stars, and ground-to-spacecraft ranges are used in 26-h OAD solutions. The estimated orbit and instrument misalignment attitude coefficients (Fourier series) are used to predict the orbit and instrument attitude of the spacecraft for the next 24 h. That prediction is used onboard the spacecraft to dynamically adjust the scan angles of the optical imaging instruments while scanning.

Table 6.1 lists the measurement residual (observed minus computed) statistics for a 26-h GOES-13 (GOES N) OAD solution computed using range, landmark, and star observations from 2006 day 268.90 to day 269.97. The spacecraft was in the “upright” yaw orientation at this time and OAD only included stars and landmarks from the Imager instrument (not from the Sounder). The weighted measurement residual SOS for this solution was 1687.9 for 1624 scalar measurements, with 114 states estimated in the least-squares solution. Hence

$\hat{r}^T R^{-1} \hat{r} - (m - n) = 1687.9 - 1510 = 177.9$

TABLE 6.1: OAD Fit Residual Statistics for Upright GOES-13 on 2006 Day 269: Range Bias = 0


which is 3.24 times the standard deviation ($\sqrt{2 \times 1510} \approx 55$) expected for a χ² distribution with 1510 degrees-of-freedom. This is only slightly higher than the expected range for a properly modeled system, so there is no reason to believe that modeling problems exist. In examining Table 6.1, it is helpful to compare the residual statistics with the measurement noise standard deviations (1σ) assumed for the OAD fit. These 1σ values were set to 3 m for ranges, 7.48 east-west (EW) and 9.71 north-south (NS) microradians (µrad) for Imager star observations, and 7.30 (EW) and 7.40 (NS) µrad for Imager landmark observations. Hence only the EW star residual RMS is larger than the specified 1σ, and only by 13%. Furthermore, the mean residuals for all observation types except ranges are less than 16% of the specified noise 1σ’s. The mean range residual is 16% of the specified noise 1σ, but this is not particularly suspicious. The least-squares solution condition number (scaled) was only 4284, which indicates that solution observability is acceptable.

Figure 6.8 shows the fit and prediction range and visible landmark residuals for this OAD fit. Notice that the range residuals exhibit a strong downward trend with an added sinusoidal pattern. The visible landmark fit residuals appear to have little bias, but the EW prediction residuals have a −10 µrad bias and slight trend. Similar patterns appeared in every daily OAD solution when the spacecraft was upright. Even though the residual statistics do not suggest modeling problems, the residual patterns strongly indicate that some characteristics of the GOES-13 system are not modeled properly. This result demonstrates the importance of using multiple evaluation criteria.

FIGURE 6.8: (a) 26-h OAD fit/prediction range residuals for upright GOES-13: range bias = 0. (b) 26-h OAD fit/prediction landmark residuals for upright GOES-13: range bias = 0.


Subsequent investigations attempted to determine types of modeling errors that could cause the observed range residual signature. Because the spacecraft is so far from earth, the only significant forces acting on the spacecraft are gravitational (earth, sun, moon), solar radiation pressure, and spacecraft thrusting during maneuvers. At that distance the gravitational forces are accurately modeled and can be eliminated as a possible modeling error. Solar radiation pressure was a possibility, but an error in the assumed spacecraft reflection coefficient would not cause the observed range residual pattern. It was further eliminated as an error source by estimating the coefficient in the OAD solution. Errors in modeling the thrust vector for momentum-dumping maneuvers during the daily housekeeping (HK) period were also a possibility, but this was eliminated using special 2-day OAD solutions centered on the HK time, where the maneuver velocity change vector (Δv) was estimated in OAD.

Systematic measurement errors were another possibility. Spacecraft or Imager angular biases affecting both stars and landmarks are estimated as coefficients of the Fourier attitude model, so this was not a possible source of bias error. Other types of systematic pointing errors that have repeatable patterns from day-to-day are also handled by the Fourier series. One type of instrument misalignment is not modeled by the five-parameter attitude model, but the modeling error caused by this omission is too small to produce the errors of Figure 6.8. Hence investigation switched focus to range biases.

Range biases are not adjusted in OAD for the previous series of GOES spacecraft (designated I-M prior to launch and 8–12 after launch) because the ground system calibrates ground loop delays, and the calibrated biases are removed from the range measurements before use in OAD. The calibrations are routinely performed whenever equipment in the signal path changes. This works well and it has not been necessary to estimate GOES-8, 9, 10, 11, or 12 range biases in OAD. Since the same ground equipment is used for ranging to GOES-13, and the spacecraft range delays are well calibrated, there was no reason to suspect a range bias for GOES-13. However, previous experience had shown that uncompensated range biases could produce range residual patterns similar to those of Figure 6.8. When range bias estimation was activated in GOES-13 OAD for the day 269 solution, the estimated bias was found to be +54.0 m, and the trends in both range and landmark residuals were nearly eliminated, as shown in Figure 6.9.

FIGURE 6.9: (a) OAD fit and prediction range residuals for upright GOES-13 with range bias estimated (+54.0 m). (b) OAD fit and prediction landmark residuals for upright GOES-13 with range bias estimated (+54.0 m).


Unfortunately this is not the end of the story. Even though range bias adjustment eliminated the fit and prediction problem, no range bias adjustment was necessary when the spacecraft was operated in the “inverted” yaw orientation (Fig. 6.10), which occurred twice during GOES-13 Post Launch Testing (PLT). Since there is no known reason for the range bias to change with spacecraft orientation, this was suspicious. Furthermore, the spacecraft was intentionally drifted 15 degrees west in longitude during PLT, and the OAD-estimated range bias was found to increase linearly with longitude offset from the ground station longitude. Again there was no known physical reason for this change in range bias. An extensive investigation over many months was initiated to determine the source of the problem. It was eventually found that an EW bias offset between star and landmark observations of about 32 µrad could produce the same behavior as a range bias, and was invariant with spacecraft longitude. Each EW image pixel is 16 µrad, so the bias appeared to be 2 pixels. Ground and spacecraft processing of star and landmark observations was investigated to determine whether an implementation error could cause the EW bias to appear when GOES-13 was upright but not inverted. No processing error could be found. More information about these GOES-13 investigations is provided in Gibbs et al. (2008a) and Carr et al. (2008).

FIGURE 6.10: (a) OAD fit and prediction range residuals for inverted GOES-13 with range bias = 0. (b) OAD fit and prediction landmark residuals for inverted GOES-13 with range bias = 0.


The next spacecraft in the GOES NOP series, GOES-14, was launched in June 2009. Large trends and prediction biases were also observed in GOES-14 range residuals for both spacecraft yaw orientations. Since it is difficult to separate effects of range and landmark biases using just GOES observations, it was requested that the National Aeronautics and Space Administration (NASA) track GOES-14 using the Deep Space Network (DSN). This provided an orbit solution that did not depend on operational GOES range, landmark, and star observations. Comparison of GOES-14 range measurements with ranges computed using the DSN orbit solutions verified that there was indeed a bias of about +50 m in the GOES-14 range measurements. Further investigation revealed that the measured transponder delays for GOES-13 and 14 had never been set in the ground equipment component that computes the two-way range measurements: the values used were defaults appropriate for GOES I-M spacecraft. Hence the error in one-way range measurements was +42.2 m for GOES-13 and +48.6 m for GOES-14.

When the proper transponder correction was applied, GOES-14 OAD solutions produced nearly unbiased range predictions for both spacecraft orientations. There was still evidence of a small landmark bias that probably changed with yaw orientation, but the error was small enough to be ignored. With the transponder correction properly set for GOES-13, the trend in the prediction range residuals is reduced, producing a mean bias of about −15 m when in the upright orientation at 105 degrees west longitude. (Fig. 6.8 was obtained at 89.5 degrees west longitude, where the range bias effect is smaller.) GOES-13 has not been operated inverted since the transponder correction was applied, but analysis using 2006 data shows that the range adjustment will not eliminate the difference in behavior between upright and inverted orientations. One possibility for a yaw-dependent landmark bias is uncorrected Imager fixed-pattern scan angle errors. This would produce a different bias in the two orientations because ground landmarks appear in a small set of geographic areas. However, this theory has not been verified and the exact cause of the bias is still unknown at the time of this writing.

Although the GOES system is more complex than most systems for which least-squares estimation or Kalman filtering is applied, the procedures used to investigate possible modeling problems are fairly routine and apply to most estimation problems. Hence this case study demonstrates a general approach for isolating modeling problems. When model problems are suspected, it is very helpful to know the residual signatures of various modeling errors: this topic will be discussed in Section 6.2.

In addition to the various residual tests discussed above, other tests provide insight on the expected performance of the estimation. One should always compute the condition number of the least-squares solution—appropriately scaled as discussed in Chapter 5—and be aware that the state estimate will be sensitive to both measurement noise and model errors when the condition number is large. The a posteriori state error standard deviations ($\sigma_i = \sqrt{P_{ii}}$ for state i, where P is the inverse of the Fisher information matrix) provide guidance on the expected accuracy of the state solution. Correlation coefficients (ρij = Pij/(σiσj)) close to 1.0 indicate that linear combinations of states are difficult to separately estimate. More information on weakly observable combinations of states is provided by the eigenvectors of the information matrix (or equivalently the right singular vectors of the measurement matrix) corresponding to the smallest eigenvalues. Finally, when using Bayesian MMSE or MAP solutions, the ratio of a priori to a posteriori state standard deviations ($\sqrt{(P_a)_{ii}/P_{ii}}$) allows assessment of how much information is provided by the measurements compared with the prior information.
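These diagnostics are straightforward to compute from the measurement matrix. A minimal sketch, assuming R = I and an arbitrary random H purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
H = rng.standard_normal((50, 3))      # assumed measurement matrix; R = I

info = H.T @ H                        # Fisher information matrix H^T R^-1 H
P = np.linalg.inv(info)               # formal (a posteriori) error covariance
sigma = np.sqrt(np.diag(P))           # a posteriori standard deviations
corr = P / np.outer(sigma, sigma)     # correlation coefficients rho_ij
cond = np.linalg.cond(H)              # condition number of the LS solution

assert np.allclose(np.diag(corr), 1.0)
```

In practice H should be column-scaled before computing the condition number, as discussed in Chapter 5; the raw condition number here is only illustrative.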

6.2 SOLUTION ERROR ANALYSIS

6.2.1 State Error Covariance and Confidence Bounds

In Chapter 4 we noted that the inverse of the Fisher information matrix is a lower bound on the state estimate error covariance. Specifically,

(6.2-1)  $E[\tilde{x}\tilde{x}^T] \ge \left( E\left[ \frac{\partial^2 J}{\partial x \, \partial x^T} \right] \right)^{-1}$

with J defined as the negative log probability density and $\tilde{x} \triangleq \hat{x} - x$. For the maximum likelihood problem using the linear model y = Hx + r, where r is N(0,R) random noise, the negative log probability density is

$J = \frac{1}{2}(y - Hx)^T R^{-1}(y - Hx) + \frac{1}{2}\ln\left( (2\pi)^m |R| \right)$

so the error covariance bound is

(6.2-2)  $P = (H^T R^{-1} H)^{-1}$

The same equation is also obtained when computing second-order error statistics of the weighted least-squares method (without considering the underlying probability density of r). For the Bayesian least-squares or MAP problem, the error covariance bound is

(6.2-3)  $P = (P_a^{-1} + H^T R^{-1} H)^{-1}$

where Pa is the error covariance of the a priori state estimate.
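A quick numerical check of equations (6.2-2) and (6.2-3), with an assumed H and prior covariance, confirms that adding prior information can only shrink the formal covariance:

```python
import numpy as np

rng = np.random.default_rng(3)
H = rng.standard_normal((20, 2))          # assumed measurement matrix; R = I
Pa = np.diag([4.0, 9.0])                  # assumed a priori covariance

P_ls = np.linalg.inv(H.T @ H)                              # equation (6.2-2)
P_map = np.linalg.inv(np.linalg.inv(Pa) + H.T @ H)         # equation (6.2-3)

# The Bayesian information matrix dominates the prior-free one, so its
# inverse is smaller in the Loewner (and hence diagonal) sense.
assert np.all(np.diag(P_map) <= np.diag(P_ls))
```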

We now consider the meaning of P, often called the a posteriori or formal error covariance matrix. When r is zero-mean Gaussian-distributed, the state estimate errors will also be Gaussian, that is,

(6.2-4)  $p(\tilde{x}) = \frac{1}{(2\pi)^{n/2} |P|^{1/2}} \exp\left( -\frac{1}{2} \tilde{x}^T P^{-1} \tilde{x} \right)$

When x is a scalar (n = 1), the probability that $|\tilde{x}| < a\sigma$ for a > 0 is $2\Phi(a) - 1$, where $\Phi$ is the Gaussian distribution function. For example, the probability that $|\tilde{x}| < a\sigma$ is 0.683, 0.955, 0.997 for a = 1, 2, 3 respectively. When n > 1, the quantity in the exponent, $\tilde{x}^T P^{-1} \tilde{x}$, is a χ²-distributed variable with n degrees-of-freedom. Hence values of $\tilde{x}$ for which $\tilde{x}^T P^{-1} \tilde{x}$ has a fixed value represent contours in multidimensional space of constant probability. It is easily shown using eigen decomposition that $\tilde{x}^T P^{-1} \tilde{x} = c$ is the equation of an n-dimensional ellipsoid. The state covariance is first factored as

(6.2-5)  $P = M \Lambda M^T$

where M is the modal matrix of eigenvectors and Λ is a diagonal matrix of eigenvalues λ1, λ2,…, λn. Then $\tilde{x}^T P^{-1} \tilde{x} = z^T \Lambda^{-1} z$, where $z = M^T \tilde{x}$. The constant χ² contours are defined by

(6.2-6)  $z^T \Lambda^{-1} z = \sum_{i=1}^{n} \frac{z_i^2}{\lambda_i} = c$

which is the equation of an n-dimensional ellipsoid. Notice that the square roots of the eigenvalues are the lengths of the ellipsoid semi-axes (for c = 1), and the eigenvectors (columns of M) define the directions of the principal axes. The mean value of $\tilde{x}^T P^{-1} \tilde{x}$ should be n if the model is correct, so approximately 50% of the random samples of $\tilde{x}$ should lie within the ellipsoid defined by c = n. Exact probabilities or values of c for other cumulative probabilities can be computed from the χ² distribution using $\Pr\{z^T \Lambda^{-1} z < c\} = P(n/2, c/2)$, where P is the incomplete gamma function.
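For the common two-dimensional case (n = 2), the χ² distribution function has the closed form $\Pr\{\chi^2_2 < c\} = 1 - e^{-c/2}$, so the contour constant for a desired cumulative probability p is simply $c = -2\ln(1 - p)$:

```python
import math

def chi2_contour_2d(p):
    """Contour constant c for cumulative probability p, n = 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - p)

# Reproduces the constants used in Example 6.3
assert abs(chi2_contour_2d(0.99) - 9.21) < 0.01     # 99% ellipse
assert abs(chi2_contour_2d(0.683) - 2.30) < 0.01    # 68.3% ellipse
```

For general n the incomplete gamma function must be inverted numerically.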

Chi-squared analysis is frequently used to characterize the uncertainty in a least-squares or Kalman filter state estimate. Often the distribution for a given confidence level is summarized by confidence bounds or regions for each state variable, or for linear combinations of the variables. Risk analysis and system accuracy specification are two examples sometimes based on this approach. This type of analysis should be used with caution because it depends on knowledge of the error distributions. Recall that the inverse of the Fisher information matrix is a lower bound on the error covariance. For linear systems with Gaussian measurement errors where the model structure has been validated, the inverse information matrix accurately characterizes the solution accuracy. However, for highly nonlinear systems, systems with non-Gaussian errors, or models that are only approximations to reality, the computed covariance may significantly underestimate actual errors. More will be said about these issues later.

Example 6.3: Two-Dimensional (2-D) Constant Probability Contours

Use of the χ2 distribution for computing confidence limits can be demonstrated using a two-dimensional problem. Let the error covariance be

P = [ 2.5   1.5 ]
    [ 1.5   2.5 ]

where the eigenvalues are 4 and 1, and (1/√2)[1 1]^T, (1/√2)[−1 1]^T are the eigenvectors. Hence the ellipsoid semimajor and semiminor axes are 2 and 1, respectively, with major and minor axes oriented at +45 degrees and +135 degrees with respect to the x-axis. Using the incomplete gamma function for 2 degrees-of-freedom, we find that c = 9.21 defines the ellipse corresponding to 99% cumulative probability, and c = 2.30 defines the ellipse for 68.3% cumulative probability. (Note that 68.3% corresponds to the Gaussian 1σ limit for a single variable.) Figure 6.11 shows the 99%ile and 68.3%ile ellipses, with 200 random samples generated to have the above covariance. (A Gaussian random number generator produced N(0,4) samples of z1 and N(0,1) samples of z2. Then the x samples were computed as x = Mz.) Notice that all but one of the 200 samples lie within the outer ellipse, which is consistent with the 99%ile bound. Also notice that the 68.3% ellipse crosses the x1 and x2 axes at about ±1.93, which is 22% larger than the standard deviation computed from the square roots of the covariance matrix diagonals (√2.5 = 1.581). Hence when samples are correlated, the confidence limits for each variable necessary to meet a given confidence level must be larger than the limits required for independent x1 and x2 samples. Further discussion of confidence limits may be found in Press et al. (2007, section 15.6.3) and many probability texts. Press et al. also discuss use of the χ2 distribution in non-normal cases, and the relationship between measurement fit error and the state estimate error confidence limits.

FIGURE 6.11: 2-D random samples, 68.3%ile, and 99%ile ellipses.


Example 6.4: Satellite Collision Avoidance

Suppose that we want to know whether the probability of impact between two earth-orbiting satellites is greater than 1%. We are given the earth-centered-inertial position-velocity orbit elements of the two satellites at a fixed epoch time t0. The distance between the two satellites is then tabulated as a function of time by integrating the orbit elements. Let t1 be the time at which the distance is a minimum, although this is often not the point of maximum collision probability. To determine whether the probability of impact is significant, the three-dimensional (3-D) error covariance of the position difference between the two satellites and the sizes of each satellite must be known. To make the problem somewhat simpler, assume that the satellites have zero physical size and that the orbit error covariance of one satellite is much smaller than the other. In fact, assume that the orbit of satellite #2 is known perfectly, and thus it is only necessary to integrate the error covariance of satellite #1. The position error covariance of satellite #1 can be computed as

P_p(t1) = T Φ(t1, t0) P(t0) Φ^T(t1, t0) T^T

where Φ(t1, t0) is the 6 × 6 position-velocity state transition matrix from time t0 to t1 and T = [I3×3  03×3]. The position difference at time t1 is designated as Δp.

Since the problem is 3-D, the mean value of the χ2-distributed variable Δp^T P_p^{-1} Δp should be 3. Tables of χ2 distributions for n = 3 show that only 1% of samples have Δp^T P_p^{-1} Δp > 11.3. Hence the relative positions of the two satellites at times near the closest approach can be used to compute Δp^T P_p^{-1} Δp, and if that value exceeds 11.3, we can conclude that the probability of impact for the assumed conditions is less than 1%. A more accurate computation should also include uncertainty in the satellite #2 orbit, the size of the spacecraft, and the fact that large alongtrack errors may allow collisions at times slightly offset from predicted closest approach. Campbell and Udrea (2002) provide more information on the satellite collision avoidance problem.
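The screening test itself reduces to a Mahalanobis-distance comparison against the 11.3 threshold. A sketch in which the covariance and relative position are made-up illustrative values, not data from the text:

```python
import numpy as np

# Chi-squared screening sketch: with 3 degrees-of-freedom, values of
# dp^T Pp^-1 dp above 11.3 imply impact probability below 1% for the
# assumed conditions. Pp and dp are illustrative values only.
Pp = np.diag([100.0**2, 50.0**2, 20.0**2])  # relative position covariance (m^2)
dp = np.array([400.0, -150.0, 30.0])        # relative position at closest approach (m)
m2 = float(dp @ np.linalg.solve(Pp, dp))    # Mahalanobis distance squared
print(m2, "-> safe" if m2 > 11.3 else "-> investigate further")
```

With these numbers m2 = 27.25, well above 11.3, so the screening test passes; in practice the test would be repeated at several times around closest approach.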

For visualization purposes, 3-D ellipsoid contours for 1% probability can be computed by finding the Δp values that yield Δp^T P_p^{-1} Δp = 11.3. One contour drawing method that produces visually appealing plots defines the plotted coordinates as

p = √c M Λ^{1/2} u

where u is a unit vector in ellipsoid principal axes and c = 11.3 in our example. Since eigenvalue/eigenvector functions typically order the eigenvectors from largest to smallest eigenvalue, the first eigenvector is the ellipsoid major axis. Thus u can be specified using fixed “slices” of u3 (e.g., evenly spaced values) with the other two coordinates specified as u1 = √(1 − u3^2) cos θ, u2 = √(1 − u3^2) sin θ, and θ stepped in fixed increments (e.g., 0.2 radians). This approach works well even for high eccentricity ellipsoids. Woodburn and Tanygin (2002) provide more information on visualization of position error ellipsoids.

For illustrative purposes, we use the 3 × 3 position partition of the 6 × 6 position-velocity error covariance obtained from orbit determination of a low-earth-orbit satellite,
c06ue077
where position is specified in meters in true-of-date earth-centered-inertial coordinates. The modal matrix of eigenvectors for this covariance is

c06ue078

and the eigenvalues are

c06ue079

Figure 6.12 shows the 99%ile ellipsoid of this example, plotted as described. The major axis roughly corresponds to the alongtrack (velocity vector) of the satellite.

FIGURE 6.12: Satellite position error (m) 99% probability ellipsoid.


The collision probability can also be visualized in two dimensions by projecting the 3-D ellipsoid into the plane normal to the relative velocity of the two spacecraft. If the distance between spacecraft is outside the 2-D ellipse, the probability of collision at that point is less than the threshold. The 2-D projections should also be computed for small time offsets from the time of closest approach because the alongtrack error is usually far larger than crosstrack or radial errors. Hence the closest approach is often not the point of maximum collision probability.

6.2.2 Model Error Analysis

We now consider modeling error effects that are not included in the formal covariance matrix of the previous section. There are three categories of modeling errors:

1. Model states included in the estimator are a subset of the true model states.

2. The model structure used in the estimator differs from the true model structure. This includes differences in both H and measurement noise characteristics (modeled in the estimator as fixed covariance R).

3. The estimator model is linear but the true model is nonlinear and linearization of the nonlinear model results in significant model errors.

The tools available for analysis of modeling errors include covariance analysis and Monte Carlo simulation. We address the error categories in reverse order because options for the latter two are limited.

6.2.2.1 System Nonlinearities

Most real-world systems are nonlinear at some level. The least-squares method is applied to these systems by linearizing the model about the current state estimate and then iterating (with new linearization at each step) using a Newton-like method. Nonlinear least-squares solution techniques are discussed in Chapter 7. System nonlinearities tend to have more impact on convergence of the iterations than on accuracy of the final solution because the iterations are stopped when the steps of c06ue080 are “small.” When measurement noise is large enough that the estimated state c06ue081 is far from the true x, nonlinearities of the model may cause c06ue082 to be larger than would be predicted from the formal error covariance.

Although the effects of nonlinearities could (in theory) be analyzed in a covariance sense using Taylor series expansions given approximate magnitudes of higher order terms, this is generally not practical. Hence nonlinear error effects are usually analyzed via Monte Carlo simulation where the simulated measurements are based on the nonlinear model and the estimate is obtained using a linearized model.
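Such a Monte Carlo study can be quite small. The sketch below uses a made-up scalar model y = exp(a·t) + r, estimates a by iterated linearization (a Newton-like iteration, as described above), and compares the empirical error spread with the formal standard deviation; all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Monte Carlo sketch: simulate measurements from a nonlinear model,
# estimate with a linearized (Gauss-Newton) iteration, and compare the
# empirical state error spread with the formal covariance.
t = np.linspace(0.0, 1.0, 50)
a_true, sigma = 1.0, 0.5
errors = []
for _ in range(300):
    y = np.exp(a_true * t) + rng.normal(0.0, sigma, t.size)
    a = 0.5                              # deliberately poor initial guess
    for _ in range(10):                  # Newton-like iterations
        h = t * np.exp(a * t)            # linearized sensitivity dy/da
        a += h @ (y - np.exp(a * t)) / (h @ h)
    errors.append(a - a_true)
h = t * np.exp(a_true * t)
sigma_formal = sigma / np.sqrt(h @ h)    # sqrt of (H^T R^-1 H)^-1
print(np.std(errors), sigma_formal)      # similar when nonlinearity is mild
```

For this mildly nonlinear model the two numbers agree closely; increasing sigma (so that the estimate wanders further from the truth) is how one would probe the regime where the formal covariance becomes optimistic.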

6.2.2.2 Incorrect Model Structure

When the estimator model has a structure different from that of the true model, either Monte Carlo simulation or covariance analysis can be used to analyze error behavior. However, the analysis is difficult because of fundamental inconsistencies in the models, as we now demonstrate. Let the true model for measurements used in the least-squares fit be

(6.2-7)   yf = Htf xt + rtf

where xt has nt elements. Subscript “f” denotes “fit” measurements because we will later use “p” to indicate prediction measurements. As before, the estimator model is

yf = Hf x + rf

where x has n elements. The Bayesian least-squares estimate is

(6.2-8) c06e002008

where xa and Pa are the a priori estimate and error covariance, respectively, c06ue084, and

P = (Hf^T Rf^{-1} Hf + Pa^{-1})^{-1}

is the calculated a posteriori state error covariance. Note that P does not characterize the actual estimate errors when Hf, Pa, or Rf do not match reality.

When the model structure is correct, a “true” version (x) of the estimator state c06ue086 will exist, and it is possible to compute the covariance of the estimate error c06ue087. However, when the model structures of Hf and c06ue088 are incorrect, a “true” version of the model state c06ue089 does not exist. Rather than computing the covariance of c06ue090, we must be content with characterizing the error in either the fit measurements or measurements not used in the fit (predicted measurements). The fit measurement residuals are

(6.2-9) c06e002009

and the residual covariance matrix is

(6.2-10) c06e002010

Although xt and xa will generally have different dimensions, it is quite possible that xt and xa will have subsets of states in common. Hence it may be possible to specify portions of

c06ue091

and unknown off-diagonal partitions could be reasonably set to zero. Notice that there is potential for significant cancellation in the last term of equation (6.2-10), so the fit residual covariance may be only slightly larger than the measurement noise (first) term.

For measurements not included in the fit (e.g., predicted), defined as

(6.2-11) c06e002011

the residuals are

(6.2-12) c06e002012

where Hp is the estimator model of predicted measurement sensitivity to estimator states. Assuming that c06ue092, the covariance is

(6.2-13) c06e002013

There is much less potential for error cancellation of predicted measurements in equation (6.2-13), so the prediction residuals may be much larger than fit residuals when model errors are present. The predicted measurement residuals are usually of more interest than fit residuals because the goal of estimation problems is often either to use the model for insight on parameter values, or to predict linear combinations of states under conditions different from those of the fit.
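The asymmetry between fit and prediction residuals is easy to demonstrate. A noise-free sketch in which the assumed “truth” is quadratic but the estimator adjusts only a straight line (models and coefficients are illustrative):

```python
import numpy as np

# Model-structure error sketch: fit a straight line to data generated by a
# quadratic truth model. The adjusted states absorb much of the error over
# the fit span, but extrapolated (prediction) residuals grow rapidly.
t_fit = np.linspace(0.0, 1.0, 20)
t_pred = np.linspace(1.0, 2.0, 20)
truth = lambda t: 1.0 + 2.0 * t + 3.0 * t**2     # illustrative "true" model
H_fit = np.column_stack([np.ones_like(t_fit), t_fit])
x_hat = np.linalg.lstsq(H_fit, truth(t_fit), rcond=None)[0]
fit_res = truth(t_fit) - H_fit @ x_hat
H_pred = np.column_stack([np.ones_like(t_pred), t_pred])
pred_res = truth(t_pred) - H_pred @ x_hat
print(np.abs(fit_res).max(), np.abs(pred_res).max())
```

With these values the worst prediction residual is more than ten times the worst fit residual, illustrating why prediction residuals are the more sensitive diagnostic for model error.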

Equations (6.2-9) to (6.2-13) are based on the assumption that the Bayesian estimator equation (6.2-8) is used. If a nonrandom model is assumed and the prior information of equation (6.2-8) is removed, equations (6.2-9) and (6.2-12) simplify to

(6.2-14) c06e002014

for the fit residuals and

(6.2-15) c06e002015

for the prediction residuals where c06ue093. The measurement residual covariance equations are modified accordingly.

While the covariance equations (6.2-10) and (6.2-13) show how differences in models impact measurement residuals, they are of limited usefulness because many of the inputs such as Htf, Htp, E[rtf rtf^T], E[rtp rtp^T], and

c06ue094

may not be known. Htf is usually only an approximation to reality: if it is well known the states in the estimator can often be modified so that Hf = Htf and thus modeling errors due to this term are eliminated. Hence mis-modeling analysis is generally of most interest when the true models are poorly known. A further problem is that equations (6.2-10) and (6.2-13) do not provide much guidance on the magnitudes or general characteristics of the modeling error because the behavior depends on differences in quantities that are poorly known.

So how can the above equations be used? One approach hypothesizes a specific type of “true” model and then plots the residual signatures using equations (6.2-9) and (6.2-12) without the measurement noise terms. This is similar to a “residual derivative” approach. If similar signatures appear when processing actual measurements, one can speculate that the true model is similar to the hypothesized model and the estimator model can be improved. Whether or not this is less work than simply testing different models in the estimator is debatable, but it can provide insight about model structure.

Even in cases when sufficient information is available to compute the covariance equations (6.2-10) and (6.2-13), it is often less work to use Monte Carlo simulation than covariance analysis.

6.2.2.3 Unadjusted Analyze (Consider) States

In some cases, states included in the estimator are a subset of states available in a high-accuracy model. That is, a high-accuracy state model may have p states but only n < p states are included in the estimator state vector. This is a Reduced-Order Model (ROM). Use of a ROM may be desirable either to achieve optimum accuracy (when some states are negligibly small), to make the estimator more robust, or to reduce the execution time of the estimator. In some cases a subset of states is determined using multiple large sets of measurements, and then those estimated states are treated as known when an estimator is used on a small set of new data. This may be done because the unadjusted (fixed) states are not observable solely from the limited number of new measurements. In any case, errors in the values of unadjusted states will cause errors in the adjusted states.

There are two reasons for analyzing the effects of the errors in unadjusted states on the estimate of the adjusted states: to compute a total error budget and to optimize selection of states to be included in the estimator. We assume that the states can be partitioned such that the fit measurement model is

(6.2-16) c06e002016

where the x1 states are adjusted in the estimator. The prior estimate x2a of the x2 states is used to compute the residual c06ue095, but x2a is not adjusted. The x2 states are often called consider parameters because they are assumed to be constants, and uncertainty in their values is “considered” when computing the error covariance for c06ue096. Using equation (6.2-8) for a Bayesian solution, the estimate c06ue097 is computed as

(6.2-17) c06e002017

where c06ue098, c06ue099, and

(6.2-18) c06e002018

A similar equation for c06ue100 can be computed using a non-Bayesian weighted least-squares solution by eliminating the c06ue101 and c06ue102 terms, but the xa2 term is still needed to compute the effects of errors in xa2 on c06ue103. The reason for assuming a Bayesian solution will be seen shortly when using equation (6.2-17) and related equations to optimize selection of adjusted states.

Defining c06ue104 and c06ue105, then

(6.2-19) c06e002019

and

(6.2-20) c06e002020

Similarly using equation (6.2-19),

c06ue106

and the joint covariance for both terms is:

(6.2-21) c06e002021

Note that P1 will overestimate the error when the filter order is greater than the true order (n > p) because Hf1 then includes terms that are actually zero. This property will be useful when attempting to determine the true order of the system.

Equations (6.2-19) and (6.2-21) show the impact of errors in unadjusted parameters on errors in the adjusted parameters. The equations are fundamental for both error analysis and optimal selection of adjusted states. The latter topic is addressed in Section 6.3. Equation (6.2-20) or the upper left partition of equation (6.2-21) is often called the consider or unadjusted analyze covariance of the adjusted states. This type of covariance analysis has been used for orbit determination since the early days of the space program. Typical consider parameters used in orbit determination are tracking station position errors, atmospheric refraction coefficients, and gravity field coefficients. These parameters are usually determined using measurements from hundreds or even thousands of satellite data arcs. Parameters that may be typically either adjusted or treated as consider parameters include solar radiation pressure coefficients, atmospheric drag coefficients, and tracking station biases.

Since accumulation of the normal equation information arrays is usually the most computationally expensive step in the least-squares error computations, consider parameter analysis is often implemented by initially treating all parameters as adjusted when forming the information arrays. That is, an A matrix and b vector are formed by processing all measurements and prior information as

(6.2-22) c06e002022

(6.2-23) c06e002023

The consider error covariance is obtained by first forming

(6.2-24) c06e002024

where c06ue107 and c06ue108. Then

(6.2-25) c06e002025

and

(6.2-26) c06e002026

Although the A and b arrays are shown partitioned into adjusted and consider parameters, that partitioning is not necessary when forming the arrays. Specific states may be selected for treatment as either adjusted or consider by extracting the appropriate rows and columns from the A and b arrays during the analysis step, reordering the array elements, and forming A11 and A12 as needed. Thus parameters may be easily moved from adjusted to consider or vice versa and the solution accuracy can be evaluated as a function of the selection.
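As a sketch of this workflow (weighted least squares with no prior on the adjusted states; all matrices are illustrative), the consider covariance follows directly from partitions of the full information matrix, and a Monte Carlo run verifies it. This is the standard consider form; the Bayesian version in equation (6.2-20) carries prior-information terms as well:

```python
import numpy as np

rng = np.random.default_rng(2)
# Consider-covariance sketch: build the full information matrix as if all
# states were adjusted, partition it, and form the adjusted-state error
# covariance including the effect of errors in the unadjusted (consider)
# states. H1, H2, noise level, and Pa2 are illustrative.
m, n1, n2 = 200, 2, 1
H1 = rng.normal(size=(m, n1))            # adjusted-state sensitivities
H2 = rng.normal(size=(m, n2))            # consider-state sensitivities
w = 1.0 / 0.5**2                         # measurement weight 1/sigma^2
Pa2 = np.diag([0.3**2])                  # consider-parameter uncertainty
A11 = w * H1.T @ H1                      # adjusted-state information partition
A12 = w * H1.T @ H2                      # cross partition
S = np.linalg.solve(A11, A12)            # sensitivity of x1-hat to x2 errors
P_consider = np.linalg.inv(A11) + S @ Pa2 @ S.T

# Monte Carlo verification: true x1 = 0, consider-state error drawn from Pa2
samples = []
for _ in range(4000):
    dx2 = rng.normal(0.0, 0.3, n2)
    y = H2 @ dx2 + rng.normal(0.0, 0.5, m)
    samples.append(np.linalg.solve(A11, w * H1.T @ y))
P_mc = np.cov(np.array(samples).T)
print(np.diag(P_consider), np.diag(P_mc))  # should agree closely
```

Moving a state between the adjusted and consider sets only changes which rows and columns feed A11 and A12, which is exactly the flexibility described above.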

Consider Analysis Using the QR Algorithm

Consider covariance analysis can also be implemented using the arrays produced by the QR algorithm. Recall from Chapter 5 that the QR algorithm uses orthogonal transformations T to zero lower rows of Hf and create an upper triangular matrix U:

c06ue109

or

c06ue110

where T is implicitly defined from elementary Householder transformations—it is never explicitly formed. Then the least-squares solution c06ue111 is obtained by back-solving c06ue112. As when using the normal equations, measurement and prior information processing proceeds as if all states are to be adjusted. Then when specific states are selected as consider parameters, they are moved to the end of the state vector and the corresponding columns of the U matrix are moved to the right partition of U. Unfortunately the resulting U is no longer upper triangular, so it is necessary to again apply orthogonal transformations to zero the lower partition. This second orthogonalization step does not change any properties of the solution—it simply reorders the states. The U, x, and z arrays are thus partitioned as

c06ue113

where x1 includes n adjusted states and x2 includes p consider parameters. To understand the error propagation it is necessary to examine partitioning of the orthogonal transforms. We partition the orthogonal T matrix into row blocks corresponding to partitioning of the states, where T1 is n × m, T2 is p × m, and T3 is (mnp) × m. Then the transformation on c06ue114 may be written as
c06ue115

Subtracting to form

c06ue116

and using

c06ue117

we find that

c06ue118

or

(6.2-27) c06e002027

This equation applies for any definition of c06ue119, not necessarily for the optimal solution c06ue120, so we use c06ue121 where c06ue122. Assuming that c06ue123 and E[rrT] = I (as required in the QR algorithm), the error covariance for the adjusted states is found to be

(6.2-28) c06e002028

where c06ue124 and c06ue125. Consider parameter analysis can be efficiently performed when using the QR method, but it is slightly more difficult to implement than when using the normal equations.
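The reordering and second orthogonalization steps can be sketched with a generic QR routine; the sizes and the choice of which state becomes a consider parameter are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
# QR consider sketch: after moving a consider parameter's column to the
# right, U loses its triangular form; a second orthogonalization restores
# it without changing the least-squares solution (states are just reordered).
m, n = 12, 4
H = rng.normal(size=(m, n))
z = rng.normal(size=m)
Q, U = np.linalg.qr(H)                   # first orthogonalization
x = np.linalg.solve(U, Q.T @ z)          # back-solve for the full solution
order = [0, 2, 3, 1]                     # state 1 moved to the end (consider)
Q2, U2 = np.linalg.qr(U[:, order])       # second orthogonalization
x_reordered = np.linalg.solve(U2, Q2.T @ (Q.T @ z))
print(bool(np.allclose(x_reordered, x[order])))  # True: same solution, reordered
```

The second QR operates only on the small n × n array U, so the extra cost is minor compared with the original factorization of the m × n measurement matrix.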

Residual Derivatives and Covariance

Systematic patterns in measurement residuals are often the only evidence of modeling problems. Hence knowledge of measurement residual sensitivity to errors in unadjusted states can be very helpful when analyzing these problems. Other useful information includes the expected total residual variance as a function of measurement index, time, and type. As before, we are interested in both measurement fit and prediction residuals. Using yf to designate fit measurements and yp to designate prediction measurements, the residuals for fit and prediction data spans are

(6.2-29) c06e002029

Hence the fit residual derivative is

(6.2-30) c06e002030

This can be useful when attempting to identify unmodeled parameters that can generate observed signatures in the residuals.

Since rf, c06ue126 and c06ue127 are assumed to be uncorrelated, equation (6.2-29) can be used to compute the measurement fit residual covariance as

(6.2-31) c06e002031

The scaled residual covariance (used in computing the least-squares cost function) is

(6.2-32) c06e002032

From equation (6.2-29) the measurement prediction residuals are

(6.2-33) c06e002033

and the prediction residual covariance is

(6.2-34) c06e002034

Notice that c06ue128 in equation (6.2-31) reduces the residual covariance, but c06ue129 in equation (6.2-34) increases the covariance. As expected, prediction residuals tend to be greater than fit residuals even when all states are adjusted. Also notice that the unadjusted state error (c06ue130) contribution to the fit and prediction residual covariance is positive-semi-definite. That is, errors in unadjusted states will always increase the residual variances—another unsurprising result.

The residual between the a posteriori state estimate c06ue131 and the a priori value xa1 is another component of the Bayesian cost function. That residual is:

(6.2-35) c06e002035

The covariance of that residual is

(6.2-36) c06e002036

and the contribution to the cost function is

(6.2-37) c06e002037

The above sensitivity and covariance equations are useful when analyzing the impact of unadjusted parameters on fit and prediction measurement residuals and prior estimate residuals. The residual sensitivity equation (6.2-30) and the last term in equation (6.2-33) are easily calculated using partitions of the total information matrix computed with all states adjusted. The residual covariances may be obtained either as shown, or using Monte Carlo simulation with repeated samples of both measurement noise and state “truth” values. Monte Carlo analysis involves less coding effort, but the covariance equations can be more easily used to analyze different modeling possibilities. These methods will be demonstrated in an example appearing after the next topic.

6.2.2.4 Estimating Model Order

The previous error analysis was based on the assumption that the estimator model order is smaller than the order of the “truth” model. “Truth models” may include states (parameters) that are negligibly small or, more generally, nonzero but already known from the prior estimate to much better accuracy than the a priori covariance indicates. Hence inclusion of these states in the estimator may make it overly sensitive to measurement noise. That is, the extra states may be adjusted by the estimator to fit measurement noise rather than true model signatures. This reduces measurement residuals but increases state estimate errors. Although there is little penalty for including extra states if the estimator is only used to “smooth” measured quantities, accuracy can be significantly degraded if the state estimates are used to predict measurements or other quantities derived from the measurements. For these reasons it is important that the effective model order be accurately determined.

There are several methods that can be used to evaluate whether individual states should be included in the “adjusted” set. Perhaps the most obvious metric is the least-squares cost function. The fit measurement residuals (and thus the cost function) will decrease as unadjusted states are moved to the adjusted category. If the cost reduction when adding a given state is smaller than a yet-to-be-specified threshold, it may be concluded that the state should not be adjusted. Another useful metric is the log likelihood function, which for Gaussian models includes the least-squares cost function as the data-dependent component. A third metric is the variance of prediction residuals, which (as noted previously) is more sensitive to modeling errors than is the variance of fit residuals. Other metrics, such as the Akaike information criterion (Akaike 1974b), are also used and will be discussed in Chapter 12.

Least-Squares Cost Function

We start by analyzing the reduction in the least-squares cost function when a parameter is added to the state vector of adjusted states. It is assumed that prior estimates (possibly zero) and a nominal error covariance for all unadjusted states are available. The change in the least-squares cost function is used as a metric to determine whether the error in the prior estimate of a given state is sufficiently large to justify including it as an adjusted state, or whether the prior estimate is sufficiently accurate and the state should not be adjusted.

For notational simplicity, the cost function is expressed using a scaled and augmented m + n element measurement vector consisting of actual measurements and prior estimates of states:

c06ue132

where R = R^{1/2} R^{T/2}, c06ue133,

c06ue134

is an (m + n) × n matrix, and

c06ue135

is an m + n measurement error vector. Since the only measurements used in this section are fit measurements, the subscript “f” is dropped from y to simplify the notation.

Twice the Bayesian least-squares cost function is now written as

c06ue136

where the Bayesian estimate is

c06ue137

and the measurement/prior residual vector is
c06ue138

We now divide the adjusted states into two sets. States currently in the estimator are represented as x1, while x2 represents a single state that was initially unadjusted but is now added as an adjusted state located at the bottom of the state vector. This analysis only considers adding one state at a time to the adjusted states. For consistency, prior estimates of all unadjusted states must be used when computing measurement residuals. That is, measurements used for estimation of c06ue139 are computed as y − Hu xau where xau is the prior estimate of all unadjusted parameters (xu), and Hu is the sensitivity matrix of the measurements with respect to xu. If the same corrected measurements are used when testing whether x2 should be adjusted, then the Bayesian solution will estimate the correction δx2 to the prior estimate, where the prior estimate of c06ue140 is zero since the measurements have already been corrected for the prior value xa2.

The corrected measurement vector is defined in terms of adjusted states as

(6.2-38) c06e002038

The augmented z vector consists of corrected measurements, prior estimates for the x1 states, and the prior estimate for the δx2 state as

c06ue141

We now combine the first two sets of z elements—since they are the measurements used when only x1 states are adjusted in the estimator—and denote those measurements as z1. The remaining prior measurement for δx2 is denoted as scalar z2 = 0, and the model for these measurements is

c06ue142

where

c06ue143

and

c06ue144

The least-squares information matrix and vector (denoted as A and b) are defined as

(6.2-39) c06e002039

and

(6.2-40) c06e002040

Hence the normal equations can be written as:

(6.2-41) c06e002041

where c06ue145, etc. Solving the lower equation and substituting it in the upper yields:

c06ue146

c06ue147

or

(6.2-42) c06e002042

where c06ue148 is the estimate of c06ue149 before state c06ue150 was added to the adjusted states. Then

(6.2-43) c06e002043

and the Bayesian least-squares cost function is

(6.2-44) c06e002044

where

c06ue151 is the error variance of the a priori estimate c06ue152, and
c06ue153 is the error variance of the a posteriori estimate c06ue154.

Hence the cost function will change by

c06ue155

when δx2 is moved from unadjusted to adjusted. That is, the cost function will drop by the square of the estimate magnitude divided by the estimate uncertainty. If that ratio is small, it can be assumed that δx2 is unimportant for modeling the measurements and can be eliminated from the estimator model; that is, x2 = xa2.
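This cost-change result can be verified numerically. The sketch below fits a made-up linear model with and without a third adjusted state and checks that 2J drops by exactly the squared estimate divided by its a posteriori variance; all matrices and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
# Numerical check of the cost-change result: moving one state from
# unadjusted (prior fixed at zero) to adjusted reduces 2J by exactly
# (delta_x2_hat)^2 / sigma2_hat^2. All values are illustrative.
m = 30
H1 = rng.normal(size=(m, 2))               # sensitivities of adjusted states
h2 = rng.normal(size=m)                    # sensitivity of the candidate state
pa1, pa2 = 4.0, 9.0                        # prior variances
y = H1 @ np.array([1.0, -2.0]) + 0.5 * h2 + rng.normal(0.0, 1.0, m)

def bayes_fit(H, pa_diag):
    # Bayesian least squares with xa = 0 and R = I; returns estimate,
    # 2J at the minimum, and the a posteriori covariance.
    A = H.T @ H + np.diag(1.0 / np.asarray(pa_diag))
    x = np.linalg.solve(A, H.T @ y)
    r = y - H @ x
    return x, r @ r + x @ (x / np.asarray(pa_diag)), np.linalg.inv(A)

x1, twoJ1, _ = bayes_fit(H1, [pa1, pa1])
x2, twoJ2, P2 = bayes_fit(np.column_stack([H1, h2]), [pa1, pa1, pa2])
dx2_hat, sig2_hat = x2[-1], P2[-1, -1]
print(abs((twoJ1 - twoJ2) - dx2_hat**2 / sig2_hat))  # ~0 (machine precision)
```

The identity is exact because sigma2_hat is the Schur complement of the partitioned information matrix, which is precisely the quantity appearing in equations (6.2-42) to (6.2-44).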

It is also of interest to determine the expected change in the cost function under the assumption that Pa2 accurately models the variance of c06ue156. The expected value of the cost function after adding adjusted state c06ue157 is

c06ue158

c06ue159 can be computed indirectly using

c06ue160

since c06ue161 from the orthogonality principle if x1 and x2 completely define the system. Since E[δx2] = 0 and c06ue162, then c06ue163 and the expected value of the cost function is

(6.2-45) c06e002045

Thus the expected change in the cost function is

(6.2-46) c06e002046

when state δx2 is moved to the adjusted states. The ratio c06ue164 is a measure of the information in the measurements relative to the prior information for state xi. When that ratio for the last state added is much greater than 1.0 (indicating that the a posteriori estimate of c06ue165 is much more accurate than the a priori estimate), the reduction in the cost function will be great when |δx2| > σ2.

The actual change from equation (6.2-44) for a specific set of measurements will be different. When

c06ue166

it may be concluded that the error in the prior estimate xa2 is sufficiently large to justify including δx2 as adjusted.

The Log Likelihood Function and Model Order

The negative log likelihood for a MAP estimator using Gaussian models was defined in Chapter 4 as:

(6.2-47) c06e002047

Notice that it includes a data-dependent part

c06ue167

that is equal to twice the Bayesian cost function for the fit,

(6.2-48) c06e002048

at the MAP estimate c06ue168 when c06ue169 and c06ue170. The ln pY(y) and ln|Pyy| terms are not a function of the data or the model order, so they can be ignored. The (m + n) ln(2π) and ln|Pxx| terms are a function of model order, but they are present in the Gaussian density function as normalization factors so that the integral of the density over the probability space is equal to 1.0. Both terms increase with model order, so one might suspect that the model order minimizing the negative log likelihood is optimal. However, inclusion of these terms results in selection of a model order that is lower than the actual order. Hence the negative log likelihood cannot be directly used to determine the correct model order.

The optimal model order can be determined using the data-dependent part of the likelihood function (2J) plus a “penalty” correction for the expected change in cost as parameters are changed from unadjusted to adjusted. That correction is obtained by summing the expected changes from equation (6.2-46) for all previously unadjusted states that are currently adjusted:

(6.2-49) c06e002049

where n1 − 1 is the “base” model order corresponding to the smallest model to be considered. Since the terms in the summation are negative, f will be positive and will tend to offset the reduction in 2J as unadjusted states are made adjusted. The minimum of the metric 2J + f defines the optimal model order. Use of this approach is demonstrated in Example 6.5.
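The order-selection metric can be sketched on a small polynomial problem similar to Example 6.5. The penalty increment used below, Pa/σ̂² − 1 per newly adjusted state, is one plausible reading of the expected change in equation (6.2-46); the coefficients, priors, and noise level are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
# Model-order selection sketch: the Bayesian cost 2J decreases as states
# are added; the penalty f accumulates the expected decrease per added
# state (Pa/sigma_hat^2 - 1 under our reading of the text). The metric
# 2J + f is smallest at the true order.
m = 101
t = np.linspace(0.0, 1.0, m)
coef = np.array([12.0, -15.0, 11.0, -14.0])       # true polynomial: 4 terms
sigma_n, pa = 0.1, 10.0**2                        # noise sigma, prior variance
y = sum(c * t**j for j, c in enumerate(coef)) + rng.normal(0.0, sigma_n, m)
metric, f = [], 0.0
for k in range(1, 7):
    H = np.column_stack([t**j for j in range(k)])
    A = H.T @ H / sigma_n**2 + np.eye(k) / pa     # information matrix (xa = 0)
    x = np.linalg.solve(A, H.T @ y / sigma_n**2)
    r = y - H @ x
    twoJ = r @ r / sigma_n**2 + x @ x / pa
    sig2_last = np.linalg.inv(A)[k - 1, k - 1]    # posterior var of newest state
    if k > 1:
        f += pa / sig2_last - 1.0                 # expected reduction if adjusted
    metric.append(twoJ + f)
best_order = int(np.argmin(metric)) + 1
print(best_order)  # 4 with these illustrative values
```

Adding the fourth coefficient reduces 2J far more than the penalty, while a fifth adjusted state buys only a noise-level reduction and is rejected by the metric.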

Example 6.5: Consider Parameter Analysis and Selection of Polynomial Order

The polynomial problem of the previous chapter is used to demonstrate analysis of unmodeled parameters and optimal selection of model order. It may seem inconsistent to continue using polynomial examples when previous chapters have emphasized first-principles modeling and real-world examples, but there are good reasons for using basis functions to demonstrate selection of model order. If the inertial navigation system (INS) model of Section 3.3 had been used for this example, it would have been found that the SOS of fit and prediction measurement residuals jumped in groups and that the behavior was not monotonic. Recall that the INS model includes states in groups of three: tilt errors, accelerometer biases, accelerometer scale factors, gyro biases, gyro scale factors, gyro g-sensitive errors, and gyro g²-sensitive errors. While this grouping of states is often typical in physical systems, it would tend to obscure the relationship between estimator behavior and number of adjusted states. For that reason a basis function expansion is preferable for this example. Although the polynomial model is used here, the behavior of Chebyshev polynomials and Fourier expansions is somewhat similar.

This example again uses 101 simulated measurements of the form

c06ue171

with nt = 9 and time samples (ti) uniformly spaced from 0 to 1. The random measurement noise samples ri are N(0,1) and the simulated random state xi values are N(0,10²). For the Bayesian least-squares estimator, the total model order is assumed to be 13 and the number of adjusted parameters (n) in c06ue172 is varied from 1 to 13; for example, when estimator n = 6, the number of x2 consider states is 7. The prior covariance is set as Pa = diag(10²) to give the consider states the proper uncertainty for the simulated data. Both Monte Carlo and covariance analysis are used in the evaluation.
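A minimal sketch of this setup can clarify the mechanics. The variable names, seeds, and the simple regularized normal-equation form below are illustrative assumptions, not the author's code; the prior xa = 0 with Pa = diag(10²) and R = I follows the text.

```python
import numpy as np

# Sketch of the simulation above (assumed names): 101 measurements, true
# polynomial order 9 of a total order 13, unit noise, prior sigma 10 per state.
rng = np.random.default_rng(0)
m, n_true, n_total = 101, 9, 13
t = np.linspace(0.0, 1.0, m)
H_full = np.vander(t, n_total, increasing=True)    # columns 1, t, t^2, ...
x_true = np.zeros(n_total)
x_true[:n_true] = rng.normal(0.0, 10.0, n_true)    # simulated random states, N(0, 10^2)
y = H_full @ x_true + rng.normal(0.0, 1.0, m)      # noisy measurements, r ~ N(0, 1)

def bayes_fit(n_adj):
    """Bayesian LS adjusting the first n_adj states (x_a = 0, P_a = diag(10^2), R = I)."""
    H = H_full[:, :n_adj]
    A = H.T @ H + np.eye(n_adj) / 10.0**2          # information matrix including prior
    xhat = np.linalg.solve(A, H.T @ y)
    return xhat, y - H @ xhat                      # estimate and fit residuals

for n_adj in (3, 9, 13):
    _, res = bayes_fit(n_adj)
    print(n_adj, round(float(res @ res), 1))       # residual SOS drops as order grows
```

Scanning the printed residual SOS versus adjusted order reproduces the qualitative behavior discussed below: large residuals at low order, near the noise floor once the true order is reached.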

Figure 6.13 shows the covariance-based noise-only fit measurement residual standard deviations

c06ue173

FIGURE 6.13: Noise-only polynomial fit measurement residual 1-σ versus order.

c06f013

computed from the first two terms in equation (6.2-31) as a function of estimator model order. Only three measurement time points are plotted: #1 at t = 0, #51 at t = 0.5, and #101 at t = 1. Since the plotted residual 1-σ does not include the effects of unadjusted parameters, σyres_i should nearly equal c06ue174 when the number of adjusted parameters is small. For measurement #51 at the middle of the data span, σyres_i does not drop significantly as the model order is increased, but for the first and last measurements, σyres_i is smaller when more degrees-of-freedom are available for the fit model. It is interesting that σyres_i for the last measurement continues to drop with model order while σyres_i for the first point remains almost flat above n = 4. This behavior is due to the inclusion of prior information (Pa = diag(10²)) on the states. When the prior information of the Bayesian solution is removed to produce a weighted least-squares solution, the behavior of the first and last points is similar. By weakly constraining the estimated polynomial coefficients using prior information, the modeled first measurement is more constrained than the last measurement because higher polynomial coefficients have less influence on the first point.

Figure 6.14 shows the total fit residual standard deviations computed from all terms in equation (6.2-31) as a function of estimator model order. (Because Rf = I, the results are unchanged when using equation [6.2-32].) Notice that the fit residuals are quite large when the model order is low, but they drop to approximately 1.0 at order 7 or 8. Since the simulated data were generated using order 9, the residuals should match those of Figure 6.13 when estimator n ≥ 9. Also plotted in Figure 6.14 is “2J + f − m,” computed as

c06ue175

where c06ue176, c06ue177, and

c06ue178

FIGURE 6.14: Covariance-computed fit measurement residual 1-σ versus order.

c06f014

penalizes the cost function for the expected change due to adding a state. The summation does not include the first three terms because the difference is not defined for i = 1, and the next two terms are dropped to allow plotting on the same scale as for the residuals.

The expected values above were computed using the covariance equations (6.2-32) and (6.2-37). The number of measurements (m) was subtracted from the plotted values to allow plotting on the same scale as the residual 1-σ. Notice that “2J + f − m” initially drops as order increases and has a minimum at n = 9 (the true n). For n < 9, the drop in 2J is much greater than the increase in f because the calculated f does not allow for additional unadjusted terms beyond the current order. For n > 9,

c06ue179

computed from equations (6.2-32) and (6.2-37) should be equal to zero because equations (6.2-32) and (6.2-37) do not include the reduction in residuals due to over-fitting the data. Hence the plotted change represents the effects of f.

Figure 6.15 is a similar plot based on 1000 samples of a Monte Carlo simulation where xa = 0; that is, the mean value of all simulated states is zero, and the prior value of all estimated states is also zero. Simulations using randomly selected xa elements produce curves almost identical to Figure 6.15. In this plot all variables are computed as sums of actual realizations of the random variables—as done when processing real data—and the plotted values are averages of the sample values. For example, the standard deviation of measurement residuals at time ti is computed as

c06ue180

where yi_j = h1i x1_j + h2i x2_j + ri_j is the simulated noisy measurement at ti for Monte Carlo case j. Also 2J + f − m is computed using equations (6.2-48) and (6.2-49).
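The Monte Carlo residual statistic can be sketched as follows. This is a hedged illustration with assumed names and a simple unweighted fit, not the book's simulation: each realization draws a fresh random state and noise vector, fits it, and the sample residual standard deviation at the middle time point is averaged over realizations.

```python
import numpy as np

# Illustrative Monte Carlo for the sample residual sigma at time point #51
# (t = 0.5). Model and names are assumptions for the sketch.
rng = np.random.default_rng(1)
m, n_adj, N = 101, 9, 200
t = np.linspace(0.0, 1.0, m)
H = np.vander(t, n_adj, increasing=True)

def fit_residuals():
    x = rng.normal(0.0, 10.0, n_adj)              # random true state, N(0, 10^2)
    y = H @ x + rng.normal(0.0, 1.0, m)           # noisy measurements
    xhat = np.linalg.lstsq(H, y, rcond=None)[0]   # unweighted LS fit
    return y - H @ xhat

res51 = np.array([fit_residuals()[50] for _ in range(N)])
sigma51 = float(np.sqrt(np.mean(res51**2)))
print(round(sigma51, 3))   # a little below 1.0: the fit absorbs part of the noise
```

Because the least-squares fit removes the component of the data lying in the column space of H, the middle-point residual sigma falls somewhat below the unit noise sigma, as in Figure 6.15.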

FIGURE 6.15: Monte Carlo fit measurement residual 1-σ versus order.

c06f015

The behavior in Figure 6.15 is very close to that of Figure 6.14, except that “2J + f − m” increases less rapidly with order when n > 9 due to data “over-fitting” that reduces the residuals. Again the minimum occurs at n = 9, which is correct. When the prior covariance Pa is reduced to diag(1²) rather than diag(10²), the curves shift downward, but the minimum in “2J + f − m” still occurs at n = 9.

To summarize, use of the cost metric 2J + f − m appears to be a reliable indicator of true model order for this example.

Figure 6.16 shows the total a posteriori 1-σ uncertainty for the first seven states, computed as the square roots of the diagonals of equation (6.2-21). Notice that the 1-σ uncertainty for unadjusted states is equal to 10, but the uncertainty increases significantly when a state is first made adjusted because of the aliasing effect of other unadjusted states. The uncertainty then drops as other states are moved from unadjusted to adjusted.

FIGURE 6.16: Total a posteriori state 1-σ uncertainty.

c06f016

Figure 6.17 shows the covariance-computed total measurement residual 1-σ for predicted measurements at times 1.0, 1.5, and 2.0. The plot for t = 1.0 is identical to that of Figure 6.14, but on a different scale. However, for t = 1.5 the residual 1-σ has a definite minimum at n = 9, and the minimum is quite pronounced for t = 2.0. Figure 6.18 is the comparable plot obtained from the Monte Carlo simulation. Again the two plots are quite similar, except that the covariance-computed errors are larger than actual errors for n > 9 because the covariance does not account for data over-fitting.

FIGURE 6.17: Covariance-computed prediction measurement residual 1-σ versus order.

c06f017

FIGURE 6.18: Monte Carlo prediction measurement residual 1-σ versus order.

c06f018

As noted before, errors in model structure generally have more effect on an estimator’s prediction accuracy than on its ability to match data used in the fit. This behavior is somewhat general for different types of systems. Hence prediction (or nonfit) measurement residuals can often be a more sensitive indicator of model order than the least-squares cost function.

A similar analysis was conducted using Chebyshev orthogonal polynomial and Fourier series basis functions rather than ordinary polynomials. Since Chebyshev polynomials are only defined over a finite time interval, it was necessary to use 0 ≤ t ≤ 2 s as the interval so that the polynomials were reasonably bounded in the prediction interval. The Fourier series repeats rather than “blowing up” outside the fundamental period, so a fundamental period of 2 s was used to avoid repeating the function over the prediction interval 1 ≤ t ≤ 2. Although the “measurement residual versus order” plots were less smooth than those of Figures 6.15 and 6.18, and the prediction residuals were smaller, the general trends and behavior were almost the same as shown for ordinary polynomials. It may be expected that the results of this example are generally applicable to other problems, even though the residual plots may be less well behaved than shown.

Finally, we explain why it was desirable to use a Bayesian solution for the consider analysis. If all state parameters were treated as nonrandom, it would not be possible to compute the fit and prediction residual consider covariances equations (6.2-31) and (6.2-34) because Pa2 is a required input. Alternately, if the x1 states were treated as nonrandom but the x2 parameters were treated as random, results would be inconsistent as states were moved from the adjusted to unadjusted category. By treating all states as random with known a priori covariance, the treatment is consistent. In the next section we will show how stepwise regression allows the optimum model order to be selected when all parameters are assumed to be nonrandom.

Example 6.6: ROM of Optical Imaging Instrument Misalignments

This example demonstrates a slightly different approach to selecting states and optimal model order for a ROM. Rather than using a single metric to determine whether individual parameters should be adjusted in the estimator, a preliminary analysis is conducted using the SVD to determine the approximate model order necessary to accurately model angular measurements of an optical imaging instrument. Singular vectors from the SVD help identify the most important linear combinations of states. Then Monte Carlo analysis is used to evaluate accuracy in predicting pointing of image pixels on the earth—a required function of the operational system. A trial and error process—constrained by knowledge of approximate model order and important states—is used to determine the optimal choice of states to meet accuracy requirements.

One design for an optical earth-imaging instrument on a geosynchronous satellite uses two separate mirrors that can be independently rotated to vary the line-of-sight for the detector array. Rotation of one mirror allows the line-of-sight to scan in an EW direction, while the other mirror allows scanning in an NS direction. Various misalignments within the instrument cause the actual line-of-sight to deviate from the ideal. Although most misalignments are calibrated before launch, thermal deformation caused by solar heating changes the misalignments, so it is necessary to estimate misalignments from measurements available on-orbit. Spacecraft pointing error is another factor in total pointing error. In designing an estimator for this purpose, it is desirable that unobservable, nearly unobservable, or negligibly small misalignments not be included in the estimator state vector because they will increase the onboard computational load and potentially reduce accuracy. The following analysis demonstrates methods that can be used to determine the appropriate states needed for the system to meet pointing accuracy requirements.

The optical path from an individual detector to space passes through a telescope with reflections on a fixed mirror and two rotating mirrors. By following the path for a detector at the array center, allowing for misalignments, the optical line-of-sight unit vector in the Instrument Coordinate System (ICS) is found to be:

c06ue181

where

md2, md3 are angular misalignments of the detector array,
mem1, mem3 are mirror misalignments of the EW mirror with respect to the EW rotating shaft,
me1, me2, me3 are misalignments of the EW shaft axis with respect to the ICS,
mnm1, mnm3 are mirror misalignments of the NS mirror with respect to the NS rotating shaft,
mn1, mn2, mn3 are misalignments of the NS shaft axis with respect to the ICS,
SE = sin(2e), CE = cos(2e) where e is the rotation of the EW mirror from the reference position,
SN = sin(−2n), CN = cos(2n) where n is the rotation of the NS mirror from the reference position.

It is immediately seen that mem3 and me3 can be treated as a single parameter because they only appear as a sum. Likewise mnm1 and mn1 can also be treated as a single parameter. Hence a total of 10 parameters are needed to model internal instrument misalignments. Another three angles (roll, pitch, and yaw) are needed to model the rotation of the ICS from the ideal spacecraft coordinate system—this rotation also includes spacecraft pointing error. Hence a total of 13 parameters are potentially needed to model total optical pointing. Computational load considerations make it desirable that about half that number of states be used, provided that pointing accuracy requirements can be met.

Angular measurements to known stars are used for computing misalignments. Mirror angles are adjusted to point near specific stars. With the spacecraft rotating in pitch to maintain earth-pointing, optical pointing angles to the star can be computed from the time at which the star is sensed in a particular detector. After preprocessing, the star measurements consist of an EW and NS pointing error for each star.

Seventeen star observations outside the earth disk were simulated for purposes of analyzing pointing errors. One extra star observation was intentionally located in the upper right quadrant so that the star distribution was not symmetric. Only about three stars would typically be observed within a period of a few minutes, but within an hour all quadrants would be covered with multiple observations. Figure 6.19 shows the location of simulated star positions and the earth in the optical field. Requirements specify pointing accuracy on the earth, so 17 “landmarks” uniformly distributed on the earth were used to compute pointing error statistics for evaluation purposes. Notice that because none of the star observation angles were within the earth’s radius, use of landmarks on the earth for evaluation is “prediction to untested conditions.” However, because the effects of misalignments tend to increase in magnitude with angular distance from nadir, pointing errors on the earth will generally be smaller than for star observations.

FIGURE 6.19: Simulated star observation angles relative to the earth.

c06f019

The first step in the analysis involves SVD computation of singular values and vectors for the 13 misalignment parameters when using 17 star observations. The SVD provides insight on the most important parameters for least-squares estimation. Table 6.2 lists the seven largest singular values (normalized as described in Section 5.7) and corresponding singular vectors. The mem3 and mnm1 parameters are not included in the analysis because they are not separable from me3 and mn1. From Table 6.2 we note the following:

1. Singular value magnitudes indicate that a minimum of four or five states will be required to accurately model the system. The maximum required model order is probably seven because singular values beyond that are 2 to 10 orders of magnitude smaller.

2. The third singular vector is the only one in which one parameter (yaw) is dominant. Other parameters with singular vector components greater than 0.5 are pitch, md3, me2, mnm3, and mn2. All dominant singular vectors except the third are a nearly equal combination of multiple parameters, with few parameters appearing strongly in multiple singular vectors.

TABLE 6.2: Largest Singular Values and Vectors for Attitude and Misalignments

c06t2362rob

The SVD analysis is helpful but not conclusive as to either the required model order or the most important parameters (other than yaw). The physical parameters do not appear to have distinct signatures that allow a unique subset to mostly span the measurement space. However, because the parameters appear in multiple singular vectors, we cannot rule out the possibility that a subset can accurately model the measurements. Another option is to use the first five to seven singular vectors as the mapping between states of the estimation model and the physical parameters; that is, use estimation model states that are a transformation on physical parameters. However, system design issues make it desirable to retain the attitude states (roll, pitch, and yaw) as estimation model states.
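The screening step itself is simple to reproduce. The sketch below uses a synthetic sensitivity matrix (not the instrument model) with 34 rows, as for 17 stars with EW and NS angles, and 13 parameters, constructed so that only about five linear combinations are observable; the normalized singular values then show the effective model order, and the rows of Vᵀ show which parameters combine.

```python
import numpy as np

# Synthetic sensitivity matrix with ~5 observable combinations of 13 parameters.
# All dimensions and magnitudes here are illustrative assumptions.
rng = np.random.default_rng(2)
m, n, r = 34, 13, 5
H = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))   # rank-r dominant part
H += 1e-6 * rng.normal(size=(m, n))                     # tiny residual effects

U, s, Vt = np.linalg.svd(H, full_matrices=False)
s_norm = s / s[0]                                       # normalized singular values
print(np.round(s_norm, 8))
# A sharp drop after the r-th value suggests ~r effective states; the rows of
# Vt for the dominant singular values show which parameters combine.
```

In the instrument problem the drop is less clean, which is exactly why the singular values alone bound the model order at "four or five to seven" without naming a unique parameter subset.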

There are three limitations of this SVD analysis:

1. It does not directly indicate which physical parameters should be used as ROM states. However, this example problem is unusually difficult.

2. It does not take into account the different magnitudes of the parameters. Instrument manufacturers perform extensive analysis of misalignment sources, calibration accuracy, and thermal deformation, and use that information to predict misalignment magnitudes. Mirror shaft angle misalignments can be an order of magnitude larger than mirror-to-shaft misalignments, and also much larger than detector location errors. This information is not taken into account in the SVD analysis. The different expected magnitudes can be used to scale the columns of the H matrix before computing the SVD, but this is also not conclusive.

3. It only models the error in fitting the star observation angles. It does not identify the combinations of parameters that allow the most accurate pointing for positions on the earth—the goal of the system.

The second analysis phase attempted to address these issues. A Monte Carlo simulation was used to provide more rigorous evaluation of various ROMs. Misalignment angles were randomly selected using values realistic for the class of instruments under consideration, but not representative of any particular instrument. Three-sigma values ranged from 30 to 60 µrad for instrument attitude errors, 70 to 150 µrad for detector location errors, 50 to 240 µrad for mirror shaft angle errors, and 10 to 60 µrad for mirror misalignments. Random detector locations were also simulated, but random measurement noise was not added to simulated scan angles since this is irrelevant for the purpose of analyzing modeling errors. The estimated parameters always included roll, pitch, and yaw rotations, but the selected misalignment parameters were varied to determine the set that produced the smallest pointing residuals for “evaluation landmarks” uniformly distributed on the earth.

The maximum and RMS residual statistics for various combinations of estimated attitude parameters were tabulated. The results were

1. A five-parameter misalignment model (roll, pitch, yaw, md2, me2) can predict positions on earth with worst-case errors of 3.4 µrad EW and 2.1 µrad NS over the full range of detector locations.

2. Another five-parameter misalignment model (roll, pitch, yaw, mn3, me2) can predict positions on the earth with worst-case errors of 3.6 µrad EW and 1.6 µrad NS over the full range of detector locations.

3. A seven-parameter misalignment model (roll, pitch, yaw, mn2, mn3, me1, me2) can predict positions on the earth with worst-case errors of 0.1 µrad in both EW and NS over the full range of detector locations.

4. Compared with the five-parameter model, a six-parameter model reduces either the maximum EW or NS residuals, but not both.

Based on these results, it is concluded that a five-parameter misalignment model can meet pointing requirements if other error sources, such as random noise, are of nominal magnitudes. Another approach for selecting states of a ROM—stepwise regression—is demonstrated in a later example.

Prediction Residuals

Since prediction residuals are more sensitive to errors in model structure than fit residuals, the weighted variance of prediction residuals (or more generally measurement residuals not included in the fit) can be a better test of model order than the least-squares cost function. Of course this can only be done when the available data span is much larger than the span required for individual fits. Assuming that time is the independent variable of the system, the measurement data are broken into a number of time spans (possibly overlapping) and a least-squares fit is computed for each span. Each fit model is used to predict measurements for the next time span, and the sample variance of the prediction residuals is tabulated versus prediction interval and model order. The optimal order is the one yielding the smallest prediction variances. This technique is of most use either when the purpose of the modeling is to compute linear combinations of the states under conditions that were not directly measured, or when the state estimates themselves are the direct goal of the estimation.
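The fit-then-predict procedure can be sketched in a few lines. The cubic truth model, spans, and noise level below are illustrative assumptions: each candidate order is fit on one span and used to predict the next span, and the prediction residual variance is tabulated versus order.

```python
import numpy as np

# Fit span [0, 1], prediction span [1, 1.5]; true model is a cubic (4 terms).
# All values here are assumptions for the sketch, not from the book.
rng = np.random.default_rng(3)
t_fit = np.linspace(0.0, 1.0, 60)
t_pred = np.linspace(1.0, 1.5, 30)
coef = np.array([1.0, -2.0, 3.0, 4.0])               # 1 - 2t + 3t^2 + 4t^3
y_fit = np.polyval(coef[::-1], t_fit) + rng.normal(0.0, 1.0, 60)
y_pred = np.polyval(coef[::-1], t_pred) + rng.normal(0.0, 1.0, 30)

def pred_var(order):
    """Sample variance of prediction residuals for a given model order."""
    H = np.vander(t_fit, order, increasing=True)
    xhat = np.linalg.lstsq(H, y_fit, rcond=None)[0]
    e = y_pred - np.vander(t_pred, order, increasing=True) @ xhat
    return float(e @ e) / len(e)

for order in (2, 4, 8):
    print(order, round(pred_var(order), 2))   # smallest near the true order
```

Under-fitting leaves large structured prediction errors, while over-fitting amplifies noise under extrapolation, so the tabulated variance tends to bottom out near the true order.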

One practical issue associated with model order selection involves grouping of states to be adjusted in the estimator. Often several states must be grouped together for a meaningful model. For example, error sources often appear in three axes and there is no reason to believe that one or two axes dominate. In these cases the states should be tested as a group. Another issue is ordering of states to be adjusted if testing is performed sequentially. Different results can be obtained when sequential testing is performed in different orders. This topic will be discussed again in Section 6.3.

We now show how methods described in previous sections can be used to analyze errors and optimize selection of adjusted states.

6.3 REGRESSION ANALYSIS FOR WEIGHTED LEAST SQUARES

The consider parameter and model order selection analysis of the previous section was based on Bayesian estimation and random parameters. Standard regression analysis usually assumes that the solved-for states are nonrandom parameters and measurement noise variances are unknown. Hence solution methods are either unweighted least squares or weighted least squares where the assumed measurement noise variance is treated as a guess that can be scaled up or down. In the latter case, the scale factor is computed to make the measurement residual SOS equal to the χ² distribution mean for the given number of degrees-of-freedom. We now discuss two aspects of regression analysis that provide insight on the modeling problem and allow optimal selection of model order.

6.3.1 Analysis of Variance

Recall that the cost function to be minimized in weighted least squares is c06ue182 where c06ue183 is the vector of a posteriori measurement residuals and W is a weighting matrix that is usually set equal to the inverse of the measurement noise covariance matrix R; that is, c06ue184. In unweighted least squares W = I. For the moment we ignore the weighting because it is not critical to the analysis. Using results from previous sections, the a posteriori unweighted measurement residual SOS can be written as

(6.3-1) c06e003001

using the normal equation solution c06ue185. In other words, the residual SOS is equal to the total SOS of measurements (yᵀy) minus a term that represents the ability of the estimated state c06ue186 to model the measurements: yᵀH(HᵀH)⁻¹Hᵀy. yᵀy is called the total measurement SOS. Since it is the sum of m squared random variables, it has m degrees-of-freedom. yᵀH(HᵀH)⁻¹Hᵀy is called the SOS due to the regression and can be expressed (using the SVD) as the sum of n (the number of states in x) squared random variables. Therefore it has n degrees-of-freedom. Finally c06ue187 is called the SOS about the regression and has m − n degrees-of-freedom. These sums, divided by the corresponding degrees-of-freedom, give mean-squared errors. For example, c06ue188 is the mean-square error about the regression. Table 6.3 summarizes these concepts. Further information on regression variance analysis may be found in Draper and Smith (1998).

TABLE 6.3: Analysis of Variance Summary

c06t2382rqa
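The decomposition summarized above is easy to verify numerically. The data below are illustrative, not from the text; the check confirms that the total SOS splits exactly into the SOS due to the regression and the SOS about the regression, with m, n, and m − n degrees-of-freedom respectively.

```python
import numpy as np

# Illustrative analysis-of-variance check (assumed data, unweighted LS).
rng = np.random.default_rng(4)
m, n = 50, 3
H = rng.normal(size=(m, n))
y = H @ np.array([1.0, -2.0, 0.5]) + rng.normal(0.0, 1.0, m)

xhat = np.linalg.solve(H.T @ H, H.T @ y)             # normal-equation solution
sos_total = float(y @ y)                             # m degrees-of-freedom
sos_regression = float(y @ H @ xhat)                 # = y'H(H'H)^-1 H'y, n DOF
sos_about = float((y - H @ xhat) @ (y - H @ xhat))   # m - n degrees-of-freedom
mse_about = sos_about / (m - n)                      # mean-square about the regression
print(round(sos_total - (sos_regression + sos_about), 9))   # ~0: split is exact
```

With unit measurement noise, the mean-square about the regression is also an unbiased estimate of the noise variance, which is the basis of the scale-factor correction discussed below.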

The above analysis used whole values of measurements without removing means. However, analysis of variance is most frequently applied to variance about the mean. While a scalar sample mean (e.g., c06ue189) is often used in simple cases when one of the states is a measurement bias, it is not generally possible to find a state vector that exactly maps to a bias in the measurements; that is, it is not usually possible to find xb such that c06ue190 is exactly satisfied. The measurement mean of most interest is the mapping of the true state x into measurement space; that is, c06ue191 where, as before, y = Hx + r and r is zero-mean measurement noise. Using this definition for c06ue192, the residual vector can be expressed as

c06ue193

where c06ue194. Hence

(6.3-2) c06e003002

rᵀr is the measurement SOS about the mean and rᵀH(HᵀH)⁻¹Hᵀr is the SOS about the mean due to the fit. Notice that (HᵀH)⁻¹ times the measurement noise variance is the error covariance of the state estimate c06ue195, so rᵀH(HᵀH)⁻¹Hᵀr is the squared mapping of errors in c06ue196 to the measurement space. It is the part of rᵀr that can be removed by erroneous adjustments in c06ue197.

For zero-mean Gaussian r, c06ue198 should be χ²-distributed with m − n degrees-of-freedom, and c06ue199 is an estimate of the variance about the regression, which should equal c06ue200 when using unweighted least squares. When weighting of c06ue201 is used, c06ue202 is an estimate of the scale factor required to correct c06ue203 to match the true measurement noise variance.

6.3.2 Stepwise Regression

We now show how previously discussed concepts can be used to determine appropriate parameters to be included in the regression model. While many approaches are available, stepwise regression is one of the most commonly used techniques.

Stepwise regression is an automatic procedure used to determine a subset of model variables providing a good fit to observed data. It is primarily used when there are a large number of potential explanatory variables (e.g., patient age, sex, environment, and occupation for cancer epidemiology) and there is no underlying theory on which to base the model selection. Although based on statistical tests, the implementation is somewhat ad hoc and is generally used in cases when a physically based model is not available; it is used more often to model data correlations than to determine physical model parameters. Stepwise regression is not guaranteed to find the unique set of adjusted parameters minimizing the least-squares cost function. In fact, when given a large set of explanatory variables, it is possible to find multiple “optimal” sets with nearly the same cost function. The variables selected for inclusion in the model depend on the order in which the variables are selected because the sequential F-tests (on which selections are based) are assumed to be independent, which they are not. Neglecting “statistically small” parameters that are actually significant can bias the estimates. However, these problems tend to be less important for physically based models, when engineering judgment is applied to limit the number of variables and eliminate unrealistic solutions.

Alternative regression techniques attempt to search many combinations of parameters to find the set with the minimum norm (see Furnival and Wilson 1971; Hocking 1983; Draper and Smith 1998). When the number of possibilities is large, the computational burden of “search all subsets” techniques can be very high. Also these alternatives are subject to many of the same problems as stepwise regression. Draper and Smith still state a preference for stepwise regression over many alternate techniques.

Stepwise regression tests the statistical significance of adding or removing individual states from the adjusted set. Hence prior to discussing the algorithm, we first compute the change in the least-squares cost function when a state is added. The analysis is similar to that used for the Bayesian least-squares case, but is simpler without the prior estimate. We again partition the state vector, where x1 represents states that were previously included in the regression and x2 represents a scalar state that is conditionally added to the adjusted set. The weighted least-squares normal equations for this system are

(6.3-3) c06e003003

or more compactly

(6.3-4) c06e003004

where the substitutions are obvious. Solving the lower equation and substituting it in the upper yields:

c06ue204

Using these revised definitions of A and b, the solutions are the same as equations (6.2-42) and (6.2-43):

(6.3-5) c06e003005

and

(6.3-6) c06e003006

where

(6.3-7) c06e003007

is the estimate of c06ue206 before state c06ue207 is added to the adjusted states. Then the weighted residual SOS is

(6.3-8) c06e003008

where c06ue208. Hence the change in the cost function is c06ue209 when state x2 is adjusted. When R = E[rrᵀ] and the true x2 = 0, c06ue210 is a χ²-distributed variable with one degree-of-freedom. Thus the χ² distribution can be used to test the statistical significance of including c06ue211 as an adjusted state. More often stepwise regression assumes that c06ue212 where c06ue213 (the constant measurement noise variance) is unknown. Hence the solution uses unweighted least squares where c06ue214 is computed from the mean-squared error of the fit. In this case c06ue215 has an F-distribution, as explained below after first describing the stepwise regression algorithm.
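The cost-change test is easy to demonstrate numerically. The sketch below uses illustrative data with R = I and a candidate state whose true value is zero; the drop in residual SOS when that state is added is then a χ² variable with one degree-of-freedom, so its typical size is near the χ²(1) mean of 1.

```python
import numpy as np

# Illustrative cost-reduction test for adding a spurious scalar state x2.
# Dimensions and data are assumptions for the sketch.
rng = np.random.default_rng(5)
m = 80
H1 = rng.normal(size=(m, 4))                     # columns of states already adjusted
h2 = rng.normal(size=m)                          # candidate state's column
y = H1 @ rng.normal(size=4) + rng.normal(0.0, 1.0, m)   # data with true x2 = 0

def sos(H):
    """Residual SOS of the unweighted LS fit."""
    e = y - H @ np.linalg.lstsq(H, y, rcond=None)[0]
    return float(e @ e)

dJ = sos(H1) - sos(np.column_stack([H1, h2]))    # cost reduction from adjusting x2
print(round(dJ, 3))   # small for a spurious state: chi-square(1) has mean 1
```

A large dJ relative to the χ²(1) (or F) threshold would instead indicate that x2 is statistically significant and should be adjusted.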

Most implementations of stepwise regression are based on an algorithm by Efroymson (1960), which is a variation on forward selection. After a new variable is added to the regression, additional tests attempt to determine whether variables already in the regression can be removed without significantly increasing the residual SOS. The basic steps are

1. The sample information matrix and information vector are computed from the measurement data and measurement sensitivity matrix; that is, given the measurement equation y = Hx + r, where c06ue216, compute the unweighted least-squares information arrays A = HᵀH and b = Hᵀy. Stepwise regression attempts to estimate the subset of x, c06ue217, of dimension p that minimizes the SOS of the residual c06ue218.

2. The A and b arrays are partitioned into the independent variables currently in the regression and the variables not in the regression. Let Ax and bx represent the partitions of A and b corresponding to variables in the regression. Also, the residual degrees-of-freedom, SOS, and mean square are computed as

c06ue219

(6.3-9) c06e003009

c06ue220

3. Estimates of states in the regression are computed as c06ue221, and the standard error for variable k, σk, is computed as the square root of the k-th diagonal of c06ue222. Then, the F-value for each variable k is computed as if it were the last variable added to the regression:

(6.3-10) c06e003010

If one or more independent variables in the regression have F-values less than the “F-to-remove” threshold, the one with the smallest F value is removed from the regression and control passes to step 2.

4. For each independent variable not in the regression, the tolerance value and F value are computed as if each independent variable is the only one added to the regression. This computation is implemented by means of a partitioned matrix inverse of A for each independent variable (see Dixon 1975). The tolerance value for a given variable is defined as the variance computed as if it is the only variable in the regression, divided by the variable variance computed as part of the regression with other variables. This is inversely related to the correlation between the variable and variables already in the regression. For variables not in the regression with tolerance greater than the tolerance threshold and with the F value greater than the “F-to-add” threshold, the variable with the highest F value is added and control passes to step 2.

5. The stepwise procedure terminates when no variables are added or removed in the preceding steps.
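The five steps above can be sketched in code. The following is a minimal, unweighted illustration (the function name, default thresholds, and structure are illustrative choices, not from the text; a production implementation would add the fine-tuning logic found in statistical packages):

```python
import numpy as np

def stepwise_regression(H, y, f_add=4.0, f_remove=2.0, tol=0.01):
    """Sketch of the basic stepwise procedure (unweighted, non-Bayesian).

    H is the m x n measurement sensitivity matrix and y the m-vector of
    measurements.  Returns the indices of the columns retained in the
    regression and their estimates.
    """
    m, n = H.shape
    A = H.T @ H                     # information matrix (step 1)
    b = H.T @ y                     # information vector (step 1)
    in_reg = []                     # variables currently in the regression
    while True:
        # Step 3: F-to-remove test for variables in the regression
        if in_reg:
            Ax_inv = np.linalg.inv(A[np.ix_(in_reg, in_reg)])
            xhat = Ax_inv @ b[in_reg]
            df = m - len(in_reg)
            ss = y @ y - b[in_reg] @ xhat            # residual SOS
            s2 = ss / df                             # residual mean square
            f_vals = xhat**2 / (s2 * np.diag(Ax_inv))
            k = int(np.argmin(f_vals))
            if f_vals[k] < f_remove:
                in_reg.pop(k)
                continue                             # back to step 2
        # Step 4: tolerance and F-to-add tests for variables not in
        best_f, best_j = f_add, None
        for j in range(n):
            if j in in_reg:
                continue
            cols = in_reg + [j]
            Ax_inv = np.linalg.inv(A[np.ix_(cols, cols)])
            # tolerance: variance if alone / variance within regression
            if 1.0 / (A[j, j] * Ax_inv[-1, -1]) < tol:
                continue
            xhat = Ax_inv @ b[cols]
            df = m - len(cols)
            ss = y @ y - b[cols] @ xhat
            if df <= 0 or ss <= 0.0:
                continue
            f_j = xhat[-1]**2 / ((ss / df) * Ax_inv[-1, -1])
            if f_j > best_f:
                best_f, best_j = f_j, j
        if best_j is None:
            break                   # step 5: nothing added or removed
        in_reg.append(best_j)
    if not in_reg:
        return [], np.array([])
    xhat = np.linalg.solve(A[np.ix_(in_reg, in_reg)], b[in_reg])
    return in_reg, xhat
```

With measurements generated from a known subset of columns, the sketch recovers that subset when the signal-to-noise ratio is high; like any stepwise procedure, it is not guaranteed to find the globally best subset.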

The above procedure is a somewhat simplified description of the basic stepwise regression algorithms included in statistical packages. Other computations and logic may be added to fine-tune the process or to use alternate techniques. Draper and Smith (1998) suggest that the “F-to-add” and “F-to-remove” thresholds be based on the same significance level α for the current degrees-of-freedom. They use α = 0.05 in their example, but note that some workers use a fixed number such as 4.0 for both “F-to-add” and “F-to-remove”.

It is instructive to examine the F-statistic F_k = x̂_k²/σ_k² of equation (6.3-10), which is algebraically equivalent to the ratio of two random variables:

(6.3-11)   F_k = { [S_s(x̂_{j−1}) − S_s(x̂_j)] / σ_r² } / { S_s(x̂_{j−1}) / (df σ_r²) }

where σ_r² is the measurement noise variance, and S_s(x̂_j) denotes the residual SOS computed using the estimated parameters x̂_j obtained at regression iteration j. At step 4 of iteration j, equation (6.3-10) is evaluated for all parameters not currently in the regression, and the reduction in S_s is computed for each parameter as if it were the only one added to the regression. Thus, for the null hypothesis of no additional jumps, the numerator of equation (6.3-11) is χ²-distributed with one degree-of-freedom. The denominator will also be χ²-distributed, but with the df degrees-of-freedom of iteration j − 1. A ratio of two χ² variables with 1 and df degrees-of-freedom, respectively, is modeled by the F(1,df) distribution. Thus the probability of false alarm for a single parameter (assuming the null hypothesis) can be calculated from tables of F-distributions. Unfortunately, this statistic will not be exactly F-distributed when multiple hypotheses are considered (see Draper and Smith 1998, section 15.2, or Dixon 1979). This same problem is also present when using the log likelihood ratio statistic of Generalized Likelihood Ratio (GLR) methods (see Kerr 1983 or Basseville and Benveniste 1986).

A hypothesis test based on equation (6.3-10) is often used in stepwise regression because the true measurement noise variance, σ_r², is unknown. Since σ_r² appears in both the numerator and denominator of equation (6.3-11), it cancels, and the test can be based simply on the ratio of residual SOS. In many applications the noise variance is well known, and thus a χ² test on the numerator of equation (6.3-11) could be used. Under certain conditions, it can be shown that this is exactly equivalent to the log likelihood ratio statistic used in GLR (Gibbs 1992).
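The null-hypothesis behavior of the F-statistic can be checked numerically: under the null hypothesis it is the ratio of a χ² variable with one degree-of-freedom to an independent χ² variable scaled by its df degrees-of-freedom. A minimal Monte Carlo sketch (the df value and threshold below are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
df, trials = 30, 200_000

# Under the null hypothesis, the F-statistic of equation (6.3-11) is
# the ratio of a chi-squared(1) variable to an independent
# chi-squared(df) variable divided by df, i.e., F(1, df)-distributed.
f_stat = rng.chisquare(1, trials) / (rng.chisquare(df, trials) / df)

# Empirical false-alarm probability for the fixed threshold of 4.0
# mentioned above (close to the 5% point of F(1, 30)).
pfa = float(np.mean(f_stat > 4.0))
```

For moderate df, pfa comes out near 0.05, which is consistent with basing a fixed “F-to-add” threshold of about 4.0 on a significance level of α = 0.05.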

Stepwise regression is used as the basis of a Kalman filter jump detection algorithm presented in Chapter 11.

Example 6.7: Stepwise Regression for Optical Imaging Instrument

Here we use stepwise regression to continue the analysis of Example 6.6, which involved selection of a reduced set of misalignments for an optical imaging instrument. The basic stepwise regression algorithm is applied in each of 500 Monte Carlo trials using the 17 noiseless star observation pairs (north and east angles) to determine the best subset of states. The F-to-add threshold is set to 4, F-to-remove is set to 2, and tolerance is set to 0.001. This low tolerance threshold is necessary because the measurements are noiseless.

Table 6.4 shows the number of times each state was selected for two different conditions: with no states initially included in the regression, and with roll, pitch, and yaw initially included. Notice that only three parameters were chosen in more than 40% of samples when started with zero states in the regression. The result is very different when roll, pitch, and yaw are initially included: the regression retains roll, pitch, and yaw in 95% of the samples, and typically adds only one or two other parameters. In both cases mem1, me2, and mnm3 are frequently selected, but mn3 is frequently selected only when roll, pitch, and yaw are not initially included. Tests were also conducted with an F-to-remove of 0.5, and with measurement noise added: the results were not significantly different in character.

TABLE 6.4: Number of Times Parameters Were Included in Stepwise Regression

Parameter | No. Times Selected, Initialized with Zero States | No. Times Selected, Initialized with Roll, Pitch, Yaw
Roll | 101 | 493
Pitch | 121 | 479
Yaw | 241 | 473
md2 | 128 | 1
md3 | 56 | 0
mem1 | 200 | 137
me1 | 178 | 2
me2 | 182 | 134
me3 | 125 | 2
mnm3 | 181 | 136
mn1 | 89 | 0
mn2 | 156 | 93
mn3 | 230 | 22

When the regression was initialized with zero states, a total of 44 different state sets were selected in the regression. Table 6.5 shows states, number of times the set was selected, and the measurement residual RMS for the seven most commonly selected sets. When the regression was initialized with roll, pitch, and yaw, a total of only 28 sets were used: Table 6.6 shows the results. Again notice the differences in selected states. The fit to the noiseless data, as measured by the residual RMS, is slightly better in Table 6.5, but the difference is small.

TABLE 6.5: Most Commonly Selected State Sets When Initialized with Zero States


TABLE 6.6: Most Commonly Selected State Sets When Initialized with Three States


This difference in behavior under different initialization demonstrates the non-uniqueness of stepwise regression for a case in which SVD analysis showed that many parameters are highly correlated. However, the problem of this example is not typical: the similarity of parameter signatures makes it unusually difficult. While the example demonstrates many reduced-order modeling issues, it may give a pessimistic impression of the approach. In many problems the dominant parameters can be determined more easily than demonstrated here. Even with the difficulties of this case, however, it was possible to find low-order models that met earth-pointing requirements, as shown in Example 6.6.

6.3.3 Prediction and Optimal Data Span

When the ultimate goal of least-squares modeling is to predict system output for a period in the future, it can be important to determine the fit data span that produces the best predictions. One might suspect that more data are always better, and adding measurements does usually improve prediction accuracy when the estimator model accurately matches truth. However, when the estimator model does not exactly match truth (either because some states have been deliberately left unadjusted or because the true model is not known), fit data spans much longer than the prediction span can sometimes degrade predictions, because a batch estimator tends to weight long-term behavior more heavily than short-term behavior. On the other hand, fit data spans much shorter than the prediction span can also cause poor predictions because the estimator may miss important long-term behavior.

Again it is difficult to define general rules for specifying the optimal fit data span when model errors are present, but a good starting point is to make the fit span equal to the prediction span (if that is an option). Other factors that may be important are the ratio between the longest time constants (or basis function periods) and the fit and prediction spans, the measurement accuracy, and the stability of the system. (Is it really deterministic or is there a small stochastic component?) If the fit model is suboptimal because some parameters were deliberately unadjusted, then the prediction residual equation (6.2-33), covariance equation (6.2-34), or Monte Carlo simulation can be used to analyze the prediction accuracy as a function of the fit span. Otherwise evaluation on different segments of real data (if available) should be used.
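The fit-span trade-off is easy to explore by simulation when a truth model is available. The sketch below uses a hypothetical truth model (a slow sinusoid) and a deliberately mismatched quadratic fit model; all functions, spans, and noise levels are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical truth model: a slow sinusoid.  The fit model is a
# quadratic polynomial, so there is a deliberate model-structure error.
def truth(t):
    return np.sin(0.2 * t)

def fit_and_predict(fit_span, pred_span=10.0, noise=0.01):
    """Fit a quadratic over [-fit_span, 0] and predict over [0, pred_span].
    Returns (fit RMS residual, prediction RMS residual)."""
    t_fit = np.linspace(-fit_span, 0.0, 200)
    y = truth(t_fit) + noise * rng.standard_normal(t_fit.size)
    coef = np.polyfit(t_fit, y, 2)               # mismatched fit model
    rms_fit = np.sqrt(np.mean((np.polyval(coef, t_fit) - y) ** 2))
    t_pred = np.linspace(0.0, pred_span, 100)
    rms_pred = np.sqrt(np.mean((np.polyval(coef, t_pred) - truth(t_pred)) ** 2))
    return rms_fit, rms_pred

# Evaluate fit and prediction accuracy as a function of the fit data span
results = {span: fit_and_predict(span) for span in (5.0, 10.0, 20.0, 40.0)}
```

In this setup the fit residuals grow with the fit span (the quadratic cannot track a full period of the sinusoid), and the prediction residuals are much larger than the fit residuals, illustrating why prediction accuracy, not fit accuracy, should drive the choice of span.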

6.4 SUMMARY

This chapter discussed practical usage of least-squares techniques. Specific topics included solution validation, model error analysis, confidence bound calculations, and optimal selection of states to be estimated.

There are many reasons why a least-squares solution may not be valid. Some of the more common problems include incorrect model structure, incorrect model assumptions, incorrect selection of model states or order, poor state observability, and anomalous measurements. Hence it is very important that solutions be validated. The methods used to validate the solutions include:

1. Check the least-squares cost function for weighted least squares, MMSE/minimum variance, ML, and MAP estimates. For weighted least squares, the weighted residual sum J = r̂ᵀR⁻¹r̂ (where r̂ = y − Hx̂ and R is the measurement noise covariance) should be χ²-distributed with a mean value of m − n and variance equal to 2(m − n). For Bayesian estimates the sum should include the prior residuals, and the degrees-of-freedom should be modified accordingly. When J deviates significantly from m − n, model errors may be present or the assumed measurement noise variance may be wrong.

2. Plot the residuals to verify that systematic patterns are not significant. The residuals may be slightly correlated when the number of measurements is small, but systematic patterns should not be large.

3. Compute residual means and variances separately for each subset (sensor, measurement type, etc.) to verify that no one subset is anomalous.

4. When possible, compute separate solutions using different subsets of measurements and check the prediction accuracy for the other measurement subsets.

5. Use the model to predict measurements either past the time period used for the model fit or to untested conditions (using alternate test data), and check the magnitude and character of prediction residuals. Prediction residuals are generally more sensitive to model problems than fit residuals.

6. Compare the behavior of the estimator using real measurements with behavior obtained using simulated measurements. Is the behavior expected?

7. Check the state estimates for reasonableness. For Bayesian estimates, compare the change in a priori and a posteriori state values with the a priori standard deviation.
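Validation check 1 above is straightforward to implement. The sketch below builds a hypothetical weighted least-squares problem with R = σ²I (all dimensions and values are illustrative assumptions) and tests whether the weighted residual sum falls within a few standard deviations of its expected χ² mean:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical linear problem: y = H x + r, with r ~ N(0, sigma^2 I)
m, n = 200, 4
H = rng.standard_normal((m, n))
x_true = np.array([1.0, -2.0, 0.5, 3.0])
sigma = 0.2
y = H @ x_true + sigma * rng.standard_normal(m)

# Weighted least-squares estimate (weighting by 1/sigma gives the
# same solution as unweighted LS when R = sigma^2 I)
xhat, *_ = np.linalg.lstsq(H / sigma, y / sigma, rcond=None)
resid = y - H @ xhat
j = float(resid @ resid) / sigma**2     # weighted residual sum

# Check: j should be chi-squared with m - n degrees-of-freedom,
# i.e., mean m - n and variance 2(m - n).
dof = m - n
bound = 4.0 * np.sqrt(2.0 * dof)        # roughly a 4-sigma acceptance band
ok = abs(j - dof) < bound
```

If `ok` is false, either model errors are present or the assumed noise variance σ² is wrong, exactly the diagnosis described in check 1.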

Two examples were given to demonstrate the concepts. The first was a fourth-order polynomial model. It was shown that the fit residuals are slightly correlated near the ends of the data span, but essentially uncorrelated near the middle. However, when measurements generated using an eighth-order polynomial are fit using a fourth-order model, systematic patterns in the fit residuals are readily evident. Also the prediction residuals diverge quickly when the model order is incorrect.

The second example was a case study of GOES-13 performance that demonstrated many of the analysis steps described above. It was noticed that 1-day predictions of range and landmark (but not star) measurements deviated significantly from actual measurements when the spacecraft was operated in the “inverted” yaw orientation. Analysis to determine the source of the problem included examination of fit and prediction residuals, comparisons of estimation results for different time spans and under different operating conditions, selection of different adjusted states, and testing using simulated data. The problem was found to be due to incorrect compensation for spacecraft transponder delay and an EW bias in landmark or star measurements.

Section 6.2 of this chapter first showed how the a posteriori or formal state estimate error covariance—computed as the inverse of the Fisher information matrix—can be used to compute confidence limits on errors in the state estimate. Examples demonstrate plotting of error ellipses and contours for a given probability limit. The formal error covariance is a lower bound on the true estimate error covariance because it only includes the effects of measurement noise: nonlinearities and model errors are not included. Effects of nonlinearities are best analyzed via Monte Carlo simulation. The effects of incorrect model structure can be analyzed using either covariance or Monte Carlo analysis, but some of the required statistical inputs are likely to be poorly known. Error equations show that model structure errors are likely to have a larger effect on prediction residuals than fit residuals.

Covariance error analysis is better defined when a subset of the total states is adjusted in the estimator. The effects of the unadjusted analyze (consider) states on estimated states were computed in Section 6.2.2.3, and the total estimate error covariance was derived for both normal equation and QR solution methods. The sensitivity of fit and prediction residuals with respect to unadjusted states was also derived and used to compute the residual covariance. These equations are used to compute the expected total error in estimated states, and are also used when attempting to determine the optimal choice of states for an estimator model. Monte Carlo analysis can also be used for the same purpose.

It is generally undesirable to adjust states that have negligible effect on the measurements or have the same measurement signature as other linear combinations of adjusted states. Estimation of unnecessary states can degrade the accuracy of estimates and predictions, and increase computations. The weighted least-squares cost function is an important metric used when determining the optimal model order. Section 6.2.2.4 derived the expected change in the sum of weighted residuals when an unadjusted state is changed to adjusted in a Bayesian solution. The difference between the actual change and the expected change for nominal a priori variances can be used to determine the statistical significance of the state. A polynomial example demonstrated how fit and prediction residual statistics change with model order, and showed that the minimum of a modified log likelihood function occurs at the true model order used to generate the simulated measurements. It was also shown that the prediction residuals are more sensitive to incorrect order selection than fit residuals. Somewhat similar behavior was also observed when using Chebyshev polynomials or Fourier series models. Another example analyzed the effect of misalignments on pointing error of an optical imaging instrument, and demonstrated how the optimal selection of adjusted states is determined.

Unlike the Bayesian approach described above, standard regression analysis usually assumes that the solved-for (adjusted) states are nonrandom parameters and that the measurement noise variance is unknown. The analysis-of-variance method partitions the residual-about-the-mean SOS into the SOS due to the regression and the SOS about the regression, and compares those sums with the degrees-of-freedom for each. An estimate of the measurement noise variance can be computed from the variance about the regression.
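The analysis-of-variance partition can be demonstrated with a few lines of code. The sketch below fits a straight line to hypothetical data (the model, noise level, and sizes are illustrative assumptions) and verifies that the SOS about the mean splits exactly into the SOS due to the regression and the SOS about the regression:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: straight-line truth plus measurement noise
m = 50
t = np.linspace(0.0, 1.0, m)
y = 3.0 + 2.0 * t + 0.1 * rng.standard_normal(m)

ybar = y.mean()
coef = np.polyfit(t, y, 1)          # line fit (2 adjusted parameters)
yfit = np.polyval(coef, t)

ss_total = np.sum((y - ybar) ** 2)  # residual-about-the-mean SOS
ss_reg = np.sum((yfit - ybar) ** 2) # SOS due to the regression
ss_about = np.sum((y - yfit) ** 2)  # SOS about the regression

# Measurement noise variance estimated from the variance about the
# regression, with m - 2 residual degrees-of-freedom
s2 = ss_about / (m - 2)
```

For a least-squares fit that includes a constant term, ss_total = ss_reg + ss_about holds exactly, and s² is an unbiased estimate of the measurement noise variance (0.01 in this hypothetical setup).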

The concepts of the previous sections are used as the basis of stepwise regression methods for determining the optimal selection of adjusted states in a non-Bayesian solution. Sequential F-tests and correlation tests are used to determine which unadjusted states are “statistically significant” and should be adjusted. Also at each step, currently adjusted states are tested for significance to determine whether they should be removed from the regression. Unfortunately, stepwise regression is not guaranteed to find a unique set of adjusted parameters minimizing the least-squares cost function, and different results can be obtained when the order in which variables are tested is changed. Alternate “search all subsets” methods are subject to some of the same problems as stepwise regression. This uniqueness problem is somewhat less of an issue with physical systems because engineering judgment can be used to limit the number of possible solutions. In practice stepwise regression often works well. However, an example based on the imaging instrument misalignment model showed the difficulty of using stepwise regression for a weakly observable problem.

Selection of the optimal fit data span to minimize prediction errors was the final topic. Use of fit data spans equal to the desired prediction data span is often a good starting point, but the optimal span in any particular problem depends on the dynamic model time constants, model errors, measurement accuracy, and inherent randomness (if any) of the system. Simulation should be used to investigate the trade-offs.
