Detecting correlations among data

Abstract

This chapter, Detecting Correlations among Data, develops techniques for quantifying correlations within data sets, and especially within and among time series. Several different manifestations of correlation are explored and linked together: covariance, from probability theory; cross-correlation, from time series analysis; and coherence, from spectral analysis. The effect of smoothing and band-pass filtering on the statistical properties of the data and their spectra is also discussed.

Keywords

covariance; correlation coefficient; autocorrelation; cross-correlation; coherence; delay; lag; ozone; taper; sidelobe

9.1 Correlation is covariance

When we create a scatter plot of observations, we are treating the data as random variables. The underlying idea is that two data types (or elements), say d_i and d_j, scatter about their typical values. Sometimes the scatter is due to measurement noise. Sometimes it is due to an unmodeled natural process that we can only treat probabilistically. But in either case, we are viewing the cloud of data points as being drawn from a joint probability density function, p(d_i, d_j). The data are correlated if the covariance of this function is nonzero. Thus, the covariance matrix, C, is extremely useful in quantifying the degree to which different elements correlate. Recall that the covariance matrix associated with p(d_i, d_j) is defined as:

$C_{ij} = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (d_i - \bar{d}_i)(d_j - \bar{d}_j)\, p(d_i, d_j)\, \mathrm{d}d_i\, \mathrm{d}d_j$

(9.1)

Here, $\bar{d}_i$ and $\bar{d}_j$ are the means of d_i and d_j, respectively. We can estimate C_ij from a data set by approximating the probability density function with a histogram constructed from the observed data. We first divide the (d_i, d_j) plane into many small bins, numbered by the index s. Each bin has area Δd_iΔd_j and is centered at (d_i^(s), d_j^(s)) (Figure 9.1). We denote the number of data pairs in bin s by N_s. The probability that a data pair falls in bin s is then p(d_i, d_j) Δd_iΔd_j ≈ N_s/N, where N is the total number of data pairs, so

$C_{ij} \approx \frac{1}{N} \sum_{s} \left[d_i^{(s)} - \bar{d}_i\right] \left[d_j^{(s)} - \bar{d}_j\right] N_s$

(9.2)

Figure 9.1 Pairs of data (circles) in a scatter plot are converted into an estimate of the covariance by binning the data in small patches of the (d_i, d_j) plane and counting the number of points in each bin. The bins are numbered with an index, s.

We now shrink the size of the patches so that at most one data pair is in each bin. Then, N_s equals either zero or unity. Summation over the patches is equal to summation over the (d_i, d_j) pairs themselves:

$C_{ij} \approx \frac{1}{N} \sum_{k=1}^{N} \left[d_i^{(k)} - \bar{d}_i\right] \left[d_j^{(k)} - \bar{d}_j\right]$

(9.3)
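To see the passage from the binned estimate (9.2) to the direct sum (9.3) numerically, here is a minimal NumPy sketch. The function names `cov_direct` and `cov_binned`, the bin counts, and the synthetic data are illustrative assumptions, not part of the book's scripts. Each pair is replaced by the center of its histogram bin, so the binned value should approach the direct one as the bins shrink.

```python
import numpy as np

def cov_direct(di, dj):
    """Eq. (9.3): mean of the products of deviations from the means (divides by N)."""
    di = np.asarray(di, dtype=float)
    dj = np.asarray(dj, dtype=float)
    return np.mean((di - di.mean()) * (dj - dj.mean()))

def cov_binned(di, dj, nbins=200):
    """Eq. (9.2): each pair is represented by the center of its 2-D histogram bin."""
    di = np.asarray(di, dtype=float)
    dj = np.asarray(dj, dtype=float)
    counts, edges_i, edges_j = np.histogram2d(di, dj, bins=nbins)  # counts plays the role of N_s
    centers_i = 0.5 * (edges_i[:-1] + edges_i[1:])                 # d_i^(s)
    centers_j = 0.5 * (edges_j[:-1] + edges_j[1:])                 # d_j^(s)
    products = np.outer(centers_i - di.mean(), centers_j - dj.mean())
    return np.sum(products * counts) / di.size

# Hypothetical correlated pairs; their true covariance is 0.8.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = 0.8 * x + 0.6 * rng.normal(size=5000)
for nbins in (10, 50, 200):
    print(nbins, cov_binned(x, y, nbins), cov_direct(x, y))
```

Coarse bins displace each point by up to half a bin width, so the 10-bin estimate is expected to differ most from the direct sum.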

The covariance is nonzero when the data exhibit some degree of correlation, but its numerical value also depends on the overall range (spread) of the data. Dividing by the square root of the product of the variances removes this dependence and yields a quantity that always lies between −1 and +1:

$R_{ij} = \frac{C_{ij}}{\sqrt{C_{ii}\, C_{jj}}}$

(9.4)

The quantity R is called the matrix of correlation coefficients, and its elements are called correlation coefficients and are denoted by the lower-case letter, r. When, as above, they are estimated from the data (as contrasted to being computed from the probability density function), they are referred to as sample correlation coefficients. See Table 9.1 for a list of important quantities, such as R, that are introduced in this chapter. The covariance, C, and correlation coefficient matrix, R, can be estimated from a set of data, D, as follows:

Table 9.1 Important Quantities Used in Chapter 9.

Symbol	Name	Created from	Significance
C_d	Covariance matrix of the data, d	Probability density function of the data, p(d)	Diagonal elements, [C_d]_ij with i = j: variance of the datum d_i, that is, the squared width of the univariate probability density function, p(d_i). Off-diagonal elements, [C_d]_ij with i ≠ j: degree of correlation between the pair of observations, d_i and d_j
R	Matrix of correlation coefficients	Probability density function of the data, p(d)	Normalized version of C_d, with elements that vary between −1 and +1; elements of R are given the symbol, r
$a = d \star d$	Autocorrelation function	Time series, d	Element a_k: degree of correlation between two elements of d separated by a time lag, τ = (k − 1)Δt
$c = d^{(1)} \star d^{(2)}$	Cross-correlation function	Two time series, d⁽¹⁾ and d⁽²⁾	Element c_k: degree of correlation between an element of d⁽¹⁾ and an element of d⁽²⁾ separated by a time lag, τ = (k − 1)Δt
$f * d$	Convolution	Filter, f, and time series, d	Filters the time series, d, with the filter, f
$\tilde{d}(\omega)$	Fourier transform	Time series, d(t)	Amplitude of the sines and cosines of frequency, ω, in the time series
C²(ω₀, Δω)	Coherence	Two time series, d⁽¹⁾ and d⁽²⁾	Similarity between d⁽¹⁾ and d⁽²⁾ at frequencies in the range ω₀ ± Δω; varies between 0 and 1

C = cov(D); % covariance
R = corrcoef(D); % correlation coefficient

(MatLab eda09_01)

Here, D is an N × M matrix organized so that D_ij is the amount of element j in sample i (the same arrangement as in Equation 8.1). The matrix R is M × M, so that R_ij expresses the degree of correlation of elements i and j. Figure 9.2A depicts the matrix of correlation coefficients for the Atlantic rock dataset, in which the elements are literally chemical elements. The diagonal elements are all unity, as a data type correlates perfectly with itself. Some pairs of chemical components, such as TiO₂ and Na₂O, strongly correlate with each other (Figure 9.2B). Other pairs, such as TiO₂ and Al₂O₃, are nearly uncorrelated.
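For readers working outside MatLab, the two lines above can be mirrored in NumPy. The sketch below, which assumes the same N × M layout of D, writes out Equations (9.3) and (9.4) explicitly and checks them against NumPy's built-ins. Note that `np.cov` divides by N − 1 by default (as does MatLab's `cov`), whereas Equation (9.3) divides by N; `bias=True` selects the 1/N form. The normalization cancels in R. The function names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def covariance_matrix(D):
    """Eq. (9.3) for every pair of columns of an N x M data matrix D (divides by N)."""
    D = np.asarray(D, dtype=float)
    dev = D - D.mean(axis=0)          # remove each column's (element's) mean
    return dev.T @ dev / D.shape[0]   # M x M; entry [i, j] pairs elements i and j

def correlation_matrix(D):
    """Eq. (9.4): R_ij = C_ij / sqrt(C_ii * C_jj)."""
    C = covariance_matrix(D)
    sd = np.sqrt(np.diag(C))
    return C / np.outer(sd, sd)

# Hypothetical data: N = 500 samples of M = 3 elements; the second tracks the first.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
D = np.column_stack((x, 0.7 * x + 0.3 * rng.normal(size=500), rng.normal(size=500)))

C = covariance_matrix(D)
R = correlation_matrix(D)
print(np.round(R, 2))
```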

Figure 9.2 (A) Matrix of absolute values of correlation coefficients of chemical elements in the Atlantic rock dataset. (B) Scatter plot of TiO₂ and Na₂O, the most highly correlated elements (r = 0.73). MatLab script eda09_01.

The idea of correlation can also be applied to the elements of a time series. Neighboring samples in a time series are often highly correlated (and hence predictable), even though the time series as a whole may be random. Consider, for example, the stream flow of the Neuse River. On the one hand, a hydrologist, working a year ago, would not have been able to predict whether today’s discharge is unusually high or low. It is just not possible to predict individual storms—the source of the river’s water—a year in advance; they are best considered random phenomena. On the other hand, if today’s discharge is high, the chances are excellent that tomorrow’s discharge will be high as well. Stream flow persists for a few days, because the rain water takes time to drain away.

The notion of short term correlation within the stream flow time series can also be described by a joint probability density function. If we denote the river’s discharge at time t_i as d_i, and discharge at time t_j as d_j, then we can speak of the joint probability density function p(d_i, d_j). In the case of stream flow, we expect that d_i and d_j will have a strong positive correlation when the time difference or lag, τ = t_i − t_j, is small (Figure 9.3A). When the measurements are more widely separated in time, then we expect the correlation to be weaker (Figure 9.3B). We expect discharge to be uncorrelated at separations of, say, a month or so (Figure 9.3C). On the other hand, discharge will again be positively correlated, although maybe only weakly so, at separations of about a year, because patterns of stream flow have an annual cycle. Note that we must assume that the time series is stationary, meaning that its statistical properties do not change with time, or else the degree of correlation would depend on the measurement times, as well as the time difference between them.

Figure 9.3 Scatter plots of the lagged Neuse River discharge. (A) Lag = 1 day, (B) 3 days, (C) 30 days. Note that the strength of the correlation decreases as lag is increased. MatLab script eda09_02.
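The decay shown in Figure 9.3 can be reproduced with a synthetic persistent series. The sketch below assumes a first-order autoregressive process, d[t] = φ d[t−1] + noise, as a hypothetical stand-in for a hydrograph (it is not the Neuse River data). For this process the theoretical lag-k correlation is φ^k, so with φ = 0.9 the correlation should fall from about 0.9 at a lag of 1 sample to near zero at 30 samples.

```python
import numpy as np

def lagged_correlation(d, lag):
    """Sample correlation coefficient between d[t] and d[t + lag], for lag >= 1 samples."""
    d = np.asarray(d, dtype=float)
    x, y = d[:-lag], d[lag:]
    return np.corrcoef(x, y)[0, 1]

# Hypothetical persistent series: AR(1) process d[t] = phi * d[t-1] + noise[t].
rng = np.random.default_rng(1)
phi, n = 0.9, 20000
noise = rng.normal(size=n)
d = np.zeros(n)
for t in range(1, n):
    d[t] = phi * d[t - 1] + noise[t]

for lag in (1, 3, 30):
    print(lag, round(lagged_correlation(d, lag), 2))
```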

We already have the methodology to quantify the degree of correlation of a joint probability density function: its covariance matrix, C_ij. In this case, we manipulate the formula to bring out the means, because in many cases we will be dealing with time series that fluctuate around zero:

$\begin{aligned} C_{ij} &= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (d_i - \bar{d})(d_j - \bar{d})\, p(d_i, d_j)\, \mathrm{d}d_i\, \mathrm{d}d_j \\ &= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} d_i\, d_j\, p(d_i, d_j)\, \mathrm{d}d_i\, \mathrm{d}d_j - 2\bar{d}^2 + \bar{d}^2 = A_{ij} - \bar{d}^2 \\ \text{with} \quad A_{ij} &= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} d_i\, d_j\, p(d_i, d_j)\, \mathrm{d}d_i\, \mathrm{d}d_j \end{aligned}$

(9.5)

Here, the mean, $\bar{d}$, of the time series is assumed to be independent of time (so it has no index). The matrix A is called the autocorrelation matrix of the time series. It is equal to the covariance matrix when the mean of the time series is zero.

Just as in the case of the covariance, the autocorrelation can be estimated from observations. The data are pairs of samples drawn from the time series, where one member of the pair is lagged by a fixed time interval, τ = (k − 1)Δt, with respect to the other (with k an integer; note that k = 1 corresponds to τ = 0). A time series of length N has N − |k − 1| such pairs. We then form a histogram of the pairs, as we did in the case of covariance, so that the integral in Equation (9.5) can be approximated by a summation:

$A_{i,j} = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} d_i\, d_j\, p(d_i, d_j)\, \mathrm{d}d_i\, \mathrm{d}d_j \approx \frac{1}{N - |k-1|} \sum_{s} d_i^{(s)}\, d_j^{(s)}\, N_s$

(9.6)

Once again, we shrink the size of the bins so that at most one pair is in each bin and N_s equals either zero or unity, so that summation over the bins is equal to summation over the data pairs themselves. For the k > 0 case (that is, for non-negative lags), we have

$\begin{aligned} A_{i,k+i-1} &\approx \frac{1}{N - |k-1|} \sum_{s} d_i^{(s)}\, d_{k+i-1}^{(s)}\, N_s = \frac{1}{N - |k-1|} \sum_{i=1}^{N-k+1} d_i\, d_{k+i-1} = \frac{a_k}{N - |k-1|} \\ \text{with} \quad a_k &= \sum_{i=1}^{N-k+1} d_i\, d_{k+i-1} \quad \text{and} \quad k > 0 \end{aligned}$

(9.7)

The column vector a is called the autocorrelation of the time series. An element, a_k, is called the autocorrelation at time lag τ = (k − 1)Δt. The autocorrelation at negative lags equals the autocorrelation at positive lags, as A is a symmetric matrix whose elements depend only on the lag; that is, A_ij corresponds to a_k, with k = |i − j| + 1. As we have defined it above, a_k is unnormalized, in the sense that it omits the factor of 1/(N − |k − 1|).

In MatLab, the autocorrelation is calculated as follows:

a = xcorr(d);

(MatLab Script eda09_03)

Here, d is a time series of length N. The xcorr() function returns a vector of length 2N − 1 that includes both negative and positive lags, with the zero-lag element at a(N).
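A NumPy counterpart, offered as a sketch rather than a drop-in replacement for xcorr(), is `np.correlate(d, d, mode="full")`, which also returns the 2N − 1 lags from −(N − 1) to N − 1. Because Python indexes from zero, the zero-lag element sits at index N − 1, the analog of a(N). The second (hypothetical) helper divides by the number of overlapping pairs, N − |k − 1|, which is the kind of overlap adjustment mentioned in the caption of Figure 9.4.

```python
import numpy as np

def autocorrelation(d):
    """Unnormalized autocorrelation a_k of Eq. (9.7) for k = 1..N (lags 0..N-1)."""
    d = np.asarray(d, dtype=float)
    n = d.size
    full = np.correlate(d, d, mode="full")  # length 2N-1, lags -(N-1)..(N-1)
    return full[n - 1:]                     # zero lag is at index N-1 (MatLab's a(N))

def autocorrelation_overlap_adjusted(d):
    """a_k / (N - |k-1|): divide each lag by its number of overlapping pairs."""
    a = autocorrelation(d)
    return a / (a.size - np.arange(a.size))

print(autocorrelation([1.0, 2.0, 3.0]))
print(autocorrelation_overlap_adjusted([1.0, 2.0, 3.0]))
```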

The autocorrelation of the Neuse River hydrograph is shown in Figure 9.4. For small lags, say of less than a month, the autocorrelation falls off rapidly with lag, with a time scale that reflects the time that rain water needs to drain away after a storm. For larger lags, say of a few years, the autocorrelation oscillates around zero with a period of one year. This behavior reflects the seasonal cycle. Summer and winter discharges are negatively correlated, as one tends to be high when the other is low.

Figure 9.4 Autocorrelation function of the Neuse River hydrograph. (A) Lags up to 1 month. Note that the autocorrelation decreases with lag. (B) Lags up to 10 years. Note that the autocorrelation oscillates with a period of 1 year, reflecting the seasonal cycle. The autocorrelation function has been adjusted for the decrease in overlap at the larger lags. MatLab script eda09_03.

9.2 Computing autocorrelation by hand

The autocorrelation at zero lag (k = 1) can be calculated by hand by writing down two copies of the time series, one above the other, multiplying vertically aligned terms, and adding the products:

$\begin{array}{cccccc} & d_1 & d_2 & d_3 & \dots & d_N \\ \times & d_1 & d_2 & d_3 & \dots & d_N \\ \hline & d_1^2 & d_2^2 & d_3^2 & \dots & d_N^2 \end{array} \quad \overset{\text{yields}}{\to} \quad a_1 = d_1^2 + d_2^2 + d_3^2 + \dots + d_N^2$

(9.8)

Note that a₁ is proportional to the power in the time series. Subsequent elements, a_k, are calculated by progressively offsetting one copy of the time series with respect to the other prior to multiplying and adding (and ignoring the elements with no overlap). The lag Δt (k = 2) element is as follows:

$\begin{array}{ccccccc} & d_1 & d_2 & d_3 & \dots & d_N & \\ \times & & d_1 & d_2 & \dots & d_{N-1} & d_N \\ \hline & & d_2 d_1 & d_3 d_2 & \dots & d_N d_{N-1} & \end{array} \quad \overset{\text{yields}}{\to} \quad a_2 = d_2 d_1 + d_3 d_2 + d_4 d_3 + \dots + d_N d_{N-1}$

(9.9)

and the lag 2Δt (k = 3) element is as follows:

$\begin{array}{ccccccccc} & d_1 & d_2 & d_3 & d_4 & \dots & d_N & & \\ \times & & & d_1 & d_2 & \dots & d_{N-2} & d_{N-1} & d_N \\ \hline & & & d_1 d_3 & d_2 d_4 & \dots & d_{N-2} d_N & & \end{array} \quad \overset{\text{yields}}{\to} \quad a_3 = d_1 d_3 + d_2 d_4 + \dots + d_{N-2} d_N$

(9.10)
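As a small worked example of Equations (9.8) to (9.10), with numbers chosen purely for illustration, take the three-sample time series d = [1, 2, 3]^T. Offsetting, multiplying, and adding gives

$a_1 = 1 \cdot 1 + 2 \cdot 2 + 3 \cdot 3 = 14, \qquad a_2 = 2 \cdot 1 + 3 \cdot 2 = 8, \qquad a_3 = 1 \cdot 3 = 3$

Because the series has only N = 3 samples, a larger offset leaves no overlapping terms, so the autocorrelation at lags of 3Δt and beyond is zero.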

9.3 Relationship to convolution and power spectral density

The formula for the autocorrelation is very similar to the formula for the convolution (Equation 7.1):

$\begin{array}{cc} \text{autocorrelation} & \text{convolution} \\ a_k = \sum_i d_i\, d_{k+i-1} & \theta_k = \sum_i g_i\, h_{k-i+1} \\ a(t) = \int_{-\infty}^{+\infty} d(\tau)\, d(t+\tau)\, \mathrm{d}\tau & \theta(t) = \int_{-\infty}^{+\infty} g(\tau)\, h(t-\tau)\, \mathrm{d}\tau \\ a = d \star d & \theta = g * h \end{array}$

(9.11)

Note that a five-pointed star, ⋆, is used to indicate autocorrelation, in the same sense that an asterisk, *, is used to indicate convolution. The two formulas are very similar, except that in the case of the convolution, one of the two time series is backward in time, in contrast to the autocorrelation, where both are forward in time. The relationship between the two can be found by transforming the autocorrelation integral to a new variable, τ′ = −τ:

$a(t) = d(t) \star d(t) = \int_{-\infty}^{+\infty} d(\tau)\, d(t+\tau)\, \mathrm{d}\tau = \int_{-\infty}^{+\infty} d(-\tau')\, d(t-\tau')\, \mathrm{d}\tau' = d(-t) * d(t)$

(9.12)

Thus, the autocorrelation is the convolution of a time-reversed time series with the original time series.
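Equation (9.12) is easy to check numerically. In this sketch (random data, purely illustrative), NumPy's `np.correlate` computes the autocorrelation directly, while `np.convolve` is applied to the time-reversed series and the original; the two results should agree lag by lag.

```python
import numpy as np

# Hypothetical random series, used only to compare the two computations.
rng = np.random.default_rng(2)
d = rng.normal(size=50)

via_correlation = np.correlate(d, d, mode="full")       # d star d, lags -(N-1)..(N-1)
via_convolution = np.convolve(d[::-1], d, mode="full")  # d(-t) * d(t)
print(np.allclose(via_correlation, via_convolution))
```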

Two neighboring points on a time series will correlate strongly with each other if the time series varies slowly between them. A time series with an autocorrelation that declines slowly with lag is necessarily richer in low frequency energy than one that declines quickly with lag. This relationship can be explored by computing the Fourier transform of the autocorrelation. The calculation is simplified by recalling that the Fourier transform of a convolution is the product of the transforms. Thus,

$\tilde{a}(\omega) = \mathcal{F}\{d(-t)\}\, \tilde{d}(\omega)$

(9.13)

where $\mathcal{F}\{d(-t)\}$ stands for the Fourier transform of d(−t). We compute it as follows:

$\mathcal{F}\{d(-t)\} = \int_{-\infty}^{+\infty} d(-t) \exp(i\omega t)\, \mathrm{d}t = \int_{-\infty}^{+\infty} d(t') \exp(i(-\omega)t')\, \mathrm{d}t' = \tilde{d}(-\omega) = \tilde{d}^{*}(\omega)$

(9.14)

Here, we have used the transformation of variables, t′ = −t, together with the fact that, for real time series, $\tilde{d}(\omega)$ and $\tilde{d}(-\omega)$ are complex conjugates of each other. Thus,

$\tilde{a}(\omega) = \tilde{d}^{*}(\omega)\, \tilde{d}(\omega) = |\tilde{d}(\omega)|^2$

(9.15)

The Fourier transform of the autocorrelation is proportional to the power spectral density of the time series. As we have seen in Section 6.5, functions that are broad in time have Fourier transforms that are narrow in frequency. Hence, a time series with a broad autocorrelation function has most of its power at low frequencies.
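A discrete check of Equation (9.15) is sketched below, under the assumption that the series is zero-padded to length 2N − 1 before the FFT. The discrete Fourier transform implies periodicity, so without padding the inverse transform of |d̃|² would give a circular autocorrelation in which large lags wrap around. Since |d̃(ω)|² is unchanged by the sign convention of the exponent, NumPy's convention does not matter here. The function names are illustrative.

```python
import numpy as np

def autocorr_direct(d):
    """a_k = sum_i d_i d_{k+i-1} for lags 0..N-1 (Eq. 9.7, unnormalized)."""
    d = np.asarray(d, dtype=float)
    n = d.size
    return np.array([np.dot(d[: n - k], d[k:]) for k in range(n)])

def autocorr_from_power_spectrum(d):
    """Inverse FFT of |d~(omega)|^2, a discrete analog of Eq. (9.15)."""
    d = np.asarray(d, dtype=float)
    n = d.size
    nfft = 2 * n - 1                          # zero-pad so circular wrap-around cannot alias lags
    power = np.abs(np.fft.fft(d, nfft)) ** 2  # |d~(omega)|^2
    return np.fft.ifft(power).real[:n]        # keep the non-negative lags

rng = np.random.default_rng(5)
d = rng.normal(size=64)
print(np.allclose(autocorr_direct(d), autocorr_from_power_spectrum(d)))
```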

9.4 Cross-correlation

The underlying idea behind the autocorrelation is that pairs of samples drawn from the same time series, and separated by a fixed time lag, τ, are correlated. This idea can be generalized to pairs of samples drawn from two different time series. As an example, consider time series of precipitation, u, and stream flow, v. At times when precipitation is high, we expect stream flow to be high, too. However, the time of peak stream flow will be delayed with respect to the time of maximum precipitation, as water takes time to drain from the land. Thus, the precipitation and stream flow time series will be most correlated when the former is lagged by a specific amount of time with respect to the latter.

We quantify this idea by defining the probability density function, p(u_i, v_j), the joint probability for the i-th sample of time series u and the j-th sample of time series v. The autocorrelation then generalizes to the cross-correlation, c_k (written side by side with the convolution, for comparison):

$\begin{array}{cc} \text{cross-correlation} & \text{convolution} \\ c_k = \sum_i u_i\, v_{k+i-1} & \theta_k = \sum_i g_i\, h_{k-i+1} \\ c(t) = \int_{-\infty}^{+\infty} u(\tau)\, v(t+\tau)\, \mathrm{d}\tau & \theta(t) = \int_{-\infty}^{+\infty} g(\tau)\, h(t-\tau)\, \mathrm{d}\tau \\ c = u \star v & \theta = g * h \end{array}$

(9.16)

Note that the five-pointed star is used to indicate cross-correlation as well as autocorrelation, as the autocorrelation of a time series is its cross-correlation with itself. Here, u(t_i) and v(t_i) are two time series, each of length N. The cross-correlation is related to the convolution by

$c(t) = u(t) \star v(t) = u(-t) * v(t)$

(9.17)

In MatLab, the cross-correlation is calculated with the function

c = xcorr(u,v);

(MatLab Script eda09_04)

Here, u and v are time series of length N. The xcorr() function returns both negative and positive lags and is of length 2N − 1. The zero-lag element is c(N). Unlike the autocorrelation, the cross-correlation is not symmetric in lag. Instead, the cross-correlation of v and u is the time-reversed version of the cross-correlation of u and v. Mistakes in ordering the arguments of the xcorr() function will lead to a result that is backwards in time; that is, if u(t) ⋆ v(t) = c(t), then v(t) ⋆ u(t) = c(−t).
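The reversal property can be checked with NumPy, but the argument order needs care. For real inputs of equal length, `np.correlate(a, b, mode="full")` returns Σ_n a[n+k] b[n] for k from −(N − 1) to N − 1, so the text's definition, c_k = Σ_i u_i v_{k+i−1}, is obtained by passing v first. The wrapper name `cross_correlation` and the short test series are hypothetical.

```python
import numpy as np

def cross_correlation(u, v):
    """Text's c = u star v (Eq. 9.16), for lags -(N-1)..(N-1).

    np.correlate(a, b, "full") gives sum_n a[n+k] b[n], so v is passed first
    to obtain sum_n u[n] v[n+k].
    """
    return np.correlate(np.asarray(v, dtype=float), np.asarray(u, dtype=float), mode="full")

u = np.array([0.0, 1.0, 2.0, 0.0, 0.0])
v = np.array([0.0, 0.0, 0.0, 1.0, 2.0])   # u delayed by two samples
c_uv = cross_correlation(u, v)
c_vu = cross_correlation(v, u)
print(np.allclose(c_vu, c_uv[::-1]))     # swapping the arguments reverses the lag axis
```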

We note here that the Fourier transform of the cross-correlation is called the cross-spectral density:

$\tilde{c}(\omega) = \tilde{u}^{*}(\omega)\, \tilde{v}(\omega)$

(9.18)

However, we will put off discussion of its uses until Section 9.9.

9.5 Using the cross-correlation to align time series

The cross-correlation is useful in aligning two time series, one of which is delayed with respect to the other, as its peak occurs at the lag at which the two time series are best correlated, that is, the lag at which they best line up. In MatLab,

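c = xcorr(u,v);            % cross-correlation, lags -(N-1)..(N-1)
[cmax, icmax] = max(c);    % peak value and its index
tlag = -Dt * (icmax-N);    % time lag; c(N) is the zero-lag element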

(MatLab eda09_04)

Here, Dt is the sampling interval of the time series and tlag is the time lag between the two time series. The lag is positive when features in v occur at later times than corresponding features in u. This technique is illustrated in Figure 9.5.
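The same peak-picking idea is sketched below in NumPy with a synthetic Gaussian pulse and a copy delayed by 7 samples (both hypothetical). Because `np.correlate` is called with v first, the index of the peak, minus N − 1, is directly the lag in samples, positive when features in v occur later than in u. The helper name `estimate_lag` is an assumption for illustration, and the final alignment uses `np.roll`, which wraps samples around the ends; that is harmless here only because the pulse is far from the edges.

```python
import numpy as np

def estimate_lag(u, v, dt):
    """Time lag of v relative to u from the peak of the cross-correlation.

    Positive when features in v occur later than the corresponding features in u.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    n = u.size
    c = np.correlate(v, u, mode="full")  # index m corresponds to lag k = m - (n - 1)
    k = np.argmax(c) - (n - 1)
    return k * dt

# Hypothetical example: a Gaussian pulse and a copy delayed by 7 samples.
dt = 0.5
t = np.arange(200) * dt
u = np.exp(-((t - 30.0) / 3.0) ** 2)
v = np.exp(-((t - 30.0 - 7 * dt) / 3.0) ** 2)
tlag = estimate_lag(u, v, dt)
print(tlag)

# Align v with u by advancing it tlag/dt samples (np.roll wraps around the ends).
v_aligned = np.roll(v, -int(round(tlag / dt)))
```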

Figure 9.5 (A) Two time series, u(t) and v(t), with similar shapes but one shifted in time with respect to the other. (B) Time series aligned by lag determined through cross-correlation function. (C) Cross-correlation function. MatLab script eda09_04.

We apply this technique to an air quality dataset, in which the objective is to understand the diurnal fluctuations of ozone (O₃). Ozone is a highly reactive gas that occurs in small (parts per billion) concentrations in the earth’s atmosphere. Ozone in the stratosphere plays an important role in shielding the earth’s surface from ultraviolet (UV) light from the sun, for it is a strong UV absorber. But its presence in the troposphere at ground level is problematic. It is a major ingredient in smog and a health risk, increasing susceptibility to respiratory diseases. Tropospheric ozone has several sources, including chemical reactions between oxides of nitrogen and volatile organic compounds in the presence of sunlight and high temperatures. We thus focus on the relationship between ozone concentration and the intensity of sunlight (that is, of solar radiation). Bill Menke provides the following information about the dataset:

A colleague gave me a text file of ozone data from the Weather Center at the United States Military Academy at West Point, NY. It contains tropospheric (ground level) ozone data for 15 days starting on August 1, 1993. Also included in the file are solar radiation, air temperature and several other environmental parameters. The original file is named ozone_orig.txt and has about a dozen columns of data. I used it to create a file ozone_nohead.txt that contains just 4 columns of data, time in days after 00:00 08/01/1993, ozone in parts per billion, solar radiation in W/m², and air temperature in °C.

The solar radiation and ozone concentration data are shown in Figure 9.6. Both show a pronounced diurnal periodicity, but the peaks in ozone are delayed several hours behind the peaks in sunlight. The lag, determined by cross-correlating the two time series, is 3 h (Figure 9.7). Notice that excellent results are achieved, even though the two datasets do not exactly match.

Figure 9.6 (A) Hourly solar radiation data, in W/m², from West Point, NY, for 15 days starting August 1, 1993. (B) Hourly tropospheric ozone data, in parts per billion, from the same location and time period. Note the strong diurnal periodicity in both time series. Peaks in the ozone lag peaks in solar radiation (see vertical line). MatLab script eda09_05.

Figure 9.7 (A) Hourly solar radiation data, in W/m², from West Point, NY, for 5 days starting August 1, 1993. (B) Hourly tropospheric ozone data, in parts per billion, from the same location and time period. The solid curve is the original data. Note that it lags solar radiation. The dotted curve is ozone advanced by 3 h, an amount determined by cross-correlation. Note that only 5 of the 15 days of data are shown. (C) Cross-correlation function. MatLab script eda09_05.

9.6 Least squares estimation of filters

In Section 7.1, we showed that the convolution equation, g(t)*m(t) = d(t), can be written as a matrix equation of the form, Gm = d, where m and d are the time series versions of m(t) and d(t), respectively, and G is the matrix:

$G = \begin{bmatrix} g_1 & 0 & 0 & \cdots & 0 \\ g_2 & g_1 & 0 & \cdots & 0 \\ g_3 & g_2 & g_1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ g_N & g_{N-1} & g_{N-2} & \cdots & g_1 \end{bmatrix}$

(9.19)

The least squares solution involves the matrix products, G^TG and G^Td:

$\begin{aligned} G^T G &= \begin{bmatrix} g_1 & g_2 & g_3 & \cdots & g_N \\ 0 & g_1 & g_2 & \cdots & g_{N-1} \\ 0 & 0 & g_1 & \cdots & g_{N-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & g_1 \end{bmatrix} \begin{bmatrix} g_1 & 0 & 0 & \cdots & 0 \\ g_2 & g_1 & 0 & \cdots & 0 \\ g_3 & g_2 & g_1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ g_N & g_{N-1} & g_{N-2} & \cdots & g_1 \end{bmatrix} \approx \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_N \\ a_2 & a_1 & a_2 & \cdots & a_{N-1} \\ a_3 & a_2 & a_1 & \cdots & a_{N-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_N & a_{N-1} & a_{N-2} & \cdots & a_1 \end{bmatrix} \propto A \\ G^T d &= \begin{bmatrix} g_1 & g_2 & g_3 & \cdots & g_N \\ 0 & g_1 & g_2 & \cdots & g_{N-1} \\ 0 & 0 & g_1 & \cdots & g_{N-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & g_1 \end{bmatrix} \begin{bmatrix} d_1 \\ d_2 \\ d_3 \\ \vdots \\ d_N \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ \vdots \\ c_N \end{bmatrix} = c \end{aligned}$

(9.20)

Thus, the elements of G^Td are the cross-correlation, c = g ⋆ d, of the time series g and d, and the elements of G^TG are approximately the autocorrelation matrix, A, of the time series g. The matrix G^TG is approximately Toeplitz, with elements [G^TG]_ij = a_k, where k = |i − j| + 1. This result is only approximate, because on close examination, elements that appear to refer to the same autocorrelation are actually different from one another. For example, [G^TG]₁₁ is exactly a₁, but [G^TG]₂₂ is not, as it is the autocorrelation of the first N − 1 elements of g, not of all of g. The difference grows towards the bottom-right of the matrix.
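Both statements can be verified with a small random example. The sketch below builds the lower-triangular Toeplitz matrix of Equation (9.19) (the helper `convolution_matrix` is a hypothetical name) and confirms that G^T d equals the cross-correlation exactly, that the first row of G^T G is exactly a, and that the diagonal elements shrink towards the bottom-right, as described above.

```python
import numpy as np

def convolution_matrix(g):
    """N x N lower-triangular Toeplitz matrix of Eq. (9.19): G[i, j] = g[i - j] for i >= j."""
    g = np.asarray(g, dtype=float)
    n = g.size
    G = np.zeros((n, n))
    for j in range(n):
        G[j:, j] = g[: n - j]
    return G

rng = np.random.default_rng(3)
n = 8
g = rng.normal(size=n)
d = rng.normal(size=n)
G = convolution_matrix(g)
GTG = G.T @ G
GTd = G.T @ d

# Non-negative lags of the autocorrelation of g and of the cross-correlation g star d.
a = np.correlate(g, g, mode="full")[n - 1:]
c = np.correlate(d, g, mode="full")[n - 1:]   # sum_n g[n] d[n + k]

print(np.allclose(GTd, c))            # G^T d is exactly the cross-correlation
print(np.allclose(GTG[0], a))         # first row is exactly a_1, ..., a_N
print(GTG[n - 1, n - 1], a[0])        # bottom-right element is only g_1 squared
```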

This technique is sometimes used to solve the filter estimation problem, that is, solve θ = g * h for an estimate of h. We examined this problem previously in Section 7.3, using MatLab script eda07_03. We provide here an alternate version of this script. A major modification is made to the function called by the biconjugate gradient solver, bicg(). It now uses the autocorrelation to perform the multiplication F^TFv. The function was previously called filterfun() but is renamed here to autofun():

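function y = autofun(v,transp_flag)
% Computes y = F'F v = G'G v + H'H v, approximating G'G by the Toeplitz
% autocorrelation matrix with elements a(|i-j|+1). transp_flag is required
% by bicg() but unused, because F'F is symmetric.
global a H;
N = length(v);
GTGv = zeros(N,1);
for i = [1:N]
    % row i of the Toeplitz matrix: [a(i), ..., a(2), a(1), a(2), ..., a(N-i+1)]
    GTGv(i) = [fliplr(a(1:i)'), a(2:N-i+1)'] * v;
end
Hv = H*v;
HTHv = H'*Hv;
y = GTGv + HTHv;
return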

(MatLab autofun)
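The row construction inside autofun can be transcribed to NumPy as a check on the indexing. This sketch covers only the G^TG v term; the H^TH v smoothness term is left out because H is built elsewhere in the script. The function name `toeplitz_autocorr_matvec` and the random filter are hypothetical. The product is compared against an explicitly formed Toeplitz matrix with elements a_{|i−j|+1}.

```python
import numpy as np

def toeplitz_autocorr_matvec(a, v):
    """Return T v, where T[i, j] = a[|i - j|] (0-based), i.e. a_{|i-j|+1} in the text.

    Row i is [a[i], ..., a[1], a[0], a[1], ..., a[N-1-i]], the same row that
    autofun assembles from fliplr(a(1:i)') and a(2:N-i+1)'.
    """
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    n = v.size
    out = np.empty(n)
    for i in range(n):
        row = np.concatenate((a[i::-1], a[1:n - i]))
        out[i] = row @ v
    return out

# Usage with a hypothetical filter g and its autocorrelation (non-negative lags only).
rng = np.random.default_rng(4)
g = rng.normal(size=6)
a = np.correlate(g, g, mode="full")[g.size - 1:]
v = rng.normal(size=6)
print(toeplitz_autocorr_matvec(a, v))
```

Forming each row on the fly avoids storing an N × N matrix, which is the point of passing a function handle to bicg() instead of a matrix.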

The global variable, a, contains the autocorrelation of g. It is computed only once, in the main script. The main script also performs the cross-correlation prior to the call to bicg():

clear a H;
global a H;
% ...
al = xcorr(g);              % autocorrelation of g, lags -(N-1)..(N-1)
Na = length(al);
a = al((Na+1)/2:Na);        % keep the non-negative lags
% ...
cl = xcorr(qobs2, g);       % cross-correlation of g and qobs2
Nc = length(cl);
c = cl((Nc+1)/2:Nc);        % non-negative lags: c = G'*qobs2
% ...
% set up F'f = G'qobs + H'h
HTh = H'*h;
FTf = c + HTh;
% solve
hest3 = bicg(@autofun, FTf, 1e-10, 3*L);

(MatLab eda09_06)

The results, shown in Figure 9.8, can be compared to those in Figure 7.7. The method does a good job recovering the two peaks in h, but suffers from “edge effects,” that is, putting spurious oscillations at the beginning and end of the time series.

Figure 9.8 (A) Synthetic temperature data, θ^obs(t), constructed from the true temperature plus the same level of random noise as in Figure 7.6. (B) True heat production, h^true(t). (C) Estimated heat production, h^est(t), calculated with generalized least squares using prior information of smoothness. Note edge effects. MatLab script eda09_06.

9.7 The effect of smoothing on time series

As was discussed in Section 4.5, the smoothing of data is a linear process of the form, d^smooth = Gd^obs. Smoothing is also a type of filtering, as can be seen by examining the form of the data kernel, G (Equation 4.16), which is Toeplitz. The columns of G define a smoothing filter, s. Usually, we will want the smoothing to be symmetric, so that the smoothed datum, d_i^smooth, is calculated through a weighted average of the observed data, d_j^obs, on both sides of i (where j > i corresponds to the future and j < i corresponds to the past). The filter, s, is therefore noncausal, with coefficients that are symmetric about the coefficient, s_1, that multiplies the present value. The coefficients need to sum to unity, to preserve the overall amplitude of the data. These traits are exemplified in the three-point smoothing filter (see Equation 4.15):

$s = [s_0, s_1, s_2]^T = [\tfrac{1}{4}, \tfrac{1}{2}, \tfrac{1}{4}]^T$

(9.21)

It uses the present (element i), the past (element i − 1), and the future (element i + 1) of d^obs to calculate d_i^smooth:

$\begin{array}{c} \text{smoothed data} = \text{weighted average of observed data} \\ \text{or} \\ d_i^{\text{smooth}} = \tfrac{1}{4} d_{i-1}^{\text{obs}} + \tfrac{1}{2} d_i^{\text{obs}} + \tfrac{1}{4} d_{i+1}^{\text{obs}} \end{array}$

(9.22)

As long as the filter is of finite length, L, we can view the output as delayed with respect to the input, and the filtering operation itself as causal:

$d_i^{\text{smoothed and delayed}} = \tfrac{1}{4} d_i^{\text{obs}} + \tfrac{1}{2} d_{i-1}^{\text{obs}} + \tfrac{1}{4} d_{i-2}^{\text{obs}}$

(9.23)

In this case, the delay is one sample. In general, for a symmetric filter of odd length L, the delay is (L − 1)/2 samples. The length, L, controls the degree of smoothing, with larger L corresponding to more smoothing (Figure 9.9).
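The relationship between the symmetric filter of Equation (9.22) and its delayed, causal form in Equation (9.23) can be checked numerically. The sketch below is written in Python with NumPy rather than the book's MatLab, and the helper `triangular_filter` is a hypothetical name; it assumes an odd length L, builds a triangular filter whose coefficients sum to unity, and compares the causal output with the centered output.

```python
import numpy as np

def triangular_filter(L):
    """Symmetric triangular smoothing filter of odd length L, normalized to sum to 1."""
    if L % 2 == 0:
        raise ValueError("L must be odd so that the filter has a central sample")
    half = (L + 1) // 2
    ramp = np.arange(1, half + 1, dtype=float)      # 1, 2, ..., half
    s = np.concatenate([ramp, ramp[-2::-1]])        # ramp up, then ramp back down
    return s / s.sum()

s3 = triangular_filter(3)                           # [0.25, 0.5, 0.25], as in Eq. (9.21)

rng = np.random.default_rng(0)
d_obs = rng.standard_normal(200)                    # stand-in for observed data

L = 21
s = triangular_filter(L)
causal = np.convolve(d_obs, s)[: len(d_obs)]        # output at i uses d_obs[i-L+1 .. i] only (Eq. 9.23 form)
centered = np.convolve(d_obs, s, mode="same")       # output at i uses data on both sides of i (Eq. 9.22 form)

delay = (L - 1) // 2
# The causal output is the centered output shifted later by (L - 1)/2 samples.
print("causal == delayed centered:", np.allclose(causal[delay:], centered[: len(d_obs) - delay]))
```

Both outputs are slices of the same full convolution, so the match holds exactly, including at the edges; only the indexing differs.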

Figure 9.9 Smoothing of Neuse River hydrograph. (A) Observed data. (B) Observed data smoothed with symmetric three-point triangular filter. (C) Observed data smoothed with symmetric 21-point triangular filter. For clarity, only the first 500 days are plotted. MatLab script eda09_07.

The above filter is triangular in shape, as it ramps up linearly to its central value and then ramps down linearly. It weights the central datum more than its neighbors. This is in contrast to the uniform filter, which has L constant coefficients, each of amplitude L⁻¹, and weights all L data equally. Many other shapes are possible, too. An important question is which shape of smoothing filter, s, is best.

One way of understanding the choice of the filter is to examine its effect on the autocorrelation function of the smoothed time series. Intuitively, we expect that smoothing broadens the autocorrelation, because it makes the time series vary less between samples. This behavior can be verified by computing the autocorrelation of the smoothed time series:

$\{s(t) * d(t)\} \star \{s(t) * d(t)\} = s(-t) * d(-t) * s(t) * d(t) = \{s(t) \star s(t)\} * \{d(t) \star d(t)\}$

(9.24)

Thus, the autocorrelation of the smoothed time series is the autocorrelation of the original time series convolved with the autocorrelation of the smoothing filter. The autocorrelation function of the smoothing filter is a broad function. When convolved with the autocorrelation function of the data, it smooths and broadens it. Filters of different shapes have autocorrelation functions with different degrees of broadness. Each results in the smoothed data having a somewhat differently shaped autocorrelation function.
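Equation (9.24) can be verified for finite sequences, where it holds exactly as long as the convolutions and correlations are taken over all lags. In this Python/NumPy sketch, white noise stands in for the observed data (an assumption), the three-point filter of Equation (9.21) does the smoothing, and the lag-1 value shows the broadening of the autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(1)
d = rng.standard_normal(100)            # white-noise stand-in for an observed time series
s = np.array([0.25, 0.5, 0.25])         # three-point smoothing filter of Eq. (9.21)

smoothed = np.convolve(s, d)            # s(t) * d(t), full length

def autocorr(x):
    """All-lag autocorrelation x ⋆ x of a finite real sequence; zero lag sits at index len(x) - 1."""
    return np.correlate(x, x, mode="full")

lhs = autocorr(smoothed)                        # {s * d} ⋆ {s * d}
rhs = np.convolve(autocorr(s), autocorr(d))     # {s ⋆ s} * {d ⋆ d}, Eq. (9.24)

def normalized_lag1(x):
    """Autocorrelation at lag 1 divided by its zero-lag value."""
    a = autocorr(x)
    zero = len(x) - 1
    return a[zero + 1] / a[zero]

print("Eq. (9.24) holds:", np.allclose(lhs, rhs))
print(f"lag-1 autocorrelation: raw {normalized_lag1(d):+.2f}, smoothed {normalized_lag1(smoothed):+.2f}")
```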

Another way of understanding the effect of the filter is to examine its effect on the power spectral density of the smoothed time series. The idea behind smoothing is to suppress high-frequency fluctuations in the data while leaving the low frequencies unchanged. One measure of the quality of a filter is the evenness with which the suppression occurs. From this perspective, filters that evenly damp out high frequencies are better than filters that suppress them unevenly.

The behavior of the filter can be understood via the convolution theorem (Section 6.11), which states that the Fourier transform of a convolution is the product of the transforms. Thus, the Fourier transform of the smoothed data is just

$\tilde{d}^{\text{smoothed}}(\omega) = \tilde{s}(\omega)\, \tilde{d}^{\text{obs}}(\omega)$

(9.25)

That is, the transform of the smoothed data is the transform of the observed data multiplied by the transform of the filter. Thus, the effect of the filter can be understood by examining its amplitude spectral density, $|\tilde{s}(\omega)|$.

The uniform, or boxcar, filter with width, T, and amplitude, T⁻¹ is the easiest to analyze:

$\tilde{s}(\omega) = \frac{1}{T}\int_{-T/2}^{T/2} \exp(-i\omega t)\,dt = \frac{2}{T}\int_{0}^{T/2} \cos(\omega t)\,dt = \frac{2}{T}\left.\frac{\sin(\omega t)}{\omega}\right|_{0}^{T/2} = \operatorname{sinc}\left(\frac{\omega T}{2\pi}\right)$

(9.26)

Here, we have used the rule, exp(−iωt) = cos(ωt) − i sin(ωt), and the definition, sinc(x) = sin(πx)/(πx). The cosine function is symmetric about the origin, so its integral on the (−½T, +½T) interval is twice that on the (0, +½T) interval. The sine function is antisymmetric, so its integral on the (−½T, 0) interval cancels its integral on the (0, +½T) interval. While the sinc function (Figure 9.10) declines with frequency, it does so unevenly, with many sidelobes along the frequency axis. It does not smoothly damp out high frequencies and so, from this perspective, is a poor filter.
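A quick numerical check of Equation (9.26): the sketch below (Python/NumPy; the width T = 2 and the frequency range are arbitrary choices) integrates the boxcar's transform with the trapezoidal rule and compares it with the closed-form sinc. NumPy's `np.sinc` uses the same normalized definition, sin(πx)/(πx), as the text.

```python
import numpy as np

T = 2.0                                    # boxcar width (an arbitrary value for this check)
t = np.linspace(-T / 2, T / 2, 4001)       # integration grid spanning the boxcar
h = t[1] - t[0]
omega = np.linspace(0.0, 20.0, 201)        # angular frequencies at which to evaluate s~(omega)

# Trapezoidal-rule version of (1/T) * integral_{-T/2}^{T/2} exp(-i omega t) dt
integrand = np.exp(-1j * np.outer(omega, t))
integral = h * (integrand.sum(axis=1) - 0.5 * (integrand[:, 0] + integrand[:, -1]))
s_numeric = integral / T

# Closed form of Eq. (9.26)
s_closed = np.sinc(omega * T / (2 * np.pi))

amp = np.abs(s_closed)
print("max |real-part error|:", np.max(np.abs(s_numeric.real - s_closed)))
print("amplitude rises again after the first null:", bool(np.any(np.diff(amp) > 0)))
```

The imaginary part of the numerical integral is essentially zero, as the antisymmetry argument predicts, and the rises in the amplitude after each null are the sidelobes.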

Figure 9.10 Amplitude spectral density of uniform smoothing filters (A) Filter of length, L = 3. (B) Filter of length, L = 21. MatLab script eda09_08.

A filter based on a Normal curve will have no sidelobes (Figure 9.11), as the Fourier transform of a Normal curve with variance, σ_t², in time is a Normal curve with variance, σ_ω² = σ_t⁻², in frequency (Equation 6.27). It is a better filter, from the perspective of smoothly and evenly damping high frequencies. However, a Normal filter is infinite in length and must, in practice, be truncated, a process which introduces small sidelobes. Note that the effective width of a filter depends not only on its length, L, but also on its shape. The quantity, 2σ_t, is a good measure of its effective width, where σ_t² is its variance in time. Thus, for example, a Normal filter with σ_t = 6.05 samples has approximately the same effective width as a uniform filter with L = 21, which has a variance of about 6² (compare Figures 9.10 and 9.11).
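The claim that a Normal filter with σ_t ≈ 6.05 samples matches a uniform filter with L = 21 follows from the variance of the discrete uniform filter, (L² − 1)/12. The Python/NumPy sketch below computes that width and compares the largest sidelobe of each filter's amplitude spectral density. The truncation at ±4σ_t, the FFT length, and the rule for locating the end of the main lobe (the first frequency at which the amplitude rises again) are assumptions chosen for illustration.

```python
import numpy as np

def uniform_filter(L):
    return np.full(L, 1.0 / L)

def normal_filter(sigma, n_sigma=4.0):
    """Normal (Gaussian) smoothing filter truncated at +/- n_sigma standard deviations, summing to 1."""
    half = int(np.ceil(n_sigma * sigma))
    k = np.arange(-half, half + 1)
    g = np.exp(-0.5 * (k / sigma) ** 2)
    return g / g.sum()

def filter_std(s):
    """Square root of the filter's variance in time, treating s as weights about its center."""
    k = np.arange(len(s)) - (len(s) - 1) / 2
    return np.sqrt(np.sum(s * k**2) / np.sum(s))

def peak_sidelobe(s, nfft=8192):
    """Largest amplitude beyond the main lobe (from the first point where |s~| rises again)."""
    amp = np.abs(np.fft.rfft(s, nfft))
    rising = np.nonzero(np.diff(amp) > 0)[0]
    return 0.0 if rising.size == 0 else float(amp[rising[0]:].max())

box = uniform_filter(21)
sigma_box = filter_std(box)            # sqrt((L^2 - 1)/12), about 6.06 samples
gauss = normal_filter(sigma_box)       # Normal filter with the same width

print(f"uniform L=21: sigma_t = {sigma_box:.3f}, peak sidelobe = {peak_sidelobe(box):.3f}")
print(f"Normal, same sigma_t: peak sidelobe = {peak_sidelobe(gauss):.2e}")
```

The uniform filter's largest sidelobe is roughly a fifth of its zero-frequency value, while the truncated Normal filter's sidelobes are smaller by orders of magnitude; what remains of them comes from the truncation.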

Figure 9.11 Amplitude spectral density of Normal smoothing filters. (A) Filter with variance equal to that of a uniform filter with length, L = 3. (B) Filter with variance equal to that of a uniform filter with length, L = 21. MatLab script eda09_09.

9.8 Band-pass filters

A smoothing filter passes low frequencies and attenuates high frequencies. A natural extension of this idea is a filter that passes frequencies in a specified range, or pass-band, and that attenuates frequencies outside of this range. A filter that passes low frequencies is called a low-pass filter, high frequencies, a high-pass filter, and an intermediate band, a band-pass filter. A filter that passes all frequencies except a given range is called a notch filter.

In order to design such filters, we need to know how to assess the effect of a given set of filter coefficients on the power spectral density of the filter. We start with the definition of an Infinite Impulse Response (IIR) filter (Equation 7.21), f = v^inv*u, where u and v are short filters of lengths, N_u and N_v, respectively, and v^inv is the inverse filter of v. The z-transform of the filter, f, is

$f = v^{\text{inv}} * u \;\rightarrow\; f(z) = \frac{u(z)}{v(z)} = c\,\frac{\prod_{j=1}^{N_u - 1}(z - z_j^u)}{\prod_{k=1}^{N_v - 1}(z - z_k^v)}$

(9.27)

Here, z_j^u and z_k^v are the roots of u(z) and v(z), respectively, and c is a normalization constant. As our goal involves spectral properties, we need to understand the connection between the z-transform and the Fourier transform. The Discrete Fourier Transform is defined as

$\tilde{f}_k = \sum_{n=1}^{N} f_n \exp(-i\omega_k t_n) = \sum_{n=1}^{N} f_n \exp(-i(k-1)\Delta\omega\,(n-1)\Delta t)$

(9.28)

as ω_k = (k − 1)Δω and t_n = (n − 1)Δt. Note that the factor of (n − 1) within the exponential can be interpreted as raising the exponential to the (n − 1)th power. Thus,

$\tilde{f}_k = \sum_{n=1}^{N} f_n z^{n-1} \quad \text{with} \quad z = \exp(-i(k-1)\Delta\omega\,\Delta t) = \exp\left(-\frac{2\pi i (k-1)}{N}\right)$

(9.29)

Here, we have used the relationship ΔωΔt = 2π/N. Thus, the Fourier transform is just the z-transform evaluated at a specific set of z's. There are N of these z's, and they are equally spaced on the unit circle (that is, the circle |z|² = 1 in the complex z-plane, Figure 9.12). A point on the unit circle can be represented as z = exp(−iθ), where θ is the angle with respect to the real axis. Frequency, ω, is proportional to angle, θ, via θ = ωΔt = (k − 1)ΔωΔt = 2π(k − 1)/N. As the points in a Fourier transform are evenly spaced in frequency, they are evenly spaced in angle around the unit circle. Zero frequency corresponds to θ = 0 and the Nyquist frequency corresponds to θ = π (that is, 180°).
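The equivalence between the DFT and the z-transform evaluated on the unit circle can be checked directly. This Python/NumPy sketch evaluates the polynomial f(z) at the N points z = exp(−2πi(k − 1)/N) and compares the result with `np.fft.fft`, which uses the same sign convention; the random test series is an arbitrary stand-in.

```python
import numpy as np

rng = np.random.default_rng(2)
f = rng.standard_normal(16)                  # arbitrary real time series, N = 16
N = len(f)

k = np.arange(N)                             # plays the role of k - 1 in the text's 1-based indexing
z = np.exp(-2j * np.pi * k / N)              # N points equally spaced around the unit circle, Eq. (9.29)

# z-transform f(z) = f_1 + f_2 z + ... + f_N z^(N-1); np.polyval expects the highest power first
f_of_z = np.polyval(f[::-1], z)

theta = np.angle(np.conj(z))                 # z = exp(-i theta), so theta = 2 pi (k-1)/N, wrapped to (-pi, pi]
print("matches np.fft.fft:", np.allclose(f_of_z, np.fft.fft(f)))
print("theta at the Nyquist index:", theta[N // 2])
```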

Figure 9.12 Complex z-plane, showing the unit circle, |z|² = 1. A point (+ sign) on the unit circle makes an angle, θ, with respect to the positive real axis. It corresponds to a frequency, ω = θ/Δt, in the Fourier transform.

Now we are in a position to analyze the effect of the filters, u and v on the spectrum of the composite filter, f = v^inv*u. The polynomial, u(z), has N_u − 1 roots (or “zeros”), each of which creates a region of low amplitude in a patch of the z-plane near that zero. If the unit circle intersects this patch, then frequencies on that segment of the unit circle are attenuated. Thus, for example, zeros near θ = 0 attenuate low frequencies (Figure 9.13A) and zeros near θ = π (the Nyquist frequency) attenuate high frequencies (Figure 9.13B).
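To see the effect of a single zero, the sketch below (Python/NumPy) evaluates |u(z)| at z = exp(−iθ) from zero frequency (θ = 0) to the Nyquist frequency (θ = π) for the two filters of Figure 9.13. The helper `amplitude_response` is hypothetical; it takes coefficients in increasing powers of z, matching the text's convention, and accepts an optional denominator v.

```python
import numpy as np

def amplitude_response(u, v=(1.0,), n=513):
    """Return theta and |u(z)/v(z)| at z = exp(-i theta), for theta from 0 (zero frequency) to pi (Nyquist).

    u and v hold filter coefficients in increasing powers of z, as in the text.
    """
    theta = np.linspace(0.0, np.pi, n)
    z = np.exp(-1j * theta)
    num = np.polyval(np.asarray(u, dtype=complex)[::-1], z)   # polyval wants the highest power first
    den = np.polyval(np.asarray(v, dtype=complex)[::-1], z)
    return theta, np.abs(num / den)

theta, hp = amplitude_response([1.0, -1.1])   # zero at z = 1/1.1, near theta = 0
_, lp = amplitude_response([1.0, 1.1])        # zero at z = -1/1.1, near theta = pi

print(f"high-pass: |f| = {hp[0]:.2f} at zero frequency, {hp[-1]:.2f} at Nyquist")
print(f"low-pass:  |f| = {lp[0]:.2f} at zero frequency, {lp[-1]:.2f} at Nyquist")
```

For u = [1, −1.1]^T, |u(z)| is 0.1 at zero frequency and 2.1 at the Nyquist frequency, the high-pass behavior described above; reversing the sign of the second coefficient reverses the picture.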

Figure 9.13 (A) Complex z-plane representation of the high-pass filter, u = [1, −1.1]^T, along with the power spectral density of the filter. (B) Corresponding plots for the low-pass filter, u = [1, 1.1]^T. Origin (circle), Fourier transform points on the unit circle (black +), and zero (white *) are shown. MatLab scripts eda09_10 and eda09_11.

The polynomial, v(z), has N_v − 1 roots, so its reciprocal, 1/v(z), has N_v − 1 singularities (or poles), each of which creates a region of high amplitude in a patch of the z-plane near that pole. If the unit circle intersects this patch, then frequencies on that segment of the unit circle are amplified (Figure 9.14A). Thus, for example, poles near θ = 0 amplify low frequencies and poles near θ = π (the Nyquist frequency) amplify high frequencies. As was discussed in Section 7.6, the poles must lie outside the unit circle for the inverse filter, v^inv, to exist. In order for the filter to be real, the poles and zeros must either lie on the real z-axis or occur in complex-conjugate pairs (that is, at angles θ and −θ).

Figure 9.14 (A) Complex z-plane representation of a band-pass filter with v = [1, 0.60 + 0.66i]^T * [1, 0.60 − 0.66i]^T, along with the power spectral density of the filter. (B) Corresponding plots for the notch filter, u = [1, 0.9i]^T * [1, −0.9i]^T and v = [1, 0.8i]^T * [1, −0.8i]^T. Origin (circle), Fourier transform points on the unit circle (black +), zeros (white *), and poles (white +) are shown. MatLab scripts eda09_12 and eda09_13.

Filter design then becomes a problem of cleverly placing poles and zeros in the complex z-plane to achieve whatever attenuation or amplification of frequencies is desired. Often, just a few poles and zeros are needed to achieve the desired effect. For instance, two poles nearly collocated with two zeros suffice to create a notch filter (Figure 9.14B), that is, one that attenuates just a narrow range of frequencies. With just a handful of poles and zeros—corresponding to filters u and v with just a handful of coefficients—one can create extremely effective and efficient filters.
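The notch filter of Figure 9.14B can be evaluated the same way. In this Python/NumPy sketch (the helper name is hypothetical), multiplying out the coefficient pairs gives u(z) = 1 + 0.81z² and v(z) = 1 + 0.64z², so the zeros sit at z = ±i/0.9 and the poles at z = ±i/0.8, all at θ = ±90°. Because each pole lies just beyond the corresponding zero, the ratio |u/v| dips at θ = 90° and stays near unity elsewhere.

```python
import numpy as np

def amplitude_response(u, v, theta):
    """|u(z)/v(z)| at z = exp(-i theta); coefficients in increasing powers of z."""
    z = np.exp(-1j * theta)
    return np.abs(np.polyval(np.asarray(u)[::-1], z) / np.polyval(np.asarray(v)[::-1], z))

# Coefficients from Figure 9.14B
u = np.convolve([1, 0.9j], [1, -0.9j])     # = [1, 0, 0.81]: zeros at z = +/- i/0.9
v = np.convolve([1, 0.8j], [1, -0.8j])     # = [1, 0, 0.64]: poles at z = +/- i/0.8, outside |z| = 1

theta = np.linspace(0.0, np.pi, 181)       # 1-degree steps from zero frequency to Nyquist
amp = amplitude_response(u, v, theta)
i_min = np.argmin(amp)
print(f"deepest attenuation at theta = {np.degrees(theta[i_min]):.0f} deg, |f| = {amp[i_min]:.3f}")
print(f"|f| at zero frequency = {amp[0]:.3f}")
```

Moving the zeros closer to the unit circle would deepen the notch; keeping the poles close to the zeros keeps the notch narrow.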

As an example, we provide a MatLab function for a Chebyshev band-pass filter, chebyshevfilt.m. It passes frequencies in a specific frequency interval and attenuates frequencies outside that interval. It uses u and v each of length 5, corresponding to four zeros and four poles. The zeros are paired up, two at θ = 0 and two at θ = π, so that frequencies near zero and near the Nyquist frequency are strongly attenuated. The two conjugate pairs of poles are near θs corresponding to the ends of the pass-band interval (Figure 9.15). The function is called as follows:

Figure 9.15 (A) Complex z-plane representation of a Chebyshev band-pass filter. The origin (small circle), unit circle (large circle), zeros (*), and poles (+) are shown. MatLab script eda09_14.

[dout, u, v] = chebyshevfilt(din, Dt, flow, fhigh);

(MatLab eda09_14)

Here, din is the input time series, Dt is the sampling interval, and flow and fhigh define the pass-band. The function returns the filtered time series, dout, along with the filters, u and v. The impulse response of the filter (that is, its response to a spike) is illustrated in Figure 9.16.
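For readers working outside MatLab, a comparable filter can be designed with SciPy. This is a swapped-in design routine (`scipy.signal.cheby1`), not the book's chebyshevfilt.m; the helper name `chebyshev_bandpass`, the 1 dB ripple, and the 100 Hz sampling rate are assumptions. A second-order Chebyshev type I prototype transformed to a band-pass yields coefficient vectors of length 5, like u and v in the text, with two zeros at zero frequency and two at the Nyquist frequency. SciPy writes its transfer function in powers of z⁻¹, whereas the text uses powers of z with z = exp(−iθ); SciPy's poles therefore lie inside its unit circle, which corresponds to outside the unit circle in the text's convention.

```python
import numpy as np
from scipy import signal

def chebyshev_bandpass(din, dt, flow, fhigh, ripple_db=1.0):
    """Hypothetical Python analogue of chebyshevfilt(): Chebyshev type I band-pass.

    A 2nd-order low-pass prototype becomes a band-pass filter with four zeros and
    four poles. Returns the filtered series and the numerator (u) and denominator (v).
    """
    u, v = signal.cheby1(2, ripple_db, [flow, fhigh], btype="bandpass", fs=1.0 / dt)
    dout = signal.lfilter(u, v, din)
    return dout, u, v

dt = 0.01                                    # 100 Hz sampling
spike = np.zeros(256)
spike[0] = 1.0                               # unit impulse
dout, u, v = chebyshev_bandpass(spike, dt, 5.0, 10.0)   # impulse response for a 5-10 Hz pass-band

zeros, poles, _ = signal.tf2zpk(u, v)
freqs, resp = signal.freqz(u, v, worN=2048, fs=1.0 / dt)
print("zeros (real parts):", np.round(np.sort(zeros.real), 3))   # two near -1 (Nyquist), two near +1 (zero frequency)
print("peak gain at", freqs[np.argmax(np.abs(resp))], "Hz")
```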

Figure 9.16 Impulse response and spectrum of a Chebyshev band-pass filter for a 5-10 Hz pass-band. MatLab script eda09_14.

9.9 Frequency-dependent coherence

Time series that track one another, that is, exhibit coherence, need not do so at every period. Consider, for instance, a geographic location where air temperature and wind speed both have annual cycles. Summer days are, on average, both hotter and windier than winter days. But this correlation, which is due to large scale processes in the climate system, does not hold for shorter periods of a few days. A summer heat wave is not, on average, any windier than in times of moderate summer weather. In this case, temperature and wind are correlated at long periods, but not at short ones. In another example, plant growth in a given biome might correlate with precipitation over periods of a few weeks, but this does not necessarily imply that plant growth is faster in winter than in summer, even when winter tends to be wetter, on average, than summer. In this case, growth and precipitation are correlated at short periods, but not at long ones.

We introduce here a new dataset that illustrates this behavior, water quality data from the Reynolds Channel, part of the Middle Bay estuary on the south shore of Long Island, NY. Bill Menke, who provided the data, says the following about it:

I downloaded this Reynolds Channel Water Quality dataset from the US Geological Survey's National Water Information System. It consists of daily average values of a variety of environmental parameters for a period of about five years, starting on January 1, 2006. The original data was in one long text file, but I broke it into two pieces, the header (reynolds_header.txt) and the data (reynolds_data.txt). The data file has very many columns, and has time in a year-month-day format. In order to make the data more manageable, I created another file, reynolds_uninterpolated.txt, that has time reformatted into days starting on January 1, 2006 and that retains only six of the original data columns: precipitation in inches, air temperature in °C, water temperature in °C, salinity in practical salinity units, turbidity in formazin nephelometric units and chlorophyll in micrograms per liter. Not every parameter had a data value for every time, so I set the missing values to the placeholder, −999. Finally, I created a file, reynolds_interpolated.txt, in which missing data are filled in using linear interpolation. The MatLab script that I used is called interpolate_reynolds.m.

Note that the original data had missing data that were filled in using interpolation. We will discuss this process in the next chapter. A plot of the data (Figure 9.17) reveals that the general appearance of the different data types is quite variable. Precipitation is very spiky, reflecting individual storms. Air and water temperature, and to a lesser degree, salinity, are dominated by the annual cycle. Moreover, turbidity (cloudiness of the water) and chlorophyll (a proxy for the concentration of algae and other photosynthetic plankton) have both long period oscillations and short intense spikes.

Figure 9.17 Daily water quality measurements from Reynolds Channel (New York) for several years starting January 1, 2006. Six environmental parameters are shown: (A) precipitation in inches; (B) air temperature in °C; (C) water temperature in °C; (D) salinity in practical salinity units; (E) turbidity; and (F) chlorophyll in micrograms per liter. While these data have been linearly interpolated to fill in gaps, an alternative (and maybe better) strategy would be to leave the gaps and compare only portions of pairs of time series with no gaps. MatLab script eda09_15.

We can look for correlations at different periods by band-pass filtering the data using different pass-bands, for example periods of about 1 year and periods of about 5 days (Figure 9.18). All six time series appear to have some coherence at periods of 1 year, with air and water temperature tracking each other the best and turbidity tracking nothing very well. The situation at periods of about 5 days is more complicated. The most coherent pair seems to be salinity and precipitation, which are anti-correlated (as one might expect, since rain dilutes the salt in the bay). Air and water temperature do not track each other nearly as well in this period band as at periods of 1 year, but they do seem to show some coherence. Chlorophyll does not seem correlated with any of the other parameters at these shorter periods.

Figure 9.18 Band-pass filtered water quality measurements from Reynolds Channel (New York) for several years starting January 1, 2006. (A) Periods near 1 year; and (B) periods near 5 days. MatLab script eda09_16.

Our goal is to quantify the degree of similarity between two time series, u(t) and v(t), at frequencies near a specified frequency, ω₀. We start by band-pass filtering the time series to produce filtered versions, f(t) * u(t) and f(t) * v(t). The band-pass filter, f(t, ω₀, Δω), is chosen to have a center frequency, ω₀, and a bandwidth, 2Δω (meaning that it passes frequencies in the range ω₀ ± Δω). We now compare these two filtered time series by cross-correlating them:

$\begin{aligned} c(t, \omega_0, \Delta\omega) &= \{f(t, \omega_0, \Delta\omega) * u(t)\} \star \{f(t, \omega_0, \Delta\omega) * v(t)\} \\ &= f(-t, \omega_0, \Delta\omega) * f(t, \omega_0, \Delta\omega) * u(-t) * v(t) \end{aligned}$

(9.30)

If the two time series are similar in shape (and if they are aligned in time), then the zero-lag value of the cross-correlation, $c(t=0, \omega_0, \Delta\omega)$, will have a large absolute value. Its value will be large and positive when the time series are nearly the same, and large and negative if they have nearly the same shape but are flipped in sign with respect to each other. It will be near zero when the two time series are dissimilar.

Two undesirable aspects of Equation (9.30) are that a different band-pass filtered version of the time series is required for every frequency at which we want to evaluate similarity and the whole cross-correlation is calculated, whereas only its zero-lag value is needed. As we show below, these time-consuming calculations are unnecessary. We can substantially improve on Equation (9.30) by utilizing the fact that the value of a function, c(t), at time, t = 0, is proportional to the integral of its Fourier transform over frequency:

$c(t=0) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \tilde{c}(\omega)\exp(0)\,d\omega = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \tilde{c}(\omega)\,d\omega$

(9.31)

Applying this relationship to the cross-correlation at zero lag, c(t = 0), and using the rule that the Fourier transform of a convolution is the product of the transforms, yields

$\begin{aligned} c(t=0, \omega_0, \Delta\omega) &= \frac{1}{2\pi}\int_{-\infty}^{+\infty} \tilde{f}^*(\omega, \omega_0, \Delta\omega)\,\tilde{f}(\omega, \omega_0, \Delta\omega)\,\tilde{u}^*(\omega)\,\tilde{v}(\omega)\,d\omega \\ &\approx \frac{1}{2\pi}\int_{-\omega_0-\Delta\omega}^{-\omega_0+\Delta\omega} \tilde{u}^*(\omega)\,\tilde{v}(\omega)\,d\omega + \frac{1}{2\pi}\int_{+\omega_0-\Delta\omega}^{+\omega_0+\Delta\omega} \tilde{u}^*(\omega)\,\tilde{v}(\omega)\,d\omega \\ &= \frac{1}{\pi}\int_{\omega_0-\Delta\omega}^{\omega_0+\Delta\omega} \operatorname{Re}\{\tilde{u}^*(\omega)\,\tilde{v}(\omega)\}\,d\omega = \frac{2\Delta\omega}{\pi}\,\overline{\operatorname{Re}\{\tilde{u}^*(\omega_0)\,\tilde{v}(\omega_0)\}} \end{aligned}$

(9.32)

Note that this formula involves the cross-spectral density, $\tilde{u}^*(\omega)\,\tilde{v}(\omega)$. Here, we assume that the band-pass filter can be approximated by two boxcar functions of unit height, one centered at +ω₀ and the other at −ω₀, so the integration limits, ±∞, can be replaced with integration over the positive and negative pass-bands. The cross-correlation is a real function, so the real part of its Fourier transform is symmetric in frequency and the imaginary part is antisymmetric. Thus, only the real part of the integrand contributes. Except for a scaling factor of 1/(2Δω), the integral is just the average value of the integrand within the pass-band, so we replace it with the average, defined as

$\overline{\tilde{z}(\omega_0)} = \frac{1}{2\Delta\omega}\int_{\omega_0-\Delta\omega}^{\omega_0+\Delta\omega} \tilde{z}(\omega)\,d\omega$

(9.33)

The zero-lag cross-correlation can be normalized into a quantity that varies between ±1 by dividing each time series by the square root of its power. Power is just the autocorrelation, a(t), of the time series at zero lag, and the autocorrelation is just the cross-correlation of a time series with itself, so power satisfies an equation similar to the one above:

$\begin{aligned} P_u &= a_u(t=0, \omega_0, \Delta\omega) = \frac{2\Delta\omega}{\pi}\,\overline{\tilde{u}^*(\omega_0)\,\tilde{u}(\omega_0)} = \frac{2\Delta\omega}{\pi}\,\overline{|\tilde{u}(\omega_0)|^2} \\ P_v &= a_v(t=0, \omega_0, \Delta\omega) = \frac{2\Delta\omega}{\pi}\,\overline{\tilde{v}^*(\omega_0)\,\tilde{v}(\omega_0)} = \frac{2\Delta\omega}{\pi}\,\overline{|\tilde{v}(\omega_0)|^2} \end{aligned}$

(9.34)

Here, P_u and P_v are the power in the band-passed versions of u(t) and v(t), respectively. Note that we can omit taking the real parts, since these averages are purely real. The quantity

$C = \frac{c(t=0, \omega_0, \Delta\omega)}{P_u^{1/2} P_v^{1/2}} = \frac{\overline{\operatorname{Re}\{\tilde{u}^*(\omega_0)\,\tilde{v}(\omega_0)\}}}{\left\{\overline{|\tilde{u}(\omega_0)|^2}\;\overline{|\tilde{v}(\omega_0)|^2}\right\}^{1/2}}$

(9.35)

which varies between +1 and –1, is a measure of the degree of similarity of the time series, u(t) and v(t). However, the quantity

$C_{uv}^2(\omega_0, \Delta\omega) = \frac{\left|\overline{\tilde{u}^*(\omega_0)\,\tilde{v}(\omega_0)}\right|^2}{\overline{|\tilde{u}(\omega_0)|^2}\;\overline{|\tilde{v}(\omega_0)|^2}}$

(9.36)

is more commonly encountered in the literature. It is called the coherence of time series u(t) and v(t). It is nearly the square of $C$, except that it omits taking the real part, so it does not have exactly the interpretation of the normalized zero-lag cross-correlation of the band-passed time series. It does, however, behave similarly (see Note 9.1). It varies between zero and unity, being small when the time series are very dissimilar and large when they are nearly identical. These formulas are demonstrated in MatLab script eda09_17.
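Equations (9.33), (9.35), and (9.36) translate almost line by line into code. The Python/NumPy sketch below averages over the DFT frequencies that fall within f₀ ± half_band (frequency here is in cycles per unit time rather than angular frequency; the function name and the test signals are assumptions). It illustrates the difference between the two measures: reversing the sign of v flips C to −1 but leaves the coherence at 1.

```python
import numpy as np

def band_coherence(u, v, dt, f0, half_band):
    """Return C (Eq. 9.35) and the coherence C_uv^2 (Eq. 9.36) for frequencies within f0 +/- half_band.

    The band average of Eq. (9.33) is a plain mean over the DFT frequencies inside the band.
    """
    freqs = np.fft.rfftfreq(len(u), d=dt)
    U, V = np.fft.rfft(u), np.fft.rfft(v)
    band = np.abs(freqs - f0) <= half_band
    cross = np.mean(np.conj(U[band]) * V[band])      # band-averaged cross-spectral density
    pu = np.mean(np.abs(U[band]) ** 2)                # band-averaged power spectral densities
    pv = np.mean(np.abs(V[band]) ** 2)
    C = cross.real / np.sqrt(pu * pv)
    coherence = np.abs(cross) ** 2 / (pu * pv)
    return C, coherence

rng = np.random.default_rng(3)
u = rng.standard_normal(4096)
noise = rng.standard_normal(4096)

for label, v in [("v = u", u), ("v = -u", -u), ("independent", noise)]:
    C, coh = band_coherence(u, v, dt=1.0, f0=0.1, half_band=0.01)
    print(f"{label:12s} C = {C:+.3f}   coherence = {coh:.3f}")
```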

The averaging in Equation (9.36) is over neighboring frequencies in a densely sampled frequency series that is the result of taking the Fourier transform of a long time series. However, a very similar result can be obtained by subdividing the long time series into shorter segments, taking the Fourier transform of each, and averaging the results. This correspondence follows from the frequency spacing of the Fourier transform depending on the length of the time series. When a long time series is subdivided into K shorter segments of equal length, the number of frequencies decreases by a factor of K and the number of estimates at a given frequency increases by the same factor. Averaging the K estimates, all for the same frequency, from the short time series gives a result similar to averaging K adjacent values from the long one. This result, due to Welch (1967), is the basis for the MatLab coherence function mscohere() (see MatLab script eda09_18 for an example).
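The segment-averaging idea can be sketched in a few lines. This Python/NumPy version is a simplified Welch-style estimate: it uses non-overlapping segments with no taper, whereas mscohere() (and SciPy's `scipy.signal.coherence`) apply a window and overlap segments by default. The test signal, a shared component plus independent noise of equal power, is an assumption chosen because its true coherence is 1/2 at every frequency.

```python
import numpy as np

def segment_coherence(u, v, n_segments):
    """Coherence by averaging spectra over non-overlapping segments (a simplified Welch estimate).

    No taper and no overlap are used. Returns the coherence at the rfft frequencies of one segment.
    """
    seg_len = len(u) // n_segments
    U = np.fft.rfft(np.reshape(u[: seg_len * n_segments], (n_segments, seg_len)), axis=1)
    V = np.fft.rfft(np.reshape(v[: seg_len * n_segments], (n_segments, seg_len)), axis=1)
    cross = np.mean(np.conj(U) * V, axis=0)          # average over segments, frequency by frequency
    pu = np.mean(np.abs(U) ** 2, axis=0)
    pv = np.mean(np.abs(V) ** 2, axis=0)
    return np.abs(cross) ** 2 / (pu * pv)

rng = np.random.default_rng(4)
u = rng.standard_normal(64 * 128)
v = u + rng.standard_normal(u.size)       # shared signal plus equal-power independent noise

coh = segment_coherence(u, v, n_segments=64)
print("mean coherence:", coh[1:-1].mean())   # theory for this construction: 1/2
```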

We return now to the Reynolds Channel water quality dataset, and compute the coherence of each pair of time series (several of which are shown in Figure 9.19). Air temperature and water temperature are the most highly coherent time series. They are coherent both at low frequencies (periods of a year or more) and high frequencies (periods of a few days). Precipitation and salinity are also coherent over most of the frequency range, although less strongly than air and water temperature. Chlorophyll correlates with the other time series only at the longest periods, indicating that, while it is sensitive to the seasonal cycle, it is not sensitive to short time scale fluctuations in these parameters.

Figure 9.19 Coherence of water quality measurements from Reynolds Channel (New York). (A) Air temperature and water temperature; (B) precipitation and salinity; and (C) water temperature and chlorophyll. MatLab script eda09_18.

9.10 Windowing before computing Fourier transforms

When computing the power spectral density of a continuous time series, we are faced with deciding how long a segment of the time series to use. Longer is better, of course, both because a long segment is more likely to have properties representative of the time series as a whole, and because long segments provide greater resolution (recall that the frequency sampling, Δω, scales with N⁻¹). In practice, however, data are often scarce, so more often the question is how to make do with a short segment.

A short segment of a time series can be created by multiplying an indefinitely long time series, d(t), by a window function, W(t); that is, a function that is zero everywhere outside the segment. The simplest window function is the boxcar function, which is unity within the interval and zero outside it. The key question is what effect windowing has on the Fourier transform of a time series; that is, how the Fourier transform of W(t)d(t) differs from the Fourier transform of d(t). This question can be analyzed using the convolution theorem. As discussed in Section 6.11, the convolution of two time series has a Fourier transform that is the product of the two individual Fourier transforms. But time and frequency play symmetric roles in the Fourier transform. Thus, the product of two time series has a Fourier transform that is the convolution of the two individual transforms. Windowing has the effect of convolving the Fourier transform of the time series with the Fourier transform of the window function.
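The statement that windowing convolves the two Fourier transforms has an exact discrete counterpart: the DFT of a product of two length-N sequences is the circular convolution of their DFTs, divided by N. The Python/NumPy sketch below checks this for a boxcar window; the window placement and the random series are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 64
d = rng.standard_normal(N)               # a stretch of an (in principle longer) time series
W = np.zeros(N)
W[16:48] = 1.0                           # boxcar window: unity inside the segment, zero outside

lhs = np.fft.fft(W * d)                  # transform of the windowed series

w_hat, d_hat = np.fft.fft(W), np.fft.fft(d)
m = np.arange(N)
# Circular convolution of the two transforms, divided by N (discrete convolution theorem)
rhs = np.array([np.sum(w_hat * d_hat[(k - m) % N]) for k in range(N)]) / N

print("FFT of product equals convolved FFTs / N:", np.allclose(lhs, rhs))
```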

From this perspective, a window function with a spiky Fourier transform is the best, because convolving a function with a spike leaves the function unchanged. As we have seen in Section 9.7, the Fourier transform of a boxcar is a sinc function. It has a central spike, which is good, but it also has sidelobes, which are bad. The sidelobes create peaks in the spectrum of the windowed time series, W(t)d(t), that are not present in the spectrum of the original time series, d(t) (Figure 9.20). These artifacts can easily be mistaken for real periodicities in the data.

Figure 9.20 Effect of windowing a sinusoidal time series, d(t) = cos(ω₀t), with a boxcar window function, W(t), prior to computing its amplitude spectral density. (A) Time series, d(t). (B) Boxcar windowing function, W(t). (C) Product, d(t)W(t). (D-F) Amplitude spectral density of d(t), W(t) and d(t)W(t). MatLab script eda09_19.

The solution is a better window function, one that does not have a Fourier transform with such strong sidelobes. It must be zero outside the interval, but we have complete flexibility in choosing its shape within the interval. Many such functions (or tapers) have been proposed. A popular one is the Hamming window function (or Hamming taper)

$W(t_k) = 0.54 - 0.46\cos\left(\frac{2\pi(k-1)}{N_w - 1}\right)$

(9.37)

where N_w is the length of the window. Its Fourier transform (Figure 9.21) has significantly lower-amplitude sidelobes than the boxcar window function. Its central spike is wider, however (compare Figure 9.20E with Figure 9.21E), implying that it smooths the spectrum of d(t) more than does a boxcar. Smoothing is bad in this context, because it blurs features in the spectrum that might be important. Two narrow and closely spaced spectral peaks, for instance, will appear as a single broad peak. Unfortunately, the width of the central peak and the amplitude of sidelobes trade off in window functions. The end result is always a compromise between the two.
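The trade-off between main-lobe width and sidelobe level can be quantified. The Python/NumPy sketch below implements Equation (9.37) (it matches `np.hamming`) and measures, for boxcar and Hamming windows of the same length, where the first spectral null falls and how high the largest sidelobe is. The window length, the zero-padded FFT length, and the null-finding rule are assumptions; for these settings the boxcar's largest sidelobe sits near −13 dB and the Hamming taper's near −40 dB or below, while the Hamming main lobe is roughly twice as wide.

```python
import numpy as np

def hamming(Nw):
    """Hamming taper of Eq. (9.37), with k - 1 running from 0 to Nw - 1."""
    k = np.arange(Nw)
    return 0.54 - 0.46 * np.cos(2 * np.pi * k / (Nw - 1))

def lobe_stats(w, nfft=8192):
    """Return (FFT index of the first spectral null, peak sidelobe in dB relative to the central peak)."""
    amp = np.abs(np.fft.rfft(w, nfft))
    amp /= amp[0]                                         # normalize by the zero-frequency value
    first_null = int(np.nonzero(np.diff(amp) > 0)[0][0])  # main lobe ends where |W~| starts rising
    sidelobe_db = 20 * np.log10(amp[first_null:].max())
    return first_null, sidelobe_db

Nw = 64
box_null, box_db = lobe_stats(np.ones(Nw))
ham_null, ham_db = lobe_stats(hamming(Nw))
print(f"boxcar : first null at index {box_null}, peak sidelobe {box_db:.1f} dB")
print(f"Hamming: first null at index {ham_null}, peak sidelobe {ham_db:.1f} dB")
```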

Figure 9.21 Effect of windowing a sinusoidal time series, d(t) = cos(ω₀t), with a Hamming window function, W(t), prior to computing its amplitude spectral density. (A) Time series, d(t). (B) Hamming windowing function, W(t). (C) Product, d(t)W(t). (D-F) Amplitude spectral density of d(t), W(t), and d(t)W(t). MatLab script eda09_20.

Nevertheless, one can do substantially better than the Hamming taper, as we will see in the next section.

9.11 Optimal window functions

A good window function is one that has a spiky power spectral density. It should have large amplitudes in a narrow range of frequencies, say ±ω₀, straddling the origin and have small amplitudes at higher frequencies. One way to quantify spikiness is through the ratio

$R = \frac{\int_{-\omega_0}^{+\omega_0} |\tilde{W}(\omega)|^2\,d\omega}{\int_{-\omega_{ny}}^{+\omega_{ny}} |\tilde{W}(\omega)|^2\,d\omega}$

(9.38)

Here, $\tilde{W}(\omega)$ is the Fourier transform of the window function and $\omega_{ny}$ is the Nyquist frequency. From this point of view, the best window function is the one that maximizes the ratio, R.
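The ratio R can be evaluated directly by numerical integration. The Python sketch below is an illustration, not the book's code. It computes $\tilde{W}(\omega)$ as in Equation (9.40), integrates $|\tilde{W}(\omega)|^2$ with Simpson's rule, and compares R for a boxcar and a Hamming taper. It assumes Δt = 1, N = 64, and ω₀ = 2Δω with Δω = 2π/(NΔt). It also checks numerically that the denominator equals (2π/Δt)Σw_n², which is why the denominator is proportional to the power in the window. The function names are hypothetical.

```python
import cmath
import math


def power_spectrum_at(w, om, dt=1.0):
    """|W(omega)|^2, with W(omega) as in Equation (9.40); enumerate starts at 0 = (n - 1)."""
    z = sum(wn * cmath.exp(-1j * n * om * dt) for n, wn in enumerate(w))
    return z.real ** 2 + z.imag ** 2


def simpson(f, a, b, m=2000):
    """Composite Simpson's rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    total += 4 * sum(f(a + k * h) for k in range(1, m, 2))
    total += 2 * sum(f(a + k * h) for k in range(2, m, 2))
    return total * h / 3


def spikiness(w, w0, dt=1.0):
    """Ratio R of Equation (9.38); the denominator is returned too, for checking."""
    wny = math.pi / dt                                   # Nyquist frequency
    num = simpson(lambda om: power_spectrum_at(w, om, dt), -w0, w0)
    den = simpson(lambda om: power_spectrum_at(w, om, dt), -wny, wny)
    return num / den, den


N = 64
w0 = 2 * (2 * math.pi / N)                               # omega_0 = 2 * delta-omega, dt = 1
boxcar = [1.0] * N
hamming = [0.54 - 0.46 * math.cos(2 * math.pi * k / (N - 1)) for k in range(N)]
R_box, den_box = spikiness(boxcar, w0)
R_ham, den_ham = spikiness(hamming, w0)
print(f"R (boxcar)  = {R_box:.4f}")
print(f"R (Hamming) = {R_ham:.4f}")
# Parseval check: the denominator equals (2 pi / dt) * sum(w_n^2).
print(den_ham, 2 * math.pi * sum(x * x for x in hamming))
```

Under these assumptions, the boxcar should keep about 95% of its power within ±ω₀ and the Hamming taper more than 99%.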

The denominator of Equation (9.38) is proportional to the power in the window function (see Equation 6.41). If we restrict ourselves to window functions that all have unit power, then the maximization becomes as follows:

$\text{maximize } F = \int_{-\omega_0}^{+\omega_0} |\tilde{W}(\omega)|^2 \, d\omega \quad \text{with the constraint} \quad \int |W(t)|^2 \, dt = 1$

(9.39)

Because the window weights, $w_n$, are real, the discrete Fourier transform, $\tilde{W}(\omega)$, of the window function and its complex conjugate, $\tilde{W}^{*}(\omega)$, are

$\tilde{W}(\omega) = \sum_{n=1}^{N} w_n \exp(-i(n-1)\omega\Delta t) \quad \text{and} \quad \tilde{W}^{*}(\omega) = \sum_{m=1}^{N} w_m \exp(+i(m-1)\omega\Delta t)$

(9.40)

Inserting $|\tilde{W}(\omega)|^2 = \tilde{W}^{*}(\omega)\tilde{W}(\omega)$ into F in Equation (9.39) yields

$F = \sum_{m=1}^{N} \sum_{n=1}^{N} w_n w_m M_{nm} \quad \text{with} \quad M_{nm} = \int_{-\omega_0}^{+\omega_0} \exp(i(m-n)\omega\Delta t) \, d\omega$

(9.41)

The integration can be performed analytically:

$M_{nm} = \int_{-\omega_0}^{+\omega_0} \exp(i(m-n)\omega\Delta t) \, d\omega = 2\int_{0}^{+\omega_0} \cos((m-n)\omega\Delta t) \, d\omega = \frac{2\sin((m-n)\omega_0\Delta t)}{(m-n)\Delta t} = 2\omega_0 \operatorname{sinc}\left(\frac{(m-n)\omega_0\Delta t}{\pi}\right)$

(9.42)
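Equations (9.41) and (9.42) can be checked numerically. The Python sketch below builds M from the closed form. It then confirms, for an arbitrary set of random weights, that the quadratic form $w^T M w$ equals the integral of $|\tilde{W}(\omega)|^2$ over ±ω₀ computed by Simpson's rule. The values Δt = 1, N = 64, and ω₀ = 2Δω with Δω = 2π/(NΔt) are assumptions made for illustration, and the function names are hypothetical.

```python
import cmath
import math
import random


def concentration_matrix(n, w0, dt=1.0):
    """M_nm from the closed form of Equation (9.42); the m = n limit is 2 * w0."""
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = j - i                                    # (m - n) in the text
            mat[i][j] = 2 * w0 if d == 0 else 2 * math.sin(d * w0 * dt) / (d * dt)
    return mat


def quadratic_form(w, mat):
    """F = sum_m sum_n w_n w_m M_nm, Equation (9.41)."""
    n = len(w)
    return sum(w[i] * mat[i][j] * w[j] for i in range(n) for j in range(n))


def band_power(w, w0, dt=1.0, m=2000):
    """Integral of |W(omega)|^2 over [-w0, w0] by composite Simpson's rule."""
    def f(om):
        z = sum(wn * cmath.exp(-1j * n * om * dt) for n, wn in enumerate(w))
        return z.real ** 2 + z.imag ** 2
    h = 2 * w0 / m
    total = f(-w0) + f(w0)
    total += 4 * sum(f(-w0 + k * h) for k in range(1, m, 2))
    total += 2 * sum(f(-w0 + k * h) for k in range(2, m, 2))
    return total * h / 3


N = 64
w0 = 2 * (2 * math.pi / N)                               # omega_0 = 2 * delta-omega, dt = 1
M = concentration_matrix(N, w0)
random.seed(1)
w = [random.gauss(0.0, 1.0) for _ in range(N)]           # arbitrary window weights
F_matrix = quadratic_form(w, M)
F_integral = band_power(w, w0)
print(F_matrix, F_integral)
```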

Here, sinc(x) = sin(πx)/(πx), and the diagonal elements, which correspond to m = n, equal 2ω₀. Note that M is a symmetric N × N matrix. The window function, w, satisfies

$\begin{array}{c} \text{maximize } F = \sum_{m=1}^{N} \sum_{n=1}^{N} w_n w_m M_{nm} \quad \text{with the constraint} \quad C = \sum_{n=1}^{N} w_n^2 - 1 = 0 \\ \text{or, equivalently,} \\ \text{maximize } F = w^T M w \quad \text{with the constraint} \quad C = w^T w - 1 = 0 \end{array}$

(9.43)

The Method of Lagrange Multipliers (see Note 9.2) says that maximizing a function, F, subject to a constraint, C = 0, is equivalent to maximizing F − λC without a constraint, where λ is a new parameter that needs to be determined. Differentiating $w^T M w - \lambda(w^T w - 1)$ with respect to w and setting the result to zero gives $2Mw - 2\lambda w = 0$ (using the symmetry of M), which simplifies to

$M w = \lambda w$

(9.44)

This is just the algebraic eigenvalue problem (see Equation 8.6). Recall that this equation has N solutions, each with an eigenvalue, λ_i, and a corresponding eigenvector, w⁽ⁱ⁾. Pre-multiplying Equation (9.44) by $w^{(i)T}$ and recalling that the eigenvectors have unit length, $w^{(i)T} w^{(i)} = 1$, shows that the eigenvalues satisfy $\lambda_i = w^{(i)T} M w^{(i)}$. But $w^T M w$ is the quantity, F, being maximized in Equation (9.43). Thus, the eigenvalues are a direct measure of the spikiness of the window functions, and the best window function is the eigenvector with the largest eigenvalue.
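A simple way to find the eigenvector with the largest eigenvalue is power iteration: repeatedly multiply a trial vector by M and rescale it to unit length. The Python sketch below uses that technique for illustration only; it is not necessarily how the book's scripts compute the tapers. It assumes Δt = 1, N = 64, and ω₀ = 2Δω with Δω = 2π/(NΔt), and the function names are hypothetical. The two largest eigenvalues turn out to be nearly equal (as the example in the next paragraph shows), and their eigenvectors are symmetric and antisymmetric about the center of the window, respectively. Starting from a symmetric trial vector keeps the antisymmetric eigenvector out of the iteration, up to rounding, and so avoids the very slow convergence that their near-equality would otherwise cause.

```python
import math


def concentration_matrix(n, w0, dt=1.0):
    """M_nm of Equation (9.42); the m = n limit is 2 * w0."""
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = j - i
            mat[i][j] = 2 * w0 if d == 0 else 2 * math.sin(d * w0 * dt) / (d * dt)
    return mat


def best_window(mat, iters=500):
    """Power iteration for the eigenvector of mat with the largest eigenvalue."""
    n = len(mat)
    w = [1.0 / math.sqrt(n)] * n                                    # symmetric start
    for _ in range(iters):
        mw = [sum(a * b for a, b in zip(row, w)) for row in mat]    # M w
        norm = math.sqrt(sum(x * x for x in mw))
        w = [x / norm for x in mw]                                  # keep w^T w = 1
    mw = [sum(a * b for a, b in zip(row, w)) for row in mat]
    lam = sum(a * b for a, b in zip(w, mw))                         # lambda = w^T M w
    return lam, w


N = 64
w0 = 2 * (2 * math.pi / N)                   # omega_0 = 2 * delta-omega, dt = 1
lam1, W1 = best_window(concentration_matrix(N, w0))
print(f"largest eigenvalue = {lam1:.4f} (upper bound 2*pi = {2 * math.pi:.4f})")
```

The upper bound follows from Equation (9.38): F can never exceed the power integrated over ±ω_ny, which is 2π/Δt for a unit-power window. The resulting W1 should be positive, symmetric, and largest at the center of the interval.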

We illustrate the case of a 64-point window function with a width of ω₀ = 2Δω (Figures 9.22, 9.23 and 9.24). The six largest eigenvalues are 6.28, 6.27, 6.03, 4.54, 1.72, and 0.2. The first three eigenvalues are approximately equal in size, indicating that three different tapers come close to achieving the design goal of maximizing the spectral power in the ±ω₀ frequency range. The first of these, W₁(t), is similar in shape to a Normal curve, with high amplitudes in the center of the interval that taper off towards its ends. One possibility is to consider W₁(t) the best window function and to use it to compute power spectral density.
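These eigenvalues can be reproduced with any symmetric eigensolver. To keep the example dependency-free, the Python sketch below diagonalizes M with the classical cyclic Jacobi method. This solver is a substitute chosen for illustration and is not the book's MatLab script eda09_21. The sketch assumes Δt = 1 and Δω = 2π/(NΔt), and its function names are hypothetical. Because the trace of M is 2Nω₀, the eigenvalues must sum to 8π under these assumptions, which provides a built-in check.

```python
import math


def concentration_matrix(n, w0, dt=1.0):
    """M_nm of Equation (9.42); the m = n limit is 2 * w0."""
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = j - i
            mat[i][j] = 2 * w0 if d == 0 else 2 * math.sin(d * w0 * dt) / (d * dt)
    return mat


def jacobi_eigh(a, max_sweeps=30, tol=1e-20):
    """All eigenpairs of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors), sorted by decreasing eigenvalue.
    """
    n = len(a)
    a = [row[:] for row in a]
    vt = [[float(i == j) for j in range(n)] for i in range(n)]  # rows become eigenvectors
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for row in a:                                   # A <- A P (columns p, q)
                    x, y = row[p], row[q]
                    row[p], row[q] = c * x - s * y, s * x + c * y
                rp, rq = a[p], a[q]                             # A <- P^T A (rows p, q)
                a[p] = [c * x - s * y for x, y in zip(rp, rq)]
                a[q] = [s * x + c * y for x, y in zip(rp, rq)]
                vp, vq = vt[p], vt[q]                           # V <- V P, stored transposed
                vt[p] = [c * x - s * y for x, y in zip(vp, vq)]
                vt[q] = [s * x + c * y for x, y in zip(vp, vq)]
    order = sorted(range(n), key=lambda i: -a[i][i])
    return [a[i][i] for i in order], [vt[i] for i in order]


N = 64
w0 = 2 * (2 * math.pi / N)                   # omega_0 = 2 * delta-omega, dt = 1
lams, tapers = jacobi_eigh(concentration_matrix(N, w0))
print("six largest eigenvalues:", [round(x, 2) for x in lams[:6]])
print("sum of eigenvalues:", sum(lams), "  trace 2*N*w0:", 2 * N * w0)
```

If these assumptions match the book's, the printed values should agree with those quoted above to the precision given. In practice one would call a library routine such as MATLAB's `eig` or NumPy's `numpy.linalg.eigh` instead.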

Figure 9.22 First four window functions, W_i(t), for the case N = 64, ω₀ = 2Δω. MatLab script eda09_21.

Figure 9.23 Amplitude spectral density, |W_i(f)|, of the first four window functions, for the case N = 64, ω₀ = 2Δω. The dotted vertical line marks frequency, 2Δf. MatLab script eda09_21.

Figure 9.24 (Row 1) Data, d(t), consisting of a cosine wave, and its amplitude spectral density (ASD). (Row 2) Data windowed with boxcar, B(t), and its ASD. (Rows 3–5) Data windowed with the first three window functions, and corresponding ASD. (Row 6) ASD obtained by averaging the results of the first three window functions. MatLab script eda09_21.

However, W₂(t) and W₃(t) are potentially useful, because they weight the data differently than does W₁(t). In particular, they give substantial weight to data near the ends of the interval, which W₁(t) strongly attenuates. Instead of using only the single window, W₁(t), to compute the power spectral density, we could use several windows to compute several different estimates of power spectral density and then average the results (Figure 9.24). This idea was put forward by Thomson (1982) and is called the multitaper method.
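The multitaper recipe takes three steps:

1. Compute the leading eigenvectors of M to use as tapers.
2. Form one power spectral density estimate per taper.
3. Average the estimates.

The Python sketch below is illustrative only. Its function names are hypothetical, and it assumes Δt = 1, N = 64, and ω₀ = 2Δω with Δω = 2π/(NΔt). It applies the method to a cosine of frequency π/2 and compares two estimates: the multitaper estimate and a unit-power boxcar estimate. For each, it reports the fraction of the estimated power that falls within ±ω₀ of the cosine's frequency. A small Jacobi eigensolver is included so that the sketch needs only the standard library.

```python
import cmath
import math


def concentration_matrix(n, w0, dt=1.0):
    """M_nm of Equation (9.42); the m = n limit is 2 * w0."""
    return [[2 * w0 if j == i else 2 * math.sin((j - i) * w0 * dt) / ((j - i) * dt)
             for j in range(n)] for i in range(n)]


def jacobi_eigh(a, max_sweeps=30, tol=1e-20):
    """Eigenpairs of a symmetric matrix (cyclic Jacobi), sorted by decreasing eigenvalue."""
    n = len(a)
    a = [row[:] for row in a]
    vt = [[float(i == j) for j in range(n)] for i in range(n)]  # rows become eigenvectors
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])     # zeroes element (p, q)
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for row in a:                                   # A <- A P
                    x, y = row[p], row[q]
                    row[p], row[q] = c * x - s * y, s * x + c * y
                rp, rq = a[p], a[q]                             # A <- P^T A
                a[p] = [c * x - s * y for x, y in zip(rp, rq)]
                a[q] = [s * x + c * y for x, y in zip(rp, rq)]
                vp, vq = vt[p], vt[q]                           # V <- V P (transposed)
                vt[p] = [c * x - s * y for x, y in zip(vp, vq)]
                vt[q] = [s * x + c * y for x, y in zip(vp, vq)]
    order = sorted(range(n), key=lambda i: -a[i][i])
    return [a[i][i] for i in order], [vt[i] for i in order]


def psd(x, w, omegas):
    """Single-taper estimate |sum_n w_n x_n exp(-i n omega)|^2 (dt = 1)."""
    wx = [wn * xn for wn, xn in zip(w, x)]
    out = []
    for om in omegas:
        z = sum(v * cmath.exp(-1j * n * om) for n, v in enumerate(wx))
        out.append(z.real ** 2 + z.imag ** 2)
    return out


def multitaper_psd(x, tapers, omegas):
    """Average of the single-taper estimates."""
    estimates = [psd(x, w, omegas) for w in tapers]
    return [sum(vals) / len(tapers) for vals in zip(*estimates)]


N, K = 64, 1024
w0 = 2 * (2 * math.pi / N)                   # omega_0 = 2 * delta-omega, dt = 1
wc = math.pi / 2                             # frequency of the test cosine
d = [math.cos(wc * n) for n in range(N)]
omegas = [-math.pi + 2 * math.pi * j / K for j in range(K)]

_, tapers = jacobi_eigh(concentration_matrix(N, w0))
mt = multitaper_psd(d, tapers[:3], omegas)              # first three tapers
box = psd(d, [1.0 / math.sqrt(N)] * N, omegas)          # unit-power boxcar


def in_band_fraction(spec):
    """Fraction of the estimated power lying within +/- w0 of +/- wc."""
    inside = sum(s for om, s in zip(omegas, spec) if abs(abs(om) - wc) <= w0 + 1e-12)
    return inside / sum(spec)


peak = max((s, om) for om, s in zip(omegas, mt) if om >= 0)[1]
print(f"multitaper peak at omega = {peak:.3f} (cosine at {wc:.3f})")
print(f"in-band fraction: multitaper {in_band_fraction(mt):.3f}, boxcar {in_band_fraction(box):.3f}")
```

Each unit-power taper keeps a fraction λ_i/(2π) of its power within ±ω₀. The averaged estimate should therefore leak noticeably less power out of the band than the boxcar does (roughly 0.99 versus 0.95 in-band under these assumptions). The cost is a central peak about 2ω₀ wide.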

Problems

9.1 The ozone dataset also contains atmospheric temperature, a parameter that, like ozone, might be expected to lag solar radiation. Modify the eda09_05 script to estimate its lag. Does it have the same lag as ozone?

9.2 Suppose that the time series f and h are related by convolution with the filter, s; that is, f = s*h. Because the autocorrelation represents the covariance of a probability density function, the autocorrelation of f should be related to the autocorrelation of h by the usual rules of error propagation. Verify that this is the case by writing the convolution in matrix form, f = Sh, and using the rule C_f = S C_h S^T, where the Cs are covariance matrices.

9.3 Modify MatLab script eda09_03 to estimate the autocorrelation of the Reynolds Channel chlorophyll dataset. How quickly does the autocorrelation fall off with lag (for small lags)?

9.4 Taper the Neuse River Hydrograph data using a Hamming window function before computing its power spectral density. Compare your results to the untapered results, commenting on whether any conclusions about periodicities might change. (Note: before tapering, subtract the mean from the time series so that it oscillates around zero.)

9.5 Band-pass filter the Black Rock Forest temperature dataset to highlight diurnal variations of temperature. Provide a new answer to Question 2.3 that uses these results.
