6

Detecting periodicities

Abstract

Chapter 6, Detecting Periodicities, is about spectral analysis, the procedures used to represent data as a superposition of harmonically varying components and to detect periodicities. The key concept is the Fourier series, a type of linear model in which the data are represented by a mixture of sinusoidally varying components. The chapter aims to make the student completely comfortable with the discrete Fourier transform (DFT), the key algorithm used in studying periodicities. Theoretical analysis and a practical discussion of MatLab's DFT function are closely interwoven.

Keywords

sinusoidal; periodicity; amplitude; sampling theorem; Fourier series; Fourier transform; fast Fourier transform; Nyquist frequency; power spectral density; Dirac function; complex exponential

6.1 Describing sinusoidal oscillations

Sinusoidal oscillations—those involving sines and cosines—are very common in the environmental sciences. We encountered them in the Neuse River Hydrograph and the Black Rock Forest datasets, where they were associated with seasonal variations in river discharge and air temperature, respectively. This chapter examines oscillatory behavior in more detail, developing systematic methods for detecting and quantifying periodicities.

Periodicities can be both temporal and spatial in character, with a somewhat different nomenclature used for each. In both cases, the height of the oscillation is called its amplitude. Temporal periodicities have a period, T, the time between successive cycles. Spatial periodicities have a wavelength, λ, the distance between successive cycles. The rate at which temporal cycles occur is called the frequency and the rate at which spatial cycles occur is called wavenumber. Frequency can be measured in units of cycles per unit time, in which case it is given the symbol, f, or it can be measured in units of radians per unit time, in which case it is given the symbol, ω. The units of cycles per second are called Hertz, abbreviated Hz. Similarly, wavenumber can be measured in either cycles per unit distance or radians per unit distance. Unfortunately, both units of wavenumber tend to be given the same symbol, k, in the literature. In this book, we use k exclusively to mean radians per unit distance. These quantities are related as follows:

$$f = \frac{1}{T} = \frac{\omega}{2\pi} \quad\text{and}\quad \frac{1}{\lambda} = \frac{k}{2\pi} \tag{6.1}$$

Generic temporal, d(t), and spatial, d(x), cosine oscillations of amplitude, C, can be written as

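$$d(t) = C\cos\left\{\frac{2\pi t}{T}\right\} = C\cos\{2\pi f t\} = C\cos\{\omega t\} \quad\text{and}\quad d(x) = C\cos\left\{\frac{2\pi x}{\lambda}\right\} = C\cos\{kx\} \tag{6.2}$$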

In nature, oscillations rarely “start” at time (or distance) zero. A cosine wave with amplitude, C, that starts (peaks) at time, t0, is given by (Figure 6.1)

$$d(t) = C\cos\left\{\frac{2\pi (t - t_0)}{T}\right\} = C\cos\{\omega (t - t_0)\} = C\cos\{\omega t - \phi\} \tag{6.3}$$

Figure 6.1 The sinusoidal function, d(t) = Ccos{2π(t − t0)/T}, has amplitude, C = 2, period, T = 50, and delay, t0 = 10. MatLab script eda06_01.

The quantity, ϕ = ωt0, is called the phase. The rule,

$$\cos(a - b) = \cos(a)\cos(b) + \sin(a)\sin(b) \tag{6.4}$$

when applied to Equation (6.3), yields

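$$\begin{aligned} d(t) &= C\cos\{\omega (t - t_0)\} = C\cos(\omega t_0)\cos(\omega t) + C\sin(\omega t_0)\sin(\omega t) = A\cos(\omega t) + B\sin(\omega t) \\ &\text{with}\quad A = C\cos(\omega t_0) \quad\text{and}\quad B = C\sin(\omega t_0) \quad\text{and}\quad C = \sqrt{A^2 + B^2} \quad\text{and}\quad t_0 = \omega^{-1}\tan^{-1}(B/A) \end{aligned} \tag{6.5}$$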

See Note 6.1 for a discussion of how the arc-tangent is to be correctly computed. A time-shifted cosine can be represented as the sum of a sine and a cosine. An explicit time-shift variable such as t0 is unnecessary in formulas describing periodicities, as long as sines and cosines are paired up with one another.
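As a minimal sketch of Equation (6.5), the following fragment converts between the (C, t0) and (A, B) descriptions of a single sinusoid, using the parameter values of Figure 6.1. The variable names (Crec, t0rec, maxdiff) are illustrative and are not taken from the book's scripts, and the use of atan2() assumes that the four-quadrant arc-tangent is the appropriate choice here.

```matlab
% amplitude, period and delay of the cosine in Figure 6.1
C = 2; T = 50; t0 = 10;
w = 2*pi/T;                 % angular frequency
% cosine and sine coefficients, Equation (6.5)
A = C*cos(w*t0);
B = C*sin(w*t0);
% recover amplitude and delay from A and B
Crec = sqrt(A^2 + B^2);
t0rec = atan2(B,A)/w;       % four-quadrant arc-tangent (assumed choice)
% the two forms agree at every time sample
t = [0:99]';
d1 = C*cos(w*(t-t0));
d2 = A*cos(w*t) + B*sin(w*t);
maxdiff = max(abs(d1-d2));
```

Because atan2() returns angles between −π and π, the recovered delay lies within half a period of zero; delays that differ by a whole period describe the same oscillation.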

6.2 Models composed only of sinusoidal functions

Suppose that we have a dataset in which a variable, such as temperature, is sampled at evenly spaced intervals of time, ti (a time series), say with sampling interval, Δt. One extreme is a dataset composed of only sinusoidal oscillations:

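$$\begin{aligned} &\text{the data} = \text{sum of sines and cosines} \\ &d(t_i) = A_1\cos(\omega_1 t_i) + B_1\sin(\omega_1 t_i) + A_2\cos(\omega_2 t_i) + B_2\sin(\omega_2 t_i) + \cdots \end{aligned} \tag{6.6}$$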

This formula is sometimes referred to as a Fourier series or an inverse discrete Fourier transform and the column vector containing the As and Bs is called the discrete Fourier transform (DFT) of d. The As and Bs are the model parameters and the corresponding frequencies are taken to be auxiliary variables. Note that sines and cosines of a given frequency, ωi, are paired, as was discussed in the previous section. Two key questions involve the number of frequencies that ought to be included in this representation and what their values should be. The answers to these questions involve a surprising fact about time series:

$$\text{frequencies higher than}\quad f_{ny} = \frac{1}{2\Delta t} \quad\text{cannot be detected} \tag{6.7}$$

This is called Nyquist's Sampling Theorem. It says that periods shorter than two sampling intervals cannot be detected in a time series with evenly spaced samples. Furthermore, as we will see below, any frequencies in the data that are higher than the Nyquist frequency, fny, are erroneously mapped into the (0, fny) frequency range. Choosing the frequency range (0, fny) in Equation (6.6) is, therefore, natural. For simplicity, we assume that the number N of data is an even integer. It can always be made so by shortening an odd-length time series by one point. Then the number of frequencies is N/2 + 1 and the number M of unknown coefficients of cosines and sines exactly equals the number N of data; that is, M = N. These choices imply

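$$f_{ny} = \frac{1}{2\Delta t} \quad\text{and}\quad \omega_{ny} = \frac{\pi}{\Delta t} \quad\text{and}\quad \Delta\omega = \frac{2\pi}{N\Delta t} \quad\text{and}\quad \Delta f = \frac{1}{N\Delta t} \tag{6.8}$$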

The reason that this relationship calls for ½N + 1 frequencies, as contrasted to ½N of them, is that the first and last sine terms are identically zero; that is,

$$B_1\sin(0\,t_i) = 0 \quad\text{and}\quad B_{(N/2)+1}\sin\left(\frac{2\pi}{2\Delta t}t_i\right) = B_{(N/2)+1}\sin(n\pi) = 0 \tag{6.9}$$

where n = ti/Δt is an integer. Hence, these two terms are omitted from the sum. Thus, the Fourier series contains the frequencies [0, Δf, 2Δf, …, ½NΔf]ᵀ. The lowest frequency is zero. It corresponds to a cosine of zero frequency, that is, to a constant. The next highest frequency is Δf. It corresponds to a sine and a cosine that have one complete oscillation over the length of the data. The next highest frequency is 2Δf. It corresponds to a sine and a cosine that have two complete oscillations over the length of the data. The highest frequency is fny = ½NΔf = 1/(2Δt). It corresponds to a highly oscillatory cosine that reverses sign at every sample; that is, [1, −1, 1, −1, …]ᵀ (Figure 6.2).

Figure 6.2 Plots of columns of the matrix, G, with rows indicated by solid circles. (A) First column, the constant 1. (B, C) Next two columns are cos(Δωt) and sin(Δωt), respectively. They have one period of oscillation over the time interval of the data. (D, E) Next two columns are cos(2Δωt) and sin(2Δωt), respectively. They have two periods of oscillation over the time interval of the data. (F) Last column switches between 1 and −1 every row. MatLab script eda06_02.

In MatLab, frequency-related quantities are defined as follows:

Nf=N/2+1;
fmax = 1/(2*Dt);
Df = fmax/(N/2);
f = Df*[0:Nf-1]';
Nw=Nf;
wmax = 2*pi*fmax;
Dw = wmax/(N/2);
w = Dw*[0:Nw-1]';

(MatLab script eda06_02)

Here, Dt is the sampling interval and N is the length of the data. N is assumed to be an even integer.

The problem that arises with frequencies higher than the Nyquist frequency can be seen by examining a pair of cosines and sines from Equation (6.6), written out for a particular time, tk, and frequency, ωn.

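$$\begin{aligned} \cos(\omega_n t_k) &= \cos((n-1)(k-1)\Delta\omega\Delta t) = \cos\left(\frac{2\pi (n-1)(k-1)}{N}\right) \\ \sin(\omega_n t_k) &= \sin((n-1)(k-1)\Delta\omega\Delta t) = \sin\left(\frac{2\pi (n-1)(k-1)}{N}\right) \end{aligned} \tag{6.10}$$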

Here, we have used the rule, ΔωΔt = 2π/N. Note that we index time and frequency so that t1 and ω1 correspond to zero time and frequency, respectively; that is, tk = (k−1)Δt and ωn = (n−1)Δω. Now let us examine a frequency, ωm, that is higher than the Nyquist frequency, say m = n + N, where N is the number of data:

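$$\begin{aligned} \cos(\omega_m t_k) &= \cos((n-1+N)(k-1)\Delta\omega\Delta t) = \cos\left(\frac{2\pi (n-1+N)(k-1)}{N}\right) \\ &= \cos\left(\frac{2\pi (n-1)(k-1)}{N}\right)\cos(2\pi (k-1)) - \sin\left(\frac{2\pi (n-1)(k-1)}{N}\right)\sin(2\pi (k-1)) \\ &= \cos\left(\frac{2\pi (n-1)(k-1)}{N}\right) + 0 \\ \sin(\omega_m t_k) &= \sin((n-1+N)(k-1)\Delta\omega\Delta t) = \sin\left(\frac{2\pi (n-1+N)(k-1)}{N}\right) \\ &= \sin\left(\frac{2\pi (n-1)(k-1)}{N}\right)\cos(2\pi (k-1)) + \cos\left(\frac{2\pi (n-1)(k-1)}{N}\right)\sin(2\pi (k-1)) \\ &= \sin\left(\frac{2\pi (n-1)(k-1)}{N}\right) + 0 \end{aligned} \tag{6.11}$$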

Here, we have used the rules, cos(a + b) = cos(a)cos(b) − sin(a)sin(b) and sin(a + b) = sin(a)cos(b) + cos(a)sin(b), together with the rule that cos(2π(k − 1)) = 1 and sin(2π(k − 1)) = 0, for any integer, k. Comparing Equations (6.10) and (6.11), we conclude

$$\cos(\omega_n t_k) = \cos(\omega_m t_k) \quad\text{and}\quad \sin(\omega_n t_k) = \sin(\omega_m t_k) \tag{6.12}$$

The frequencies, ωn+N and ωn, are equivalent, in the sense that they yield exactly the same sinusoids (Figure 6.3). A similar calculation, omitted here, shows that ωN−n and −ωn are equivalent in this same sense. But sines and cosines with negative frequencies have the same shape, up to a sign, as those with corresponding positive frequencies; that is, cos(−ωt) = cos(ωt) and sin(−ωt) = −sin(ωt). Thus, only sinusoids with frequencies in the range ω1 (zero frequency) to ωN/2+1 (the Nyquist frequency) have unique shapes (Figure 6.3C). In a digital world, no frequencies are higher than the Nyquist.
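The equivalence can be checked numerically. The sketch below uses the same two frequencies as Figure 6.3, 2Δω and (2 + N)Δω; the values of N and Dt and the variable names are arbitrary illustrative choices, not taken from script eda06_03.

```matlab
% N samples with sampling interval Dt
N = 32; Dt = 1;
t = Dt*[0:N-1]';
Dw = 2*pi/(N*Dt);
% a low frequency and its alias, N frequency steps higher
wlow = 2*Dw;
whigh = (2+N)*Dw;
dlow = cos(wlow*t);
dhigh = cos(whigh*t);
% the two cosines are indistinguishable at the sample times
maxdiff = max(abs(dlow-dhigh));
% the frequency (N-2)*Dw behaves like -2*Dw:
% same cosine, sign-flipped sine
wneg = (N-2)*Dw;
cosdiff = max(abs(cos(wneg*t)-cos(wlow*t)));
sindiff = max(abs(sin(wneg*t)+sin(wlow*t)));
```

All three differences are at the level of rounding error, which is why a sampled time series cannot distinguish these frequencies.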

Figure 6.3 Example of aliasing. (A) Low-frequency oscillation, d1(t) = cos(ω1t), with ω1 = 2Δω, evaluated every Δt (circles). (B) High-frequency oscillation, d2(t) = cos{ω2t}, with ω2 = (2 + N)Δω, evaluated every Δt (circles). Note that both the true curve (dashed) and low-frequency curve (solid) pass through the data points. (C) Schematic representation of aliasing, showing pairs of points on the frequency-axis that are equivalent. MatLab script eda06_03.

This limitation implies that special care must be taken when observing a time series to avoid recording any frequencies that are higher than the Nyquist frequency. This goal is usually achieved by creating a data recording system that attenuates—or eliminates—high frequencies before the data are converted into digital samples. If any high frequencies persist, then the dataset is said to be aliased, and spurious low frequencies will appear.

The linear equation, d = Gm, form of the Fourier series is as follows:

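$$\begin{aligned} &\text{the data} = \text{sum of sines and cosines} \\ &\begin{bmatrix} d_1 \\ d_2 \\ d_3 \\ \vdots \\ d_N \end{bmatrix} = \begin{bmatrix} 1 & \cos(\omega_2 t_1) & \sin(\omega_2 t_1) & \cdots & \cos(\omega_{N/2} t_1) & \sin(\omega_{N/2} t_1) & 1 \\ 1 & \cos(\omega_2 t_2) & \sin(\omega_2 t_2) & \cdots & \cos(\omega_{N/2} t_2) & \sin(\omega_{N/2} t_2) & -1 \\ 1 & \cos(\omega_2 t_3) & \sin(\omega_2 t_3) & \cdots & \cos(\omega_{N/2} t_3) & \sin(\omega_{N/2} t_3) & 1 \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ 1 & \cos(\omega_2 t_N) & \sin(\omega_2 t_N) & \cdots & \cos(\omega_{N/2} t_N) & \sin(\omega_{N/2} t_N) & -1 \end{bmatrix} \begin{bmatrix} A_1 \\ A_2 \\ B_2 \\ \vdots \\ A_{N/2} \\ B_{N/2} \\ A_{(N/2)+1} \end{bmatrix} \end{aligned} \tag{6.13}$$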

In MatLab, the data kernel, G, is created as follows:

% set up G
G=zeros(N,M);
% zero frequency column
G(:,1)=1;
% interior M/2-1 columns
for i = [1:M/2-1]
 j = 2*i;
 k = j+1;
 G(:,j)=cos(w(i+1).*t);
 G(:,k)=sin(w(i+1).*t);
end
% nyquist column
G(:,M)=cos(w(Nw).*t);

(MatLab eda06_02)

Here, the number of model parameters, M, equals the number of data, N, and the frequency values are in a column vector, w.

Remarkably, the matrices, [GᵀG] and [GᵀG]⁻¹, which play such an important role in the least-squares solution, can be shown to be diagonal:

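$$[G^T G] = N\,\mathrm{diag}(1, \tfrac{1}{2}, \tfrac{1}{2}, \ldots, \tfrac{1}{2}, \tfrac{1}{2}, 1) \quad\text{and}\quad [G^T G]^{-1} = \frac{1}{N}\,\mathrm{diag}(1, 2, 2, \ldots, 2, 2, 1) \tag{6.14}$$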

Note that we have used the rule that the inverse of a diagonal matrix, M, is the matrix of reciprocals of the elements of the main diagonal; that is, [M⁻¹]ii = 1/Mii. Equation (6.14) can be understood by noting that the elements of GᵀG are just the dot products of the columns, c, of the matrix, G. Many of these dot products are clearly zero (meaning that [GᵀG]ij = c(i)ᵀc(j) = 0 when i ≠ j). For instance, the second column, cos(Δωt), is symmetric about its midpoint, while the third column, sin(Δωt), is antisymmetric about it, so their dot product is necessarily zero (see Figure 6.2B and C). What is not so clear, but nevertheless true, is that every column is orthogonal to every other. We will not derive this result here, for it requires rather nitty-gritty trigonometric manipulations (but see Note 6.2). Its consequence is that the least-squares solution, m = [GᵀG]⁻¹Gᵀd, requires only matrix multiplication, and not matrix inversion. Furthermore, when the data are uncorrelated, then the model parameters, m, which represent the amplitudes of the sines and cosines, are also uncorrelated, as Cm = σd²[GᵀG]⁻¹ is diagonal. Moreover, all but the first and last model parameters have equal variance.
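The orthogonality of the columns is easy to confirm numerically for a small case. The sketch below builds G for N = 8 and compares GᵀG with the diagonal matrix of Equation (6.14). The size N = 8, the loop counter p and the names GTG, expected and maxerr are illustrative choices, not part of the book's scripts.

```matlab
% small example: N data, M=N model parameters
N = 8; M = N; Dt = 1;
t = Dt*[0:N-1]';
Dw = 2*pi/(N*Dt);
w = Dw*[0:N/2]';
% build G column by column, as in Equation (6.13)
G = zeros(N,M);
G(:,1) = 1;                       % zero frequency column
for p = [1:M/2-1]
    G(:,2*p) = cos(w(p+1)*t);     % cosine column
    G(:,2*p+1) = sin(w(p+1)*t);   % sine column
end
G(:,M) = cos(w(N/2+1)*t);         % Nyquist column
% compare G'*G with the diagonal matrix of Equation (6.14)
GTG = G'*G;
expected = N*diag([1, 0.5*ones(1,M-2), 1]);
maxerr = max(abs(GTG(:)-expected(:)));
```

The largest discrepancy, maxerr, is at the level of rounding error, so the off-diagonal elements of GᵀG are numerically zero.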

In MatLab, the least-squares solution is computed as follows:

gtgi = 2* ones(M,1)/N;
gtgi(1)=1/N;
gtgi(M)=1/N;
mest=gtgi .* (G'*d);

(MatLab eda06_04)

The column vectors,

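$$[A_1^2,\ A_2^2 + B_2^2,\ \ldots,\ A_{N/2}^2 + B_{N/2}^2,\ A_{N/2+1}^2]^T \quad\text{and}\quad \left[\sqrt{A_1^2},\ \sqrt{A_2^2 + B_2^2},\ \ldots,\ \sqrt{A_{N/2}^2 + B_{N/2}^2},\ \sqrt{A_{N/2+1}^2}\right]^T \tag{6.15}$$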

are called the power spectral density and amplitude spectral density of the time series, respectively. Either can be used to quantify the overall amount of oscillations at any given frequency, irrespective of the phase. In MatLab, the power spectral density, s, is computed from mest as follows:

% zero frequency
s=zeros(Nw,1);
s(1)=mest(1)^2;
% interior points
for i = [1:M/2-1]
 j = 2*i;
 k = j+1;
 s(i+1) = mest(j)^2 + mest(k)^2;
end
% Nyquist frequency
s(Nw) = mest(M)^2;

(MatLab eda06_04)

When applied to the Neuse River discharge data, this method produces an amplitude spectral density that is largest at low frequencies (Figure 6.4A). In such cases, a graph with a logarithmic frequency axis (Figure 6.4B) allows one to see details that are squeezed near the origin of a linear plot (Figure 6.4A). In some applications, plotting amplitude spectral density as a function of period, as contrasted to frequency, may be preferable (as in this case, where the annual period of 365.2 days is more recognizable than the equivalent frequency of 0.002738 cycles/day). The amplitude spectral density consists of several peaks, superimposed on a “noisy background” that gradually declines with increasing frequency. The three largest peaks have periods of 365.2, 182.6, and 60.2 days (one, one-half, and one-sixth years) and are associated with oscillations in stream flow caused by seasonal fluctuations in rainfall.

Figure 6.4 Amplitude spectral density of the Neuse River discharge dataset. (A) Plotted as a function of frequency (linear frequency axis). (B) Plotted as a function of frequency (logarithmic frequency axis). (C) Plotted as a function of period (linear horizontal axis). (D) Plotted as a function of period (logarithmic horizontal axis). MatLab script eda06_04.

A common practice is to omit the zero-frequency element of the spectral density from the plots (as was done here). It only reflects the mean value of the time series and is not really relevant to the issue of periodicities. Large values can require a vertical scaling that obscures the rest of the plot.

The MatLab script, eda06_04, worked fine calculating the spectral density of an N ≈ 4000 length time series. However, the N × N matrix, G, will become prohibitively large for larger datasets. Fortunately, a very efficient algorithm, called the fast Fourier transform (FFT), has been discovered for solving for mest that requires much less storage and much less computation than “brute force” multiplication by GT. We will return to this issue later in the chapter.

6.3 Going complex

Many of the formulas of the previous section can be substantially simplified by switching from sines and cosines to complex exponentials, utilizing Euler's formulas:

$$\begin{aligned} \exp(i\omega t) &= \cos(\omega t) + i\sin(\omega t) \\ \exp(-i\omega t) &= \cos(\omega t) - i\sin(\omega t) \end{aligned} \tag{6.16}$$

Here, i is an imaginary unit. These formulas imply

$$\cos(\omega t) = \frac{\exp(i\omega t) + \exp(-i\omega t)}{2} \quad\text{and}\quad \sin(\omega t) = \frac{\exp(i\omega t) - \exp(-i\omega t)}{2i} \tag{6.17}$$

The main complication (besides the need to use complex numbers) is that both positive and negative frequencies are needed in the Fourier series. Previously, we paired sines and cosines; now we will pair complex exponentials with positive and negative frequencies:

$$d(t) = A\cos(\omega t) + B\sin(\omega t) = C^{-}\exp(-i\omega t) + C^{+}\exp(i\omega t) \tag{6.18}$$

Here, $C^{-}$ and $C^{+}$ are the coefficients of the negative-frequency and positive-frequency terms, respectively. The requirement that these two different representations be equal implies a relationship between the As and Bs and the Cs. We write out the Cs in terms of their real and imaginary parts, $C^{-} = C_R^{-} + iC_I^{-}$ and $C^{+} = C_R^{+} + iC_I^{+}$, and perform the multiplication explicitly:

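$$\begin{aligned} C^{-}\exp(-i\omega t) + C^{+}\exp(i\omega t) &= (C_R^{-} + iC_I^{-})\{\cos(\omega t) - i\sin(\omega t)\} + (C_R^{+} + iC_I^{+})\{\cos(\omega t) + i\sin(\omega t)\} \\ &= (C_R^{-} + C_R^{+})\cos(\omega t) + (C_I^{-} - C_I^{+})\sin(\omega t) + i(C_I^{-} + C_I^{+})\cos(\omega t) + i(C_R^{+} - C_R^{-})\sin(\omega t) \end{aligned} \tag{6.19}$$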

As the time series, d(t), is real, the two imaginary terms must be zero. This happens when $C^{-}$ and $C^{+}$ are complex conjugates of each other: $C^{-} = C_R^{+} - iC_I^{+}$ and $C^{+} = C_R^{+} + iC_I^{+}$. Comparing Equations (6.18) and (6.19), we find

$$A = (C_R^{-} + C_R^{+}) = 2C_R^{+} \quad\text{and}\quad B = (C_I^{-} - C_I^{+}) = -2C_I^{+} \tag{6.20}$$

The power spectral density is now computed from $C_R^2 + C_I^2$, which is equivalent to C times its complex conjugate: $C_R^2 + C_I^2 = |C|^2 = C^{*}C$. Here, the asterisk means complex conjugation.

We can represent a real function as a Fourier series that involves cosines and sines of real amplitudes, A and B, respectively, or alternatively, as a Fourier series that involves complex exponentials with complex coefficients, C:

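$$d_k = \sum_{n=1}^{\frac{N}{2}+1}\{A_n\cos(\omega_n t_k) + B_n\sin(\omega_n t_k)\} \quad\text{with}\quad \omega_n = (0, \Delta\omega, 2\Delta\omega, \ldots, \tfrac{1}{2}N\Delta\omega) \tag{6.21a}$$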

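$$d_k = \frac{1}{N}\sum_{n=1}^{N} C_n\exp(i\omega_n t_k) \quad\text{with}\quad \omega_n = (0, \Delta\omega, 2\Delta\omega, \ldots, \tfrac{1}{2}N\Delta\omega, -(\tfrac{1}{2}N - 1)\Delta\omega, \ldots, -2\Delta\omega, -\Delta\omega) \tag{6.21b}$$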

Note the nonintuitive ordering of the frequencies in the summation of the complex exponentials, which we will discuss in detail below. The factor of N⁻¹ has been added to the complex summation in order to match MatLab's convention so that now (2/N)Ci = Ai − iBi.

The solution of Equation (6.21b) for the coefficients, Cn, requires the complex version of least squares (see Note 4.1). We will not discuss it in any detail here, except to note that the matrix, [GᴴG]⁻¹ = N⁻¹I, is diagonal (see Note 6.2) so that, like the sine and cosine version of the DFT, matrix inversion is not required. The complex coefficients, Cj, are calculated from the time series, d(tn), by

$$C_j = \sum_{n=1}^{N} d(t_n)\exp(-i\omega_j t_n) \tag{6.22}$$

In the sine and cosine case, the sum has N/2 + 1 pairs of sines and cosines, but the sine terms of the first and last pairs are identically zero, so the total number of unknowns is N. In the complex exponential case, the sum has N complex coefficients, but except for the first and last, they occur in complex conjugate pairs, so only the N/2 + 1 coefficients of the nonnegative frequencies count for the unknowns. Each of these has a real and imaginary part, except for the first and the last, which are purely real, so the number of unknowns is once again 2(N/2 + 1) − 2 = N. Note that if the data were complex, the coefficients would all be different; that is, N complex coefficients are needed to represent N complex data.

We return now to the issue of the ordering of the frequencies, which is to say, the order of the model parameters, m. The ordering presented in Equation (6.21b) has the nonnegative frequencies first and the negative frequencies last. This ordering, while nonintuitive, is actually more useful than a strictly ascending ordering, because the nonnegative frequencies are really the only ones needed when dealing with real time series. The negative frequencies, being complex conjugates of the positive ones, are redundant. However, the ordering has a more subtle rationale related to aliasing. The negative frequencies correspond exactly to the positive frequencies that one would get, if the positive ordering were extended past the Nyquist frequency. For example, the last frequency, −Δω, is exactly equivalent to +(N−1)Δω, in the sense that both correspond to the same complex exponential.

In MatLab, the arrays of frequencies are created as follows:

M=N;
tmax=Dt*(N-1);
t=Dt*[0:N-1]';
fmax=1/(2.0*Dt);
df=fmax/(N/2);
f=df*[0:N/2,-N/2+1:-1]';
Nf=N/2+1;
dw=2*pi*df;
w=dw*[0:N/2,-N/2+1:-1]';
Nw=Nf;

(MatLab eda06_05)

Here, N is the length of the data, again presumed to be an even integer. The quantities, Nf and Nw, are the numbers of nonnegative frequencies. MatLab's fft() (for fast Fourier transform) function solves for the complex Fourier coefficients very efficiently, and should always be used in preference to the least-squares procedure. The amplitude spectral density is computed as follows:

% compute Fourier coefficients
mest = fft(d);
% compute amplitude spectral density
s=abs(mest(1:Nw));

(MatLab eda06_05)

Note that the amplitude spectral density is computed from the complex absolute value of the coefficients, C. The results are, of course, identical to least-squares, but they are computed with orders of magnitude less time and storage requirements (as can be seen by comparing the run times of scripts eda06_04 and eda06_05). The time series can be rebuilt from its Fourier coefficients by using the ifft() function:

dnew = ifft(mest);
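The relationship (2/N)Ci = Ai − iBi between the two sets of coefficients can be illustrated with a synthetic time series whose amplitudes are known in advance. The following is a sketch only: the amplitudes, the choice of the frequency 5Δω and the variable names are hypothetical, and the factor of 2/N applies to frequencies other than zero and the Nyquist frequency.

```matlab
% synthetic time series with known cosine and sine amplitudes
N = 64; Dt = 1;
t = Dt*[0:N-1]';
Dw = 2*pi/(N*Dt);
A = 3; B = -1.5;            % amplitudes at frequency 5*Dw
d = A*cos(5*Dw*t) + B*sin(5*Dw*t);
% complex Fourier coefficients
C = fft(d);
% frequency 5*Dw is the sixth element of C (element 1 is zero frequency)
Aest = (2/N)*real(C(6));
Best = -(2/N)*imag(C(6));
% rebuild the time series from its coefficients
dnew = real(ifft(C));
maxdiff = max(abs(d-dnew));
```

The estimates Aest and Best reproduce A and B, and dnew matches d to within rounding error. The call to real() only discards the negligible imaginary part that rounding can leave behind.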

Crib Sheet 6.1

Experimental design for spectra

Questions to ask yourself

What is the highest frequency fhigh that you need to detect?

What is the minimum frequency separation δf of spectral peaks that you need to resolve?

How much data N can you record and analyze?

Important quantities

duration of recording, T

sampling interval, Δt

number of samples (amount of data), N = T/Δt

Nyquist (maximum) frequency, fmax = 1/(2Δt)

frequency spacing, Δf = fmax/(N/2)

Design principles

The Nyquist frequency must be greater than the highest frequency you need to detect, and preferably at least twice as high

fmax ≥ 2fhigh

The frequency sampling must be smaller than the minimum frequency separation you need to resolve, and preferably ten times smaller

Δf ≤ δf/10

However, you need to verify that you have the ability to record and analyze the corresponding amount of data

Design formulas

Δt = 1/(2fmax) ≤ 1/(4fhigh)

N = 2fmax/Δf ≥ 40fhigh/δf

T = NΔt = 1/Δf ≥ 10/δf

(MatLab eda06_05)
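As a worked illustration of the design principles in Crib Sheet 6.1, the sketch below starts from two requirements and applies the preferred safety factors (a Nyquist frequency of twice fhigh and a frequency spacing of one-tenth δf). The numerical values, the units of Hz and seconds, and the variable names are hypothetical.

```matlab
% requirements (hypothetical values)
fhigh = 2.0;      % highest frequency of interest, in Hz
deltaf = 0.625;   % minimum peak separation to resolve, in Hz
% design principles, using the preferred safety factors
fmax = 2*fhigh;           % Nyquist frequency
Df = deltaf/10;           % frequency spacing
% resulting sampling parameters
Dt = 1/(2*fmax);          % sampling interval, in s
N = 2*ceil(fmax/Df);      % number of samples, rounded up to an even integer
T = N*Dt;                 % duration of recording, in s
```

With these values the experiment calls for 128 samples taken every 0.125 s, that is, 16 s of recording. Whether that amount of data can actually be recorded and analyzed still has to be checked separately.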

6.4 Lessons learned from the integral transform

The Fourier series is very closely related to the Fourier transform, as can be seen by a side-by-side comparison of the two:

$$\tilde{d}(\omega) = \int_{-\infty}^{+\infty} d(t)\exp(-i\omega t)\,dt \quad\text{and}\quad C(\omega_j) = \sum_{k=1}^{N} d(t_k)\exp(-i\omega_j t_k) \tag{6.23}$$

The integral converts the function, d(t), into its Fourier transform, the function, d̃(ω). Similarly, the summation converts a time series, d(tk), into its DFT, the column vector, C(ωj). If we assume that the data are nonzero only between 0 and tmax, then the integral can be approximated by its Riemann sum:

$$\int_{0}^{t_{max}} d(t)\exp(-i\omega_j t)\,dt \approx \Delta t\sum_{k=1}^{N} d(t_k)\exp(-i\omega_j t_k) \tag{6.24}$$

Comparing Equations (6.23) and (6.24), we deduce that d̃(ωj) ≈ Δt C(ωj); that is, the Fourier transform and DFT coefficients differ only by the constant, Δt. A similar relationship holds between the inverse transform and the Fourier summation as well:

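$$d(t_j) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\tilde{d}(\omega)\exp(i\omega t_j)\,d\omega \approx \frac{\Delta\omega}{2\pi}\sum_{k=1}^{N}\tilde{d}(\omega_k)\exp(i\omega_k t_j) \tag{6.25}$$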

This again gives d̃(ωj) ≈ Δt C(ωj) (as ΔωΔt = 2π/N).

In the context of this book, Fourier transforms are of interest because they can be more readily manipulated than Fourier series. Many of the relationships that are true for Fourier transforms will also be true—or approximately true—for Fourier series. We summarize a few of the most useful relationships below.

6.5 Normal curve

The transform of the Normal function, d(t) = exp(−a²t²), is

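$$\tilde{d}(\omega) = \int_{-\infty}^{+\infty}\exp(-a^2 t^2)\exp(-i\omega t)\,dt = 2\int_{0}^{+\infty}\exp(-a^2 t^2)\cos(\omega t)\,dt = \frac{\sqrt{\pi}}{a}\exp\left(-\frac{\omega^2}{4a^2}\right) \tag{6.26}$$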

Here, we have expanded exp(−iωt) into cos(ωt) − i sin(ωt). The integrand, sin(ωt)exp(−a²t²), consists of an odd function (the sine) of time multiplied by an even function (the exponential) of time. It is, therefore, odd and so its integral over (−∞, +∞) is zero. The product, cos(ωt)exp(−a²t²), is an even function, so its integral on (−∞, +∞) is twice its integral on (0, +∞). A standard table of integrals (e.g., integral 679 of the CRC Standard Mathematical Tables) is used to evaluate the final integral.

If we write a² = 1/(2σ²), then the exponential has the form of a Normal curve centered at time zero:

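$$d(t) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{t^2}{2\sigma^2}\right) \quad\text{and}\quad \tilde{d}(\omega) = \exp\left(-\frac{\omega^2}{2\sigma^{-2}}\right) \tag{6.27}$$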

Thus, up to an overall normalization, the transform of a Normal curve with variance, σ², is a Normal curve with variance, σ⁻².

This result is extremely important because it quantifies how the widths of functions and the widths of their Fourier transforms are related. When d(t) is a wide function, d̃(ω) is narrow, and vice-versa. A spiky function, such as a narrow Normal curve, has a transform that is rich in high frequencies. A very smooth function, such as a wide Normal curve, has a transform that lacks high frequencies (Figure 6.5).
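Equation (6.26) can be checked against fft(), using the approximation that the Fourier transform is Δt times the DFT coefficients. The sketch below is illustrative only; the values of N, Dt and a and the variable names are arbitrary choices. The absolute value is taken because the time axis of the sampled function starts at a negative time rather than at zero, which multiplies the coefficients by a phase factor of unit magnitude.

```matlab
% time axis centered on zero
N = 256; Dt = 0.1;
t = Dt*[-N/2:N/2-1]';
% Normal function exp(-a^2 t^2)
a = 1.0;
d = exp(-(a^2)*(t.^2));
% frequencies in fft() order: nonnegative first, then negative
Dw = 2*pi/(N*Dt);
w = Dw*[0:N/2,-N/2+1:-1]';
% approximate transform; abs() discards the phase factor that
% arises because the time axis starts at t(1), not at zero
dtilde = Dt*abs(fft(d));
% analytic result, Equation (6.26)
dtrue = (sqrt(pi)/a)*exp(-(w.^2)/(4*a^2));
maxerr = max(abs(dtilde-dtrue));
```

Increasing a narrows the function in time and widens dtrue in frequency, which is the inverse relationship between widths described above.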

Figure 6.5 (A) Shaded column-vectors of a series of Normal functions with increasing variance. (B) Corresponding amplitude spectral density. MatLab script eda06_06.

6.6 Spikes

The relationship between the width of a function and its Fourier transform can be pursued further by defining the Dirac delta function, a Normal curve in the limit of vanishingly small variance:

$$\delta(t - t_0) = \lim_{\sigma \to 0}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(t - t_0)^2}{2\sigma^2}\right) \tag{6.28}$$

This generalized function is zero everywhere, except at the point, t0, where it is singular. Nevertheless, it has unit area. When the product, δ(t − t0)f(t), is integrated, the result is just f(t) evaluated at the point, t = t0:

$$\int_{-\infty}^{+\infty}\delta(t - t_0)f(t)\,dt = f(t_0) \tag{6.29}$$

This result can be understood by noting that the Dirac function is nonzero only in a vanishingly small interval of time around t0. Within this interval, the function, f(t), is just the constant, f(t0). No error is introduced by replacing f(t) with f(t0) everywhere and taking it outside the integral, which then integrates to unity.

The Fourier transform of a spike at t0 = 0 is unity (Figure 6.6):

$$\tilde{d}(\omega) = \int_{-\infty}^{+\infty}\delta(t)\exp(-i\omega t)\,dt = \exp(-i\omega t)\big|_{t=0} = 1 \tag{6.30}$$

Figure 6.6 (A) Spike function is zero except at time, t = 0. (B) Corresponding transform is unity. MatLab script eda06_07.

This is the limiting case of an indefinitely narrow Normal function (see Equation 6.27).

The Dirac function, being “infinitely spiky,” has a transform that contains all frequencies in equal proportions. The transform of a spike at t = t0 is

$$\tilde{d}(\omega) = \int_{-\infty}^{+\infty}\delta(t - t_0)\exp(-i\omega t)\,dt = \exp(-i\omega t)\big|_{t=t_0} = \exp(-i\omega t_0) \tag{6.31}$$

Although it is an oscillatory function of frequency, its power spectral density is constant: s(ω) = d̃*(ω)d̃(ω) = exp(+iωt0)exp(−iωt0) = 1.

The Dirac function can appear in a function's Fourier transform as well. The transform of cos(ω0t) is

$$\tilde{d}(\omega) = \pi\left(\delta(\omega - \omega_0) + \delta(\omega + \omega_0)\right) \tag{6.32}$$

This formula can be verified by inserting it into the inverse transform:

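$$d(t) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\pi\left(\delta(\omega - \omega_0) + \delta(\omega + \omega_0)\right)\exp(i\omega t)\,d\omega = \frac{\exp(i\omega_0 t) + \exp(-i\omega_0 t)}{2} = \cos(\omega_0 t) \tag{6.33}$$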

As one might expect, the transform of the pure sinusoid, cos(ω0t), contains only two frequencies, ±ω0.

6.7 Area under a function

The area, A, under a function, d(t), is its Fourier transform evaluated at zero frequency; that is, A = d̃(ω = 0):

$$\tilde{d}(\omega = 0) = \int_{-\infty}^{+\infty} d(t)\exp(0)\,dt = \int_{-\infty}^{+\infty} d(t)\,dt = A \tag{6.34}$$

as exp(0) = 1. In MatLab, the area is computed as (Figure 6.7)

Figure 6.7 The area under the exemplary function, d(t), is computed by the sum() and fft() functions. Both give the same value, 2.0014. MatLab script eda06_08.
dt=fft(d);
area = real(dt(1));

(MatLab eda06_08)
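When the sampling interval is not unity, the Riemann-sum approximation of Equation (6.24) suggests that the zero-frequency coefficient should be multiplied by Δt to obtain the area. The following sketch assumes a Normal curve of unit area as the test function; the function, its parameters and the variable names are illustrative and are not those of script eda06_08.

```matlab
% exemplary function: a Normal curve with unit area
N = 256; Dt = 0.1;
t = Dt*[0:N-1]';
sigma = 1.0; tc = 12.8;
d = exp(-((t-tc).^2)/(2*sigma^2))/(sqrt(2*pi)*sigma);
% area by Riemann sum
area1 = Dt*sum(d);
% area from the zero-frequency Fourier coefficient
C = fft(d);
area2 = Dt*real(C(1));
```

Both estimates are very close to one and agree with each other to within rounding error, since the first element of fft(d) is simply the sum of the data.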

6.8 Time-delayed function

Multiplying the transform, d̃(ω), by the factor, exp(−iωt0), time-delays the function by a time interval, t0:

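$$\tilde{d}_{delayed}(\omega) = \int_{-\infty}^{+\infty} d(t - t_0)\exp(-i\omega t)\,dt = \int_{-\infty}^{+\infty} d(t')\exp(-i\omega (t' + t_0))\,dt' = \exp(-i\omega t_0)\int_{-\infty}^{+\infty} d(t')\exp(-i\omega t')\,dt' = \exp(-i\omega t_0)\,\tilde{d}(\omega) \tag{6.35}$$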

Here, we use the transformation of variables, t′ = t − t0, noting that dt′ = dt and that t′ → ±∞ as t → ±∞. In the literature, the process of modifying a Fourier transform by multiplication with a factor, exp(−iωt0), is sometimes referred to as introducing a phase ramp, as it changes the phase by an amount proportional to frequency (i.e., by a ramp-shaped function):

$$\varphi(\omega) = -\omega t_0 \tag{6.36}$$

The time-shift result appeared previously, when we were calculating the transform of a time-delayed spike (Equation 6.31). The transform of the time-shifted spike differed from the transform of a spike at time zero by a factor of exp(−iωt0). In MatLab, the time delay is accomplished as follows (Figure 6.8):

Figure 6.8 The exemplary function, d(t), is time shifted by an interval, t0. MatLab script eda06_09.
t0 = t(16);
ds=ifft(exp(-i*w*t0).*fft(d));

(MatLab eda06_09)

where t0 is the delay. Note that the symbol i is being used as the imaginary unit. This is the MatLab default, but one must be careful not to reset its value to something else, for example, by using it as a counter in a for loop (see Note 1.1).
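A self-contained sketch of the phase-ramp delay follows. The pulse shape, the value of N and the variable names are hypothetical, and the literal 1i is used for the imaginary unit so that the result does not depend on whether i has been reassigned. When the delay is a whole number of samples, the result can be compared against a circular shift of the data.

```matlab
% time and frequency axes
N = 64; Dt = 1;
t = Dt*[0:N-1]';
Dw = 2*pi/(N*Dt);
w = Dw*[0:N/2,-N/2+1:-1]';
% exemplary function: a narrow pulse near the start of the window
d = exp(-((t-10).^2)/8);
% delay by 15 samples using a phase ramp; 1i is the imaginary unit
t0 = t(16);
ds = real(ifft(exp(-1i*w*t0).*fft(d)));
% for a delay of a whole number of samples,
% the result is a circular shift of the data
dshift = circshift(d,15);
maxdiff = max(abs(ds-dshift));
```

The comparison with circshift() also shows that the delay wraps around: any part of the signal pushed past the end of the window reappears at its beginning.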

6.9 Derivative of a function

Multiplying the transform, d̃(ω), by the factor, iω, computes the transform of the derivative, dd/dt:

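$$\int_{-\infty}^{+\infty}\frac{dd}{dt}\exp(-i\omega t)\,dt = d(t)\exp(-i\omega t)\Big|_{-\infty}^{+\infty} - (-i\omega)\int_{-\infty}^{+\infty} d(t)\exp(-i\omega t)\,dt = 0 + (i\omega)\,\tilde{d}(\omega) = i\omega\,\tilde{d}(\omega) \tag{6.37}$$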

Here, we have used integration by parts, ∫u dv = uv − ∫v du, together with the assumption that d(t) → 0 as t → ±∞, so that the boundary term vanishes. In MatLab, the derivative can be computed as follows (Figure 6.9):

Figure 6.9 (A) Exemplary function, d(t). (B) Derivative of d(t) as computed by finite differences. (C) Derivative as computed by fft(). MatLab script eda06_10.

dddt = ifft(i*w.*fft(d));

(MatLab eda06_10)
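The accuracy of the Fourier derivative can be gauged by applying it to a function whose derivative is known analytically. The sketch below uses a Normal curve that decays to zero well inside the time window; the parameter values and variable names are illustrative, and 1i is used in place of i for the imaginary unit.

```matlab
% time and frequency axes
N = 256; Dt = 0.1;
t = Dt*[0:N-1]';
Dw = 2*pi/(N*Dt);
w = Dw*[0:N/2,-N/2+1:-1]';
% exemplary function: Normal curve centered in the window
tc = 12.8; sigma = 1.0;
d = exp(-((t-tc).^2)/(2*sigma^2));
% derivative via the Fourier transform
dddt = real(ifft(1i*w.*fft(d)));
% analytic derivative for comparison
dddttrue = -((t-tc)/sigma^2).*d;
maxerr = max(abs(dddt-dddttrue));
```

The agreement is very close here because the function is smooth and effectively zero at both ends of the window. A function that does not return to the same value at the two ends would be expected to show much larger errors near the edges.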

6.10 Integral of a function

Dividing the transform, d̃(ω), by the factor, iω, computes the transform of the integral, ∫d(t) dt:

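$$\int_{-\infty}^{+\infty}\left[\int_{0}^{t} d(t')\,dt'\right]\exp(-i\omega t)\,dt = \left.\left[\int_{0}^{t} d(t')\,dt'\right]\frac{\exp(-i\omega t)}{-i\omega}\right|_{-\infty}^{+\infty} - \frac{1}{-i\omega}\int_{-\infty}^{+\infty} d(t)\exp(-i\omega t)\,dt = 0 + \frac{1}{i\omega}\tilde{d}(\omega) = \frac{1}{i\omega}\tilde{d}(\omega) \tag{6.38}$$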

Here, we have used integration by parts, ∫u dv = uv − ∫v du, together with the assumption that the boundary term vanishes as t → ±∞. Note, however, that the zero-frequency value is undefined (as 1/0 is undefined). As shown in Equation (6.34), this value is the total area under the curve, so it functions as an integration constant and must be set to an appropriate value. In MatLab, with the zero-frequency value set to zero, the integral is calculated as follows (Figure 6.10):

Figure 6.10 (A) Exemplary function, d(t). (B) Integral of d(t) as computed by Riemann sums. (C) Integral as computed by fft(). Notice that the two integrals differ by a constant offset. MatLab script eda06_11.

integral=ifft(-i*fft(d).*[0,1./w(2:N)']');

(MatLab eda06_11)

6.11 Convolution

Finally, we derive a result that pertains to the convolution operation, which will be developed further and utilized heavily in subsequent chapters. Given two functions, f(t) and g(t), their convolution (written f*g) is defined as

$$f(t) * g(t) = \int_{-\infty}^{+\infty} f(\tau)\,g(t - \tau)\,d\tau \tag{6.39}$$

The transform of the convolution is

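$$\begin{aligned} \int_{-\infty}^{+\infty}\left[\int_{-\infty}^{+\infty} f(\tau)\,g(t - \tau)\,d\tau\right]\exp(-i\omega t)\,dt &= \int_{-\infty}^{+\infty} f(\tau)\left[\int_{-\infty}^{+\infty} g(t - \tau)\exp(-i\omega t)\,dt\right]d\tau \\ &= \int_{-\infty}^{+\infty} f(\tau)\int_{-\infty}^{+\infty} g(t')\exp(-i\omega (t' + \tau))\,dt'\,d\tau \\ &= \int_{-\infty}^{+\infty} f(\tau)\exp(-i\omega\tau)\,d\tau\int_{-\infty}^{+\infty} g(t')\exp(-i\omega t')\,dt' = \tilde{f}(\omega)\,\tilde{g}(\omega) \end{aligned} \tag{6.40}$$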

Here, we have used the transformation of variables, t′ = t − τ. Thus, the transform of the convolution of two functions is the product of their transforms.
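The discrete analog of this result can be sketched with two short time series. The values and names below are hypothetical. The discrete convolution computed by conv() corresponds to the integral in Equation (6.39) without the factor of Δt, and the transforms are zero-padded to the full output length so that the wrap-around inherent in the discrete transform does not contaminate the result.

```matlab
% two short time series
f = [1; 2; 3; 0; -1];
g = [0.5; 0.25; 0.125];
% direct convolution
c1 = conv(f,g);
% convolution by multiplying transforms, zero-padded to the
% full output length to avoid wrap-around
L = length(f)+length(g)-1;
c2 = real(ifft(fft(f,L).*fft(g,L)));
maxdiff = max(abs(c1(:)-c2(:)));
```

The two results agree to within rounding error.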

Crib Sheet 6.2

Exemplary pairs of time series d(t) and its amplitude spectral density s(f) (MatLab script eda06_12)


6.12 Nontransient signals

Previously, in developing the relationship between the Fourier integral and its discrete analog, we assumed that the function of interest, d(t), was zero outside of the time window of observation. Only transient signals have this property; we can theoretically record the whole phenomenon, as it lasts only for a finite time. An equally common scenario is one in which the data represent just a portion of an indefinitely long phenomenon that has no well-defined beginning or end. Both the Neuse River Hydrograph and Black Rock Forest temperature datasets are of this type.

Many nontransient signals do not vary dramatically in overall pattern from one time window of observation to another (meaning that their statistical properties are stationary; that is, constant with time). One parameter that is independent of the window length is the power, P:

$$P = \frac{1}{T}\int [d(t)]^2\,dt \approx \frac{\Delta t}{N\Delta t}\sum_{i=1}^{N}[d(t_i)]^2 = \frac{1}{N}d^T d \tag{6.41}$$

In some cases, such as when d(t) represents velocity, P literally is power, that is, energy per unit time. Usually, however, the word is understood in the more abstract sense to mean the overall size of a signal that is fluctuating about zero. Note that when the data have zero mean, P = N⁻¹dᵀd is the formula for the variance of d. In this special case, the power is equivalent to the variance of the signal, d(t).

The power in a time series has a close relationship to its power spectral density. A time series is related to its Fourier transform by the linear rule, d = Gm, where m is a column vector of Fourier amplitudes. If we were to use the sines and cosines representation of the Fourier transform, then the matrix, G, is given in Equation (6.13). Substituting d = Gm into Equation (6.41) yields

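$$P = \frac{1}{N}\mathbf{d}^T\mathbf{d} = \frac{1}{N}(\mathbf{G}\mathbf{m})^T(\mathbf{G}\mathbf{m}) = \frac{1}{N}\mathbf{m}^T[\mathbf{G}^T\mathbf{G}]\mathbf{m} \qquad (6.42)$$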

However, according to Equation (6.14), $\mathbf{G}^T\mathbf{G} = (N/2)\mathbf{I}$ (except for the first and last coefficients, which we shall ignore). Hence,

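$$P = \frac{1}{N}\frac{N}{2}\mathbf{m}^T\mathbf{m} = \frac{1}{2}\sum_{i=1}^{N/2+1}\left(A_i^2 + B_i^2\right) = \frac{1}{2}\sum_{i=1}^{N/2+1}\frac{4}{N^2}|C_i|^2 = \frac{2}{(\Delta t)^2 N^2 \Delta f}\int_0^{f_{ny}} |\tilde{d}(f)|^2\,df \qquad (6.43)$$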

This result is called Parseval's Theorem. Here, the As and Bs are the coefficients in the cosines and sines representation of the Fourier series (Equation 6.13) and the Cs are the coefficients of MatLab's version of the DFT (Equation 6.21b). The two representations are related by $(2/N)C_i = A_i - iB_i$, where the i multiplying $B_i$ is the imaginary unit. The Fourier transform is approximately $\tilde{d}(f_i) = \Delta t\,C_i$. If we define the power spectral density, $s^2(f)$, of a nontransient signal as

$$s^2(f) = \frac{2}{T}|\tilde{d}(f)|^2 \quad\text{then}\quad P = \int_0^{f_{ny}} s^2(f)\,df \qquad (6.44)$$

Here, we have used the relations $T = N\Delta t$ and $\Delta t\,\Delta f = N^{-1}$ to reduce $2/[(\Delta t)^2 N^2 \Delta f]$ to $2/T$. The power is the integral of the power spectral density from zero frequency to the Nyquist frequency. In MatLab, the power spectral density is computed as follows:

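tmax=N*Dt;      % length of the time window, T
C=fft(d);       % discrete Fourier transform of the data
dbar=Dt*C;      % approximate Fourier transform
thepsd = (2/tmax) * abs(dbar(1:Nf)).^2;   % Equation (6.44), for the Nf=N/2+1 non-negative frequencies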

(MatLab eda06_13)

and the total power (variance) is:

P = Df * sum(thepsd);

(MatLab eda06_13)

Here, the data, d, are presumed to have sampling interval, Dt, and length, N.
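As a check on Equations (6.41) and (6.44), the power computed from the spectrum can be compared with the power computed directly from the time series. The sketch below is not one of the book's scripts; it uses a synthetic signal, and the values of N and Dt, the two sinusoids, and the names P_freq and P_time are illustrative assumptions.

```matlab
% Sketch: compare the power from the psd with the power from the time series.
% All numerical values below are assumptions chosen for illustration.
N = 1024;                    % number of samples (assumed, even)
Dt = 0.5;                    % sampling interval (assumed)
tmax = N*Dt;                 % length of the time window, T
Nf = N/2+1;                  % number of non-negative frequencies
Df = 1/(N*Dt);               % frequency sampling
t = Dt*[0:N-1]';             % time column-vector
% two sinusoids, each with a whole number of cycles, so d has zero mean
d = 3*sin(2*pi*20*Df*t) + cos(2*pi*50*Df*t);
C = fft(d);
dbar = Dt*C;
thepsd = (2/tmax) * abs(dbar(1:Nf)).^2;
P_freq = Df * sum(thepsd);   % power from the spectrum, Equation (6.44)
P_time = (d'*d)/N;           % power from the time series, Equation (6.41)
```

Both estimates should equal the sum of the half squared amplitudes, 9/2 + 1/2 = 5. The agreement is essentially exact here because the signal has no energy at zero frequency or at the Nyquist frequency, the two coefficients that the derivation ignores; for general data a small discrepancy from those two terms is to be expected.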

This simple generalization of the idea of spectral density is closely connected to the limitation that we cannot take the Fourier transform of the whole phenomenon, for it is indefinitely long, but only a portion of it. Nevertheless, we would like our results to be relatively insensitive to the length of the time window. The factor of 1/T in Equation (6.44) normalizes for window length.
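The effect of the 1/T normalization can be illustrated with stationary random noise observed through windows of two different lengths. This is a minimal sketch under stated assumptions, not one of the book's scripts: the noise standard deviation, sigma, the window lengths, Ns, and the variable names level and expected are all hypothetical choices.

```matlab
% Sketch: average psd level of stationary noise for two window lengths.
% sigma, Dt, and Ns are assumptions; results vary slightly from run to run.
Dt = 1.0;                      % sampling interval (assumed)
sigma = 2.0;                   % standard deviation of the noise (assumed)
Ns = [4096, 16384];            % two window lengths, in samples (assumed)
level = zeros(2,1);
for j = [1:2]
    N = Ns(j);
    Nf = N/2+1;
    tmax = N*Dt;
    d = sigma*randn(N,1);      % stationary, zero-mean random noise
    dbar = Dt*fft(d);
    thepsd = (2/tmax) * abs(dbar(1:Nf)).^2;
    level(j) = mean(thepsd);   % average level of the psd
end
% a power of sigma^2 spread evenly from zero to the Nyquist frequency, 1/(2*Dt)
expected = 2*sigma^2*Dt;
```

Although the second window is four times longer than the first, both entries of level should be close to expected. Without the factor of 1/T, the level would instead grow in proportion to the window length.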

Suppose the units of d(t) are u. Then, the Fourier transform, $\tilde{d}(\omega)$, has units of u·s and the power spectral density has units of u²s²/s = u²s = u²/Hz. For example, for discharge data with units of cubic feet per second, u = ft³/s, and power spectral density has units of ft⁶/s, or (ft³/s)² Hz⁻¹ (in the Neuse River hydrograph shown in Figure 6.11, we use the mixed units of (ft³/s)² per cycle/day).

Figure 6.11 (A) Neuse River hydrograph, d(t). (B) Its power spectral density, s2(f). MatLab script eda06_14.

The amplitude spectral density has units of the square root of the power spectral density; that is, u Hz$^{-1/2}$.

We have yet to discuss two important elements of working with spectra. First, we have made no mention of confidence limits, yet these are important in determining whether an observed periodicity (i.e., an observed spectral peak) is statistically significant. The reason we have omitted them so far is that the power spectral density is not a linear function of the model parameters, but instead depends on the sum of squares of the model parameters. We lack the tools to propagate error between the data and results in this case. Second, we have made no mention of the tapering process that is often used to prepare data before computing its Fourier transform. We will address these important issues in Chapters 12 and 9, respectively.

Problems

6.1. Write a MatLab script that uses the fft() function to differentiate the Neuse River Hydrograph dataset. Plot the results.

6.2. What is the Fourier transform of $\sin(\omega_0 t)$? Compare it to the transform of $\cos(\omega_0 t)$.

6.3. The MatLab script, eda06_15, creates a file, noise.txt, containing a normally distributed random time series, d(t), with zero mean and unit variance. (A) Compute and plot the power spectral density of this time series. (B) Create a second time series, a(t), that is a moving window average of d(t); that is, each point in a(t) is the average of, say, L neighboring points in d(t). (C) Compute and plot the power spectral density of a(t) for a suite of values of L. Comment on the results.

6.4. Suppose that you needed to compute the DFT of the function

$$d(t) = \exp\left(-\frac{t^2}{2\sigma^2}\right)$$

using MatLab's fft() function. This function is centered on t = 0, and therefore has nonnegligible values for points to the left of the origin. Unfortunately, we have defined the time and data column-vectors, t and d, to start at time zero, so there seems to be no place to put these data values. One solution to this problem is to shift the function to the center of the time window, say by an amount, t0, compute its Fourier transform, and then multiply the transform by a phase factor, exp(iωt0), that shifts it back. Another solution relies on the fact that, in discrete transforms, both time and frequency suffer from aliasing. Just as the last frequencies in the transform were large positive frequencies and small negative frequencies, the last points in the time series are simultaneously

$$t = [(N-3)\Delta t \quad (N-2)\Delta t \quad (N-1)\Delta t]^T \quad\text{and}\quad t = [-3\Delta t \quad -2\Delta t \quad -\Delta t]^T$$

Therefore, one simply puts the negative part of d(t) at the right-hand end of d. Write a MatLab script to try both methods and check that they agree.

6.5. Compute and plot the amplitude spectral density of a cleaned version of the Black Rock Forest temperature dataset. (A) What are its units? (B) Interpret the periods of any spectral peaks that you find.

References

Zwillinger D. CRC Standard Mathematical Tables and Formulae. 30th ed. CRC Press; 1996.
