Chapter 2
Signal and System as Vectors
To solve forward and inverse problems, we need mathematical tools to formulate them. Since it is convenient to use vectors to represent multiple system parameters, variables, inputs (or excitations) and outputs (or measurements), we first study vector spaces. Then, we introduce vector calculus to express interrelations among them based on the underlying physical principles. To solve a nonlinear inverse problem, we often linearize an associated forward problem so as to utilize the numerous mathematical tools available for linear systems. After introducing such an approximation method, we review mathematical techniques to deal with linear systems of equations and linear transformations.
We denote a signal, variable or system parameter as f(r, t), which is a function of position r = (x, y, z) and time t. To deal with a number n of signals, we adopt the vector notation (f1(r, t), …, fn(r, t)). We may also set vectors as (f(r1, t), …, f(rn, t)), (f(r, t1), …, f(r, tn)) and so on. We consider a set of all possible such vectors as a subset of a vector space. In the vector space framework, we can add and subtract vectors and multiply vectors by numbers. Establishing the concept of a subspace, we can project a vector into a subspace to extract core information or to eliminate unnecessary information. To analyze a vector, we may decompose it as a linear combination of basic elements, which we handle as a basis or coordinate of a subspace.
A vector space V over a field F (ℝ or ℂ) is a set of elements, called vectors, that is closed under two operations: vector addition and scalar multiplication.
A subset W of a vector space V over F is a subspace of V if and only if au + v ∈ W for all a ∈ F and for all u, v ∈ W. The subspace W itself is a vector space.
The span of a subset G ⊂ V, denoted by span G, is the smallest subspace of V containing G. For example, if V = ℝ³, then span{(1, 2, 3), (1, −2, 0)} is the plane {(x, y, z) : 6x + 3y − 4z = 0}.
The vectors u1, …, un are linearly independent if a1u1 + ··· + anun = 0 implies a1 = ··· = an = 0. Otherwise, u1, …, un are linearly dependent.
If {u1, …, un} is linearly independent, no vector uj can be expressed as a linear combination of the other vectors in the set. If u1 can be expressed as u1 = a2u2 + ··· + anun, then a1u1 + a2u2 + ··· + anun = 0 with a1 = −1, so they are not linearly independent. For example, {(4, 1, 5), (2, 1, 3), (1, 0, 1)} is linearly dependent since −(4, 1, 5) + (2, 1, 3) + 2(1, 0, 1) = (0, 0, 0). The elements (2, 1, 3) and (1, 0, 1) are linearly independent because a1(2, 1, 3) + a2(1, 0, 1) = (0, 0, 0) implies a1 = a2 = 0.
If G = {u1, …, un} is a basis for W, then any vector v ∈ W can be expressed uniquely as v = a1u1 + ··· + anun. If G′ is another basis of W, then G′ contains exactly the same number n of elements.
To quantify a measure of similarity or dissimilarity among vectors, we need to define the magnitude of a vector and the distance between vectors. We use the norm ||u|| of a vector u to define such a magnitude. In topology, a metric is used to define a distance. To distinguish different vectors in a vector space, we define a measure of distance, or metric, between two vectors u and v as the norm ||u − v||. The norm must satisfy the following three rules:
1. ||u|| ≥ 0 ∀ u ∈ V, and ||u|| = 0 iff u = 0;
2. ||au|| = |a| ||u|| ∀ a ∈ F and ∀ u ∈ V;
3. ||u + v|| ≤ ||u|| + ||v|| ∀ u, v ∈ V (triangle inequality). (2.1)
Here, the notation ∀ stands for "for all" and iff stands for "if and only if".
In addition to the distance between vectors, it is desirable to establish the concept of an angle between them. This requires the definition of an inner product.
In general, 〈u, v 〉 is a complex number, but 〈u, u 〉 is real. Note that 〈w, au + bv 〉 = ā 〈w, u 〉 + b̄ 〈w, v 〉, where ā denotes the complex conjugate of a, and 〈u, 0 〉 = 0. If 〈u, v 〉 = 0 for all v ∈ V, then u = 0. Given any inner product, ||u|| = √〈u, u 〉 is a norm on V.
For a real inner product space V, the inner product provides angle information between two vectors u and v. We denote the angle θ between u and v as
θ = cos⁻¹( 〈u, v 〉 / (||u|| ||v||) ).
We interpret the angle as follows: u and v are orthogonal (perpendicular) when 〈u, v 〉 = 0, and nearly parallel when |〈u, v 〉| is close to ||u|| ||v||; the Cauchy–Schwarz inequality |〈u, v 〉| ≤ ||u|| ||v|| guarantees that θ is well defined.
When we analyze a vector f in a vector space V having a basis {ϕ1, ϕ2, …}, we wish to represent f as
f = a1ϕ1 + a2ϕ2 + ···.
Computation of the coefficients aj could be very laborious when the vector space V is not equipped with an inner product and {ϕ1, ϕ2, …} is not an orthonormal set. A Hilbert space is a vector space equipped with an inner product that is complete with respect to the norm ||u|| = √〈u, u 〉.
For u, v, w ∈ H, a Hilbert space, we have the following properties.
For the proof of the above theorem, see Rudin (1970). Let S be a closed subspace of a Hilbert space H. We define the orthogonal complement S⊥ of S as
S⊥ = {u ∈ H : 〈u, v 〉 = 0 for all v ∈ S}.
According to the projection theorem, we can define a projection map PS : H → S such that the value PS(u) satisfies
||u − PS(u)|| = min{||u − v|| : v ∈ S}.
This means that f(t) = ||u − PS(u) + tv||² has its minimum at t = 0 for any v ∈ S and, therefore,
0 = f′(0) = 2〈u − PS(u), v 〉 for all v ∈ S,
or
〈u − PS(u), v 〉 = 0 for all v ∈ S.
Hence, the projection theorem states that every u ∈ H can be uniquely decomposed as u = v + w with v ∈ S and w ∈ S⊥ and we can express the Hilbert space H as
H = S ⊕ S⊥.
From the Pythagorean theorem,
||u||² = ||PS(u)||² + ||u − PS(u)||².
Figure 2.3 illustrates the projection of a vector u onto a subspace S.
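To make the projection concrete, here is a minimal NumPy sketch (the basis vectors and u below are made-up examples): we project u onto the subspace S spanned by the columns of B by solving the normal equations, then verify the orthogonality principle and the Pythagorean identity.

```python
import numpy as np

# Columns of B form a basis of the subspace S (made-up example).
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
u = np.array([3.0, -1.0, 2.0])

# P_S(u) = B a, where a solves the normal equations B^T B a = B^T u.
a = np.linalg.solve(B.T @ B, B.T @ u)
Pu = B @ a

print(B.T @ (u - Pu))   # orthogonality: u - P_S(u) is perpendicular to S, ~ [0, 0]
print(np.isclose(u @ u, Pu @ Pu + (u - Pu) @ (u - Pu)))  # Pythagorean theorem: True
```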
Based on the underlying physical principles, we need to express interrelations among system parameters, variables, excitations and measurements. A scalar-valued function f(x) defined in a domain Ω ⊂ ℝ³ represents a numerical quantity at a point x ∈ Ω. Examples may include temperature, voltage, pressure, altitude and so on. A vector-valued function F(x) is a vector quantity at x ∈ Ω. It represents a vector field such as displacement, velocity, force, electric field intensity, magnetic flux density and so on. We now review the vector calculus of gradient, divergence and curl to handle basic dynamics among variables and parameters.
Let f : Ω → ℝ. For a given unit vector b ∈ ℝ³, the directional derivative of f at x in the direction b is denoted by ∂bf(x). It represents the rate of increase in f at x in the direction of b:
∂bf(x) = lim_{h→0} (f(x + hb) − f(x))/h.
If b = ej, the jth unit vector in the Cartesian coordinate system, we simply write ∂jf = ∂f/∂xj. The gradient of f, denoted as ∇f, is a vector-valued function that points in the direction of maximum increase of f:
∇f = (∂f/∂x, ∂f/∂y, ∂f/∂z),
and ∂bf(x) = ∇f(x) · b.
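As a small numerical sketch (the scalar field f and the point x below are made-up examples), the gradient can be approximated componentwise by central differences, and the directional derivative then recovered as ∂bf(x) = ∇f(x) · b:

```python
import numpy as np

def grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = 1.0
        g[j] = (f(x + h * e) - f(x - h * e)) / (2 * h)
    return g

f = lambda x: x[0]**2 + x[1] * x[2]           # example field: grad f = (2x, z, y)
x = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)  # unit direction

g = grad(f, x)
print(g)      # ~ [2, 3, 2]
print(g @ b)  # directional derivative of f at x in the direction b
```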
The divergence of F(r) at a point r, written as div F, is the net outward flux of F per unit volume of a ball centered at r as the ball shrinks to zero:
div F(r) = lim_{ε→0} (1/|Bε(r)|) ∮_{∂Bε(r)} F · dS,
where dS is the surface element, Bε(r) is the ball with radius ε and center r, and ∂Bε(r) is the boundary of Bε(r), which is a sphere. In Cartesian coordinates, div F = ∂F1/∂x + ∂F2/∂y + ∂F3/∂z.
The circulation of a vector field F = (F1, F2, F3) around a closed path C in ℝ³ is defined as a scalar line integral of the vector F over the path C:
∮_C F · dl.
If F represents an electric field intensity, the circulation will be an electromotive force around the path C.
The curl of a vector field F, denoted by curl F, is a vector whose magnitude is the maximum net circulation of F per unit area as the area shrinks to zero and whose direction is the normal direction of the area when the area is oriented to make the net circulation maximum. We can define the b-directional net circulation of F at r precisely by
Cb(r) = lim_{ε→0} (1/|Dε(r, b)|) ∮_{∂Dε(r, b)} F · dl,
where Dε(r, b) is the disk centered at r with radius ε and normal to b. Then, curl F is its maximum net circulation: curl F(r) is the vector satisfying curl F(r) · b = Cb(r) for every unit vector b, so that |curl F(r)| = max_{|b|=1} Cb(r).
A curve in ℝ² and ℝ³ is represented, respectively, by
r(t) = (x(t), y(t)), a ≤ t ≤ b,
and
r(t) = (x(t), y(t), z(t)), a ≤ t ≤ b.
The curve is said to be regular if r′(t) ≠ 0 for all t. The arc length s(t) of the curve between r(t0) and r(t1) is given by
s = ∫_{t0}^{t1} |r′(t)| dt.
The unit tangent vector T of the curve is given by
T = r′(t)/|r′(t)| = dr/ds.
The unit normal vector n of the curve is determined by
dT/ds = κ n,
where κ is the curvature given by
κ = |dT/ds| = |r′ × r″|/|r′|³.
Here, we use |r″ × r′| = |((s′)²κ n) × (s′T)| = |(s′)³κ|. Note that dn/ds = −κT since T · n = 0 and (dT/ds) · n + T · (dn/ds) = 0.
Consider a plane curve r(s) = (x(s), y(s)) in ℝ², where s is the length parameter. If θ(s) stands for the angle between T(s) and the x-axis, then κ(s) = dθ/ds because
T(s) = (cos θ(s), sin θ(s)) and dT/ds = (dθ/ds)(−sin θ, cos θ) = (dθ/ds) n.
When the curve is represented as r(t) = (x(t), y(t)), then
κ = (x′y″ − y′x″)/((x′)² + (y′)²)^{3/2}.
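As a quick check of this formula (a minimal NumPy sketch; the circle parameterization is a made-up test case), the curvature of a circle of radius R computed by finite differences is 1/R at every point:

```python
import numpy as np

def curvature(x, y, t):
    """kappa = (x'y'' - y'x'') / ((x'^2 + y'^2)^(3/2)) via finite differences."""
    xp, yp = np.gradient(x, t), np.gradient(y, t)
    xpp, ypp = np.gradient(xp, t), np.gradient(yp, t)
    return (xp * ypp - yp * xpp) / (xp**2 + yp**2)**1.5

R = 2.0
t = np.linspace(0.0, 2.0 * np.pi, 2001)
kappa = curvature(R * np.cos(t), R * np.sin(t), t)
print(kappa[1000])   # ~ 1/R = 0.5 away from the endpoints
```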
Now, consider a curve given implicitly by the level set of ϕ : ℝ² → ℝ:
C = {r = (x, y) : ϕ(r) = 0}.
Then, the normal and tangent vectors to the level curve are
n = ∇ϕ/|∇ϕ| = (ϕx, ϕy)/|∇ϕ| and T = (ϕy, −ϕx)/|∇ϕ|.
The curvature κ is
κ = ∇ · (∇ϕ/|∇ϕ|) = (ϕxx ϕy² − 2ϕx ϕy ϕxy + ϕyy ϕx²)/(ϕx² + ϕy²)^{3/2}.
To see this, assume that ϕ(r(t)) = 0 and set y′ = −ϕx and x′ = ϕy, which is possible because ∇ϕ(r) · r′(t) = 0. Then,
x″ = ϕxy ϕy − ϕyy ϕx and y″ = −ϕxx ϕy + ϕxy ϕx
imply
x′y″ − y′x″ = −(ϕxx ϕy² − 2ϕx ϕy ϕxy + ϕyy ϕx²).
Hence,
|κ| = |x′y″ − y′x″|/((x′)² + (y′)²)^{3/2} = |ϕxx ϕy² − 2ϕx ϕy ϕxy + ϕyy ϕx²|/|∇ϕ|³,
where the sign of κ is fixed by the orientation of the curve.
The curvature κ at a given point is the inverse of the radius of a disk that best fits the curve at that point.
Next, we consider a space curve in ℝ³ represented by
r(t) = (x(t), y(t), z(t)), a ≤ t ≤ b.
Then the moving frame {T, N, B} along the curve consists of the unit tangent T = r′/|r′|, the unit normal N = (dT/ds)/|dT/ds| and the binormal B = T × N.
The curvature κ and torsion τ are computed by
κ = |r′ × r″|/|r′|³ and τ = [r′, r″, r‴]/|r′ × r″|²,
where [ ] stands for the triple scalar product. If we express the rate of change of the moving coordinate system {T, N, B} along the curve in the fixed coordinates {x, y, z}, then we obtain the Frenet–Serret formulas
dT/ds = κN, dN/ds = −κT + τB, dB/ds = −τN.
We consider a regular surface S in ℝ³ parameterized by
r(u, v) = (x(u, v), y(u, v), z(u, v)), (u, v) ∈ U ⊂ ℝ².
Note that ru = ∂r/∂u and rv = ∂r/∂v are tangent to the surface. The condition corresponding to the regularity of a space curve is that
ru × rv ≠ 0 for all (u, v) ∈ U,
or
|ru × rv| ≠ 0.
If |ru × rv| ≠ 0, the tangent vectors ru and rv generate a tangent plane that is spanned by ru and rv. The surface normal vector is expressed as
n = (ru × rv)/|ru × rv|.
For example, if U = [0, 2π] × [0, π] and r(u, v) = (cos u sin v, sin u sin v, cos v), then r(U) represents the unit sphere. If the surface is written as z = f(x, y), then we may parameterize it as
r(x, y) = (x, y, f(x, y)) with rx = (1, 0, fx) and ry = (0, 1, fy),
and the unit normal vector n is
n = (−fx, −fy, 1)/√(1 + fx² + fy²).
If the surface is given by ϕ(r) = 0, then
n = ∇ϕ(r)/|∇ϕ(r)|.
The dynamics among system parameters, variables, excitations and measurements are often expressed as complicated nonlinear functions. In a nonlinear inverse problem, we may adopt a linearization approach in its solution-seeking process. Depending on the given problem, one may need to adopt a specific linearization process. In this section, we review Taylor's expansion as a tool to perform such an approximation.
Taylor polynomials are often used to approximate a complicated function. Taylor's expansion of an (m + 1)-times differentiable function f about x is
f(x + h) = f(x) + hf′(x) + (h²/2!)f″(x) + ··· + (hᵐ/m!)f⁽ᵐ⁾(x) + O(|h|^{m+1}), (2.2)
where O(|h|^{m+1}) is the remainder term containing the (m + 1)th order of h. Precisely, the remainder term is
(h^{m+1}/(m + 1)!) f^{(m+1)}(ξ) for some ξ between x and x + h.
This expansion leads to numerical differentiation formulas for f in various ways, for example:
f′(x) = (f(x + h) − f(x))/h + O(h) (forward difference),
f′(x) = (f(x + h) − f(x − h))/(2h) + O(h²) (central difference).
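A short numerical experiment (a sketch; f = sin is a made-up test function) shows the different orders of accuracy: reducing h by a factor of 10 cuts the forward-difference error by about 10 but the central-difference error by about 100:

```python
import numpy as np

f, x = np.sin, 1.0   # exact derivative is cos(1)
for h in [1e-1, 1e-2, 1e-3]:
    fwd = (f(x + h) - f(x)) / h                # O(h) error
    ctr = (f(x + h) - f(x - h)) / (2.0 * h)    # O(h^2) error
    print(h, abs(fwd - np.cos(x)), abs(ctr - np.cos(x)))
```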
The Newton–Raphson method to find a root of f(x) = 0 can be explained from the first-order Taylor's approximation f(x + h) = f(x) + hf′(x) + O(h²), ignoring the term O(h²), which is negligible when h is small. The method is based on the approximation
0 = f(x*) ≈ f(xn) + (x* − xn)f′(xn),
where x* denotes the root.
It starts with an initial guess x0 and generates a sequence {xn} by the formula
x_{n+1} = xn − f(xn)/f′(xn), n = 0, 1, 2, ….
It may not converge to a solution in general. The convergence issue will be discussed later.
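The iteration is short to implement; here is a minimal sketch (the function x² − 2, the starting point and the stopping rule below are made-up choices, one of many possible):

```python
def newton(f, fprime, x0, tol=1e-12, maxiter=50):
    """Newton-Raphson iteration x_{n+1} = x_n - f(x_n)/f'(x_n)."""
    x = x0
    for _ in range(maxiter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("no convergence; try another initial guess")

# Root of f(x) = x^2 - 2 starting from x0 = 1: converges to sqrt(2).
print(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0))
```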
We turn our attention to the second derivative f″(x). By Taylor's expansion, we approximate f″(x) by
f″(x) ≈ (f(x + h) − 2f(x) + f(x − h))/h².
The sign of f″(x) gives local information about f for a sufficiently small positive h: if f″(x) > 0, then f(x) < ½[f(x + h) + f(x − h)], that is, the value at x lies below the average of the neighboring values; if f″(x) < 0, the inequality is reversed.
We consider a multi-dimensional case f : ℝⁿ → ℝ. The following fourth-order Taylor's theorem in n variables will be used later.
This leads to
If D²f(x) is a positive definite matrix, then, for a sufficiently small r, the average of f over the ball Br(x) exceeds the center value f(x), which leads to the sub-MVP (sub-mean-value property)
f(x) ≤ (1/|Br(x)|) ∫_{Br(x)} f(y) dy.
Similarly, the super-MVP can be derived for a negative definite matrix D2f(x).
We consider a linear system of equations including system parameters, variables, excitations and measurements. We may derive it directly from a linear physical system or through a linearization of a nonlinear system. Once we express a forward problem as a linear transform of inputs or system parameters to measured outputs, we can adopt numerous linear methods to seek solutions of the associated inverse problem. For the details of linear systems of equations, please refer to Strang (2005, 2007).
We consider a linear system in Figure 2.4. We express its outputs yi for i = 1, …, m as a linear system of equations:
yi = ai1x1 + ai2x2 + ··· + ainxn, i = 1, …, m. (2.3)
Using two vectors x = (x1, …, xn)ᵀ ∈ ℝⁿ and y = (y1, …, ym)ᵀ ∈ ℝᵐ, and a matrix A = (aij) ∈ ℝ^{m×n}, we can express the linear system as
y = Ax (2.4)
or, componentwise, yi = Σ_{j=1}^{n} aij xj for i = 1, …, m.
In most problems, the output vector y consists of measured data. If the input vector x includes external excitations or internal sources, the matrix A is derived from the system transfer function or gain determined by the system structure and parameters. When the vector x contains system parameters, the linear system of equations in (2.4) is formulated subject to certain external excitations or internal sources, information about which is embedded in the matrix A.
For the simplest case of m = n = 1, it is trivial to find x1 = y1/a11 provided that a11 ≠ 0. If m = 1 and n > 1, that is, y1 = a11x1 + ··· + a1nxn, it is not possible in general to determine all xj uniquely since they are summed to result in a single value y1. This requires us to increase the number of measurements so that m ≥ n. However, for certain cases, we will have to deal with the situation of m < n. Figure 2.5 illustrates three different cases of the linear system of equations. To obtain an optimal solution x by solving (2.4) for x, it is desirable to understand the structure of the matrix A in the context of vector spaces.
We denote by ℝ^{m×n} the set of linear transforms from ℝⁿ to ℝᵐ, that is, ℝ^{m×n} is the vector space consisting of m × n matrices A = (aij) with real entries.
We call an m × 1 matrix a column vector and a 1 × n matrix a row vector. For the matrix A, we denote its ith row vector as
row(A; i) = (ai1, ai2, …, ain)
and its jth column vector as
col(A; j) = (a1j, a2j, …, amj)ᵀ.
For two vectors u, v ∈ ℝⁿ, we define their inner product by
〈u, v 〉 = uᵀv = Σ_{j=1}^{n} uj vj
and define the outer product as
uvᵀ, the matrix whose (i, j)-entry is ui vj.
For A ∈ ℝ^{m×n} and x ∈ ℝⁿ, we consider a linear transform or a linear system of equations:
y = Ax.
We can understand it as either
yi = 〈row(A; i)ᵀ, x 〉 = Σ_{j=1}^{n} aij xj, i = 1, …, m, (2.5)
or
y = x1 col(A; 1) + x2 col(A; 2) + ··· + xn col(A; n). (2.6)
Figure 2.6 shows these two different representations of a linear system of equations for the case of m = n. In (2.5), each output yi is a weighted sum of all x1, …, xn, and the row vector row(A; i) provides the weights for yi. In (2.6), the output vector y is expressed as a linear combination of n column vectors, with weights x1, …, xn. It is very useful to have these two different views of the linear transform in (2.4) to better understand a solution of an inverse problem as well as the forward problem itself.
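The two views are easy to check numerically (a minimal NumPy sketch with a made-up 2 × 2 example):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([5.0, 6.0])

# Row view (2.5): each y_i is the inner product of row(A; i) with x.
y_rows = np.array([A[i, :] @ x for i in range(A.shape[0])])

# Column view (2.6): y is a linear combination of the columns of A.
y_cols = sum(x[j] * A[:, j] for j in range(A.shape[1]))

print(np.allclose(y_rows, A @ x), np.allclose(y_cols, A @ x))  # True True
```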
We now summarize the null space, range and rank of a matrix before we describe how to solve for x.
The null space and the range of A ∈ ℝ^{m×n} are
N(A) = {x ∈ ℝⁿ : Ax = 0} (2.7)
and
R(A) = {Ax : x ∈ ℝⁿ} = span{col(A; 1), …, col(A; n)}, (2.8)
and the rank of A is rank(A) = dim R(A).
If N(A) = {0}, the following are true: the column vectors of A are linearly independent, rank(A) = n, and the solution of y = Ax, if it exists, is unique.
If R(A) = ℝᵐ, the following hold: rank(A) = m, the row vectors of A are linearly independent, and y = Ax has at least one solution for every y ∈ ℝᵐ.
We also note that
rank(A) = rank(Aᵀ) and rank(A) + dim N(A) = n (the rank–nullity theorem).
We consider a linear system of equations y = Ax. If there are more equations than unknowns, that is, m > n, the system is over-determined and may not have any solution. On the other hand, if there are fewer equations than unknowns, that is, m < n, the system is under-determined and, when solvable, has infinitely many solutions. In these cases, we need to seek a best solution of y = Ax in an appropriate sense.
If x* is the least-squares solution of y = Ax, then Ax* is the projection of y on R(A), and the orthogonality principle yields
〈y − Ax*, Ax 〉 = 0 for all x ∈ ℝⁿ, that is, Aᵀ(y − Ax*) = 0.
If AᵀA is invertible, then
x* = (AᵀA)⁻¹Aᵀy
and the projection matrix on R(A) can be expressed as
P_{R(A)} = A(AᵀA)⁻¹Aᵀ. (2.9)
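A small over-determined example (a NumPy sketch with made-up data) illustrates the normal equations and the projection matrix (2.9):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])   # m = 3 > n = 2
y = np.array([1.0, 0.0, 2.0])

# Least-squares solution x* = (A^T A)^{-1} A^T y.
x_star = np.linalg.solve(A.T @ A, A.T @ y)

# Projection matrix onto R(A): P = A (A^T A)^{-1} A^T.
P = A @ np.linalg.solve(A.T @ A, A.T)

print(np.allclose(P @ y, A @ x_star))   # Ax* is the projection of y onto R(A)
print(A.T @ (y - A @ x_star))           # orthogonality principle: ~ [0, 0]
```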
Since ||x||₂ ≥ ||x†||₂ for all x such that y = Ax, the minimum-norm solution x† satisfies
〈x†, z 〉 = 0 for all z ∈ N(A),
that is, x† is orthogonal to N(A).
Let A be an m × n matrix. Then AᵀA can be decomposed into
AᵀA = V diag(σ1², …, σn²) Vᵀ,
where v1, …, vn are orthonormal eigenvectors of AᵀA forming the columns of V and σi² ≥ 0 are the corresponding eigenvalues. Since V is an orthogonal matrix, V⁻¹ = Vᵀ.
If we choose ui = (1/σi)Avi for σi ≠ 0, then
〈ui, uj 〉 = (1/(σiσj)) viᵀAᵀAvj = (σj/σi) viᵀvj = δij (2.10)
and
AV = UΣ. (2.11)
Since V⁻¹ = Vᵀ, we have the singular value decomposition (SVD) of A:
A = UΣVᵀ, (2.12)
where U = [u1 ··· um] and V = [v1 ··· vn] are orthogonal matrices (the columns ui with σi = 0 are chosen to complete an orthonormal basis of ℝᵐ) and Σ is the m × n diagonal matrix whose diagonal entries are the singular values σ1, …, σn.
Suppose we number the ui, vi and σi so that σ1 ≥ σ2 ≥ ··· ≥ σr > 0 = σ_{r+1} = ··· = σn. Then, we can express the singular value decomposition of A as
A = [u1 ··· ur] diag(σ1, …, σr) [v1 ··· vr]ᵀ. (2.13)
This has the very useful property of splitting any matrix A into rank-one pieces ordered by their sizes:
A = σ1 u1v1ᵀ + σ2 u2v2ᵀ + ··· + σr urvrᵀ. (2.14)
Figure 2.8 shows two graphical representations of the SVD. If σ_{t+1}, …, σr are negligibly small, we may approximate A by the truncated SVD as
A ≈ A_t = σ1 u1v1ᵀ + ··· + σt utvtᵀ (2.15)
with t < r.
We may interpret the linear transform y = Ax = UΣVᵀx as shown in Figure 2.9. First, Vᵀx provides coefficients of x along the input directions v1, …, vn. Second, ΣVᵀx scales Vᵀx by σ1, …, σn. Third, UΣVᵀx reconstructs the output y in the directions of u1, …, um. From the relation Avi = σiui, we can interpret v1 as the most sensitive input direction and u1 as the most sensitive output direction with the largest gain of σ1.
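These relations are easy to verify numerically (a NumPy sketch; the random test matrix and the truncation level t are made-up choices). Note that the spectral norm of the truncation error A − A_t equals σ_{t+1}:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# SVD: A = U diag(s) V^T with s in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(A, U @ np.diag(s) @ Vt))          # True

# Truncated SVD (2.15) built from t rank-one pieces (2.14).
t = 2
A_t = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(t))
print(np.linalg.norm(A - A_t, 2), s[t])             # both equal sigma_{t+1}
```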
The pseudo-inverse (or Moore–Penrose inverse) of A is
A† = VΣ†Uᵀ. (2.16)
Here, the pseudo-inverse Σ† of the m × n diagonal matrix Σ is the n × m diagonal matrix in which each σi ≠ 0 is replaced by 1/σi.
Since
A†A = VΣ†ΣVᵀ = Σ_{i=1}^{r} vi viᵀ and AA† = UΣΣ†Uᵀ = Σ_{i=1}^{r} ui uiᵀ, (2.17)
the products AA† and A†A can be viewed as projection matrices:
AA† = P_{R(A)} and A†A = P_{R(Aᵀ)}. (2.18)
We also know that A†y is in the row space of A = span{v1, …, vr} and Ax = y is solvable only when y is in the column space of A = span{u1, …, ur}.
The least-squares solution x† = A†y minimizes
||y − Ax||₂². (2.19)
Hence, x† satisfies the normal equation
AᵀAx† = Aᵀy. (2.20)
Moreover, x† is the smallest solution of AᵀAx = Aᵀy because it has no null-space components.
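An under-determined example (a NumPy sketch with a made-up 2 × 3 system) shows that A†y solves the system and is orthogonal to N(A), as stated above:

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])   # m = 2 < n = 3
y = np.array([1.0, 2.0])

x_dagger = np.linalg.pinv(A) @ y  # minimum-norm solution A† y

z = np.array([1.0, -1.0, 1.0])    # A z = 0, so z spans N(A)
print(np.allclose(A @ x_dagger, y))   # x† solves the system: True
print(np.allclose(A @ z, 0.0))        # z is a null vector: True
print(x_dagger @ z)                   # x† is orthogonal to N(A): ~ 0
```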
Joseph Fourier (1768–1830) introduced the Fourier series to solve the heat equation. Since then, Fourier analysis has been widely used in many branches of science and engineering. It decomposes a general function f(x) into a linear combination of basic harmonics, sines and cosines, that are easier to analyze. For details of the theory, refer to Bracewell (1999) and Gasquet and Witomski (1998). We introduce the Fourier transform as a linear transform since it is also widely used in the inverse problem area.
Assume that a set {ϕ0, ϕ1, …} is an orthonormal basis of a Hilbert space H, that is, 〈ϕi, ϕj 〉 = δij and the closure of span{ϕ0, ϕ1, …} is H.
Let Vn = span{ϕ0, …, ϕn}. If we denote by Pn the projection map from H to Vn, then
Pn f = Σ_{j=0}^{n} 〈f, ϕj 〉 ϕj.
If we denote aj = 〈f, ϕj 〉, then Pn f = a0ϕ0 + ··· + anϕn. By the Pythagorean theorem, we have
||f||² = ||f − Pn f||² + ||Pn f||² = ||f − Pn f||² + Σ_{j=0}^{n} |aj|²,
and we obtain the Bessel inequality
Σ_{j=0}^{n} |aj|² ≤ ||f||² for all n. (2.21)
By the Bessel inequality, the series Σ_{j=0}^{∞} aj ϕj converges in H (its partial sums form a Cauchy sequence).
Moreover,
〈f − Σ_{j=0}^{∞} aj ϕj, ϕk 〉 = ak − ak = 0 for every k,
or
f − Σ_{j=0}^{∞} aj ϕj ⊥ span{ϕ0, ϕ1, …}.
Hence, f − Σ_{j=0}^{∞} aj ϕj = 0 because the closure of span{ϕ0, ϕ1, …} is H. This gives us the following series expansion:
f = Σ_{j=0}^{∞} 〈f, ϕj 〉 ϕj with ||f||² = Σ_{j=0}^{∞} |〈f, ϕj 〉|² (Parseval's identity).
For each p with 1 ≤ p < ∞, we denote by Lᵖ(ℝ) the class of measurable functions f on ℝ such that ||f||_{Lᵖ} < ∞:
Lᵖ(ℝ) = {f : ||f||_{Lᵖ} < ∞},
where the Lᵖ-norm of f is
||f||_{Lᵖ} = (∫_ℝ |f(x)|ᵖ dx)^{1/p}.
For f ∈ L¹(ℝ), the Fourier transform of f is defined by
f̂(ξ) = ∫_ℝ f(x) e^{−2πixξ} dx.
Let us summarize the Fourier transform pairs of some important functions. We have, for example,
χ_{[−1/2, 1/2]}(x) ⟷ sin(πξ)/(πξ), e^{−πx²} ⟷ e^{−πξ²}, e^{−|x|} ⟷ 2/(1 + 4π²ξ²).
The following Fourier inversion formula provides that f can be recovered from its Fourier transform f̂: for f ∈ L¹(ℝ) with f̂ ∈ L¹(ℝ),
f(x) = ∫_ℝ f̂(ξ) e^{2πixξ} dξ. (2.22)
Indeed, the Fourier inversion formula holds for general f ∈ L²(ℝ) because L¹(ℝ) ∩ L²(ℝ) is dense in L²(ℝ) and ||f̂||_{L²} = ||f||_{L²} (Plancherel's theorem). We omit the proof because it requires some time-consuming arguments of Lebesgue measure theory and some limiting process in L². For a rigorous proof, please refer to Gasquet and Witomski (1998).
The following Poisson summation formula indicates that the T-periodic summation of f is expressed in terms of discrete samples of its Fourier transform with the sampling distance 1/T:
Σ_{n=−∞}^{∞} f(x + nT) = (1/T) Σ_{n=−∞}^{∞} f̂(n/T) e^{2πinx/T}.
The band-limited function f with bandwidth B, that is, f̂(ξ) = 0 for |ξ| > B, does not contain sinusoidal waves at frequencies higher than B. This means that f cannot oscillate rapidly within a distance less than 1/(2B). Hence, the band-limited f can be represented by means of its uniformly spaced discrete sample {f(nΔx) : n = 0, ±1, ±2, …} provided that the sampling interval Δx is sufficiently small. Indeed, the following sampling theorem states that, if Δx ≤ 1/(2B), then the discrete sample {f(nΔx) : n = 0, ±1, ±2, …} contains the complete information about f. The quantity 2B is called the Nyquist rate.
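Aliasing below the Nyquist rate can be seen directly (a minimal NumPy sketch with made-up frequencies): a 9 Hz sine sampled at 10 Hz, below its Nyquist rate of 18 Hz, is indistinguishable at the sample points from a 1 Hz sine of opposite sign, since 9 = 10 − 1:

```python
import numpy as np

fs = 10.0                 # sampling rate (Hz), below the Nyquist rate 2B = 18 Hz
t = np.arange(10) / fs    # sample points

# sin(2*pi*9*t_n) = sin(2*pi*n - 2*pi*n/10) = -sin(2*pi*1*t_n) at every sample.
print(np.allclose(np.sin(2 * np.pi * 9 * t), -np.sin(2 * np.pi * 1 * t)))  # True
```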
Assume that f is a continuous signal such that
f(x) ≈ 0 for |x| > T/2 and f̂(ξ) ≈ 0 for |ξ| > Ξ/2.
Choose a number N so that
N ≥ TΞ.
With the condition N ≥ TΞ, the original signals f and f̂ are approximately recovered from the samples {f(kT/N − T/2) : k = 0, 1, …, N − 1} and {f̂(jΞ/N − Ξ/2) : j = 0, 1, …, N − 1} based on either the Poisson summation formula or the Whittaker–Shannon sampling theorem.
Let us convert the continuous signal f into the digital signal {f(kΔx − T/2):k = 0, 1, …, N − 1} with the sampling spacing Δx = T/N. The points xk = kΔx − T/2, with k = 0, 1, …, N − 1, are called the sampling points.
Writing fk = f(kΔx − T/2), the digital signal corresponding to the continuous signal f can be expressed in the following vector form:
f = (f0, f1, …, f_{N−1}).
Its Fourier transform f̂(ξ) for ξ ∈ [−Ξ/2, Ξ/2] is expressed approximately by the Riemann sum
f̂(ξ) ≈ Δx Σ_{k=0}^{N−1} fk e^{−2πiξxk}.
Similarly, we denote f̂j = f̂(ξj), where ξj = jΔξ − Ξ/2 and Δξ = Ξ/N. Then, f(x) for x ∈ [−T/2, T/2] is expressed approximately by
f(x) ≈ Δξ Σ_{j=0}^{N−1} f̂j e^{2πixξj}.
In particular, we have
f̂j ≈ Δx Σ_{k=0}^{N−1} fk e^{−2πiξjxk}, j = 0, 1, …, N − 1.
Since
ξjxk = (jΔξ − Ξ/2)(kΔx − T/2) = jk/N − j/2 − k/2 + N/4 for N = TΞ,
we have the following approximations:
f̂j ≈ Δx e^{−πiN/2} (−1)^j Σ_{k=0}^{N−1} (−1)^k fk e^{−2πijk/N}, j = 0, 1, …, N − 1.
From the above approximations, we can define the discrete Fourier transform.
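A minimal NumPy sketch (assuming the convention f̂j = Σ_k fk e^{−2πijk/N}, which matches np.fft.fft) builds the DFT matrix explicitly and checks it against the library routine:

```python
import numpy as np

N = 8
idx = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / N)   # N x N DFT matrix

f = np.random.default_rng(1).standard_normal(N)
print(np.allclose(F @ f, np.fft.fft(f)))           # True

# Inversion: F^{-1} = (1/N) conj(F).
print(np.allclose(np.conj(F) @ (F @ f) / N, f))    # True
```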
The DFT matrix FN = (e^{−2πijk/N})_{j,k=0}^{N−1} has the following interesting properties: FN is symmetric (FNᵀ = FN), (1/√N)FN is unitary and FN⁻¹ = (1/N)F̄N, where F̄N is the complex conjugate of FN.
The fast Fourier transform (FFT) is a DFT algorithm that reduces the number of computations from the order of N² to N log N. Let N = 2M and ωN = e^{−2πi/N}, so that ωN² = ωM.
From the definition of the DFT, we have
f̂j = Σ_{k=0}^{M−1} f_{2k} ω_M^{jk} + ω_N^{j} Σ_{k=0}^{M−1} f_{2k+1} ω_M^{jk}, j = 0, 1, …, N − 1.
Since ω_N^{M+j} = −ω_N^{j} and ω_N^{2k} = ω_M^{k}, the (M + j, 2k)-component of the matrix FN is
ω_M^{jk},
the same as its (j, 2k)-component, and the (M + j, 2k − 1)-component of the matrix FN is
−ω_N^{j} ω_M^{jk},
the negative of its (j, 2k − 1)-component.
The FFT is based on the following key identity:
FN = [ IM  DM ; IM  −DM ] [ FM  O ; O  FM ] PN, (2.26)
where FN is the N × N DFT matrix defined in the previous section, IM is the M × M identity matrix, O is the M × M zero matrix, DM = diag(1, ωN, ωN², …, ωN^{M−1}) and PN is the N × N permutation matrix that reorders (f0, f1, …, f_{N−1}) so that the even-indexed components come before the odd-indexed ones.
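One level of this splitting is easy to verify numerically (a NumPy sketch; the length N = 8 and the test vector are made-up choices): the DFT of x is obtained by combining the half-size DFTs of its even- and odd-indexed samples:

```python
import numpy as np

N = 8
M = N // 2
x = np.random.default_rng(2).standard_normal(N)

E = np.fft.fft(x[0::2])                          # DFT of even-indexed samples
O = np.fft.fft(x[1::2])                          # DFT of odd-indexed samples
D = np.exp(-2j * np.pi * np.arange(M) / N)       # diag(1, w_N, ..., w_N^{M-1})

y = np.concatenate([E + D * O, E - D * O])       # the identity (2.26) in action
print(np.allclose(y, np.fft.fft(x)))             # True
```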
The one-dimensional definition of the Fourier transform can be extended to higher dimensions. The Fourier transform of a two-dimensional function ρ(x, y), denoted by S(kx, ky), is defined by
S(kx, ky) = ∫_ℝ ∫_ℝ ρ(x, y) e^{−2πi(x kx + y ky)} dx dy.
We can generalize all the results in the one-dimensional Fourier transform to the two-dimensional case because the two-dimensional Fourier transform can be expressed as two one-dimensional Fourier transforms along the x and y variables:
S(kx, ky) = ∫_ℝ [ ∫_ℝ ρ(x, y) e^{−2πix kx} dx ] e^{−2πiy ky} dy.
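The separability can be checked in one line (a NumPy sketch with a made-up random array):

```python
import numpy as np

rho = np.random.default_rng(3).standard_normal((16, 16))
S = np.fft.fft(np.fft.fft(rho, axis=0), axis=1)   # 1-D DFT along one axis, then the other
print(np.allclose(S, np.fft.fft2(rho)))           # True
```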
If ρ(x, y) and S(kx, ky) are a Fourier transform pair, we have the following properties.
Bracewell R 1999 The Fourier Transform and Its Applications, 3rd edn. McGraw-Hill, New York.
Gasquet C and Witomski P 1998 Fourier Analysis and Applications: Filtering, Numerical Computation, Wavelets. Springer, New York.
Rudin W 1970 Real and Complex Analysis. McGraw-Hill, New York.
Strang G 2005 Linear Algebra and Its Applications, 4th edn. Thomson Learning, London.
Strang G 2007 Computational Science and Engineering. Wellesley-Cambridge, Wellesley, MA.