In $\mathbb{R}^2$, it is generally more convenient to use the standard basis $\{e_1, e_2\}$ than some other basis, such as $\{(2, 1)^T, (3, 5)^T\}$. For example, it is easier to find the coordinates of $(x_1, x_2)^T$ with respect to the standard basis. The elements of the standard basis are orthogonal unit vectors. In working with an inner product space $V$, it is generally desirable to have a basis of mutually orthogonal unit vectors. Such a basis is convenient not only for finding coordinates of vectors, but also for solving least squares problems.
Example 1
The set $\{(1, 1, 1)^T, (2, 1, -3)^T, (4, -5, 1)^T\}$ is an orthogonal set in $\mathbb{R}^3$, since
$$(1, 1, 1)(2, 1, -3)^T = 2 + 1 - 3 = 0$$
$$(1, 1, 1)(4, -5, 1)^T = 4 - 5 + 1 = 0$$
$$(2, 1, -3)(4, -5, 1)^T = 8 - 5 - 3 = 0$$
Theorem 5.5.1
If $\{v_1, v_2, \ldots, v_n\}$ is an orthogonal set of nonzero vectors in an inner product space $V$, then $v_1, v_2, \ldots, v_n$ are linearly independent.
Proof
Suppose that $v_1, v_2, \ldots, v_n$ are mutually orthogonal nonzero vectors and
$$c_1 v_1 + c_2 v_2 + \cdots + c_n v_n = 0 \tag{1}$$
If $1 \le j \le n$, then, taking the inner product of $v_j$ with both sides of equation (1), we see that
$$c_1 \langle v_j, v_1 \rangle + c_2 \langle v_j, v_2 \rangle + \cdots + c_n \langle v_j, v_n \rangle = 0$$
$$c_j \|v_j\|^2 = 0$$
Since $v_j \neq 0$, it follows that $c_j = 0$, and hence all the scalars $c_1, c_2, \ldots, c_n$ must be 0.
∎
The set $\{u_1, u_2, \ldots, u_n\}$ will be orthonormal if and only if
$$\langle u_i, u_j \rangle = \delta_{ij}$$
where
$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$
Given any orthogonal set of nonzero vectors $\{v_1, v_2, \ldots, v_n\}$, it is possible to form an orthonormal set by defining
$$u_i = \left(\frac{1}{\|v_i\|}\right) v_i \quad \text{for } i = 1, 2, \ldots, n$$
The reader may verify that $\{u_1, u_2, \ldots, u_n\}$ will be an orthonormal set.
Example 2
We saw in Example 1 that if $v_1 = (1, 1, 1)^T$, $v_2 = (2, 1, -3)^T$, and $v_3 = (4, -5, 1)^T$, then $\{v_1, v_2, v_3\}$ is an orthogonal set in $\mathbb{R}^3$. To form an orthonormal set, let
$$u_1 = \frac{1}{\|v_1\|} v_1 = \frac{1}{\sqrt{3}}(1, 1, 1)^T$$
$$u_2 = \frac{1}{\|v_2\|} v_2 = \frac{1}{\sqrt{14}}(2, 1, -3)^T$$
$$u_3 = \frac{1}{\|v_3\|} v_3 = \frac{1}{\sqrt{42}}(4, -5, 1)^T$$
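As a quick numerical sanity check of Example 2, the following NumPy sketch normalizes the vectors of Example 1 and verifies that the result is an orthonormal set:

```python
import numpy as np

# The orthogonal set from Example 1
v1 = np.array([1.0, 1.0, 1.0])
v2 = np.array([2.0, 1.0, -3.0])
v3 = np.array([4.0, -5.0, 1.0])

# Normalize each vector: u_i = (1 / ||v_i||) * v_i
u1, u2, u3 = (v / np.linalg.norm(v) for v in (v1, v2, v3))

# Stack the u_i as columns; for an orthonormal set, U^T U is the identity
U = np.column_stack([u1, u2, u3])
print(np.allclose(U.T @ U, np.eye(3)))  # True
```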
Example 3
In $C[-\pi, \pi]$ with inner product
$$\langle f, g \rangle = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\, g(x)\, dx \tag{2}$$
the set $\{1, \cos x, \cos 2x, \ldots, \cos nx\}$ is an orthogonal set of vectors. The functions $\cos x, \cos 2x, \ldots, \cos nx$ are already unit vectors, since
$$\langle \cos kx, \cos kx \rangle = \frac{1}{\pi} \int_{-\pi}^{\pi} \cos^2 kx\, dx = 1 \quad \text{for } k = 1, 2, \ldots, n$$
To form an orthonormal set, we need only find a unit vector in the direction of 1.
$$\|1\|^2 = \langle 1, 1 \rangle = \frac{1}{\pi} \int_{-\pi}^{\pi} 1\, dx = 2$$
Thus, $1/\sqrt{2}$ is a unit vector, and hence $\{1/\sqrt{2}, \cos x, \cos 2x, \ldots, \cos nx\}$ is an orthonormal set of vectors.
It follows from Theorem 5.5.1 that if $B = \{u_1, u_2, \ldots, u_k\}$ is an orthonormal set in an inner product space $V$, then $B$ is a basis for the subspace $S = \operatorname{Span}(u_1, u_2, \ldots, u_k)$. We say that $B$ is an orthonormal basis for $S$. It is generally much easier to work with an orthonormal basis than with an ordinary basis. In particular, it is much easier to calculate the coordinates of a given vector $v$ with respect to an orthonormal basis. Once these coordinates have been determined, they can be used to compute $\|v\|$.
Theorem 5.5.2
Let $\{u_1, u_2, \ldots, u_n\}$ be an orthonormal basis for an inner product space $V$. If $v = \sum_{i=1}^{n} c_i u_i$, then $c_i = \langle v, u_i \rangle$.
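A short NumPy illustration of Theorem 5.5.2, using the orthonormal basis of Example 2 and an arbitrarily chosen vector $v$ (the specific $v$ below is ours, not the text's):

```python
import numpy as np

# Orthonormal basis from Example 2, stored as the columns of U
U = np.column_stack([
    np.array([1.0, 1.0, 1.0]) / np.sqrt(3),
    np.array([2.0, 1.0, -3.0]) / np.sqrt(14),
    np.array([4.0, -5.0, 1.0]) / np.sqrt(42),
])

v = np.array([1.0, 2.0, 3.0])  # an arbitrary vector to expand

# By Theorem 5.5.2, the coordinates are c_i = <v, u_i>, i.e., c = U^T v
c = U.T @ v

# Reconstruct v as the sum of c_i u_i, i.e., U c
print(np.allclose(U @ c, v))  # True
```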
Given that $\{1/\sqrt{2}, \cos 2x\}$ is an orthonormal set in $C[-\pi, \pi]$ (with inner product as in Example 3), determine the value of $\int_{-\pi}^{\pi} \sin^4 x\, dx$ without computing antiderivatives.
SOLUTION
Since
$$\sin^2 x = \frac{1 - \cos 2x}{2} = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} + \left(-\frac{1}{2}\right) \cos 2x$$
it follows from Parseval's formula ($\|v\|^2 = c_1^2 + \cdots + c_n^2$ whenever $v = \sum_i c_i u_i$ for an orthonormal set $\{u_i\}$) that
$$\int_{-\pi}^{\pi} \sin^4 x\, dx = \pi \left\|\sin^2 x\right\|^2 = \pi\left(\frac{1}{2} + \frac{1}{4}\right) = \frac{3\pi}{4}$$
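For readers who want to confirm the value numerically, here is a small sketch (assuming SciPy is available) that integrates $\sin^4 x$ directly by quadrature:

```python
import numpy as np
from scipy.integrate import quad

# Integrate sin^4(x) over [-pi, pi] directly
integral, _ = quad(lambda x: np.sin(x) ** 4, -np.pi, np.pi)

print(integral)       # approximately 2.3562
print(3 * np.pi / 4)  # the value 3*pi/4 given by Parseval's formula
```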
Orthogonal Matrices
Of particular importance are $n \times n$ matrices whose column vectors form an orthonormal set in $\mathbb{R}^n$. An $n \times n$ matrix $Q$ is said to be an orthogonal matrix if its column vectors form an orthonormal set in $\mathbb{R}^n$.
Theorem 5.5.5
An $n \times n$ matrix $Q$ is orthogonal if and only if $Q^T Q = I$.
Proof
It follows from the definition that an $n \times n$ matrix $Q$ is orthogonal if and only if its column vectors satisfy
$$q_i^T q_j = \delta_{ij}$$
However, $q_i^T q_j$ is the $(i, j)$ entry of the matrix $Q^T Q$. Thus, $Q$ is orthogonal if and only if $Q^T Q = I$.
∎
It follows from the theorem that if $Q$ is an orthogonal matrix, then $Q$ is invertible and $Q^{-1} = Q^T$.
Example 6
For any fixed $\theta$, the matrix
$$Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
is orthogonal and
$$Q^{-1} = Q^T = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$
The matrix $Q$ in Example 6 can be thought of as a linear transformation from $\mathbb{R}^2$ onto $\mathbb{R}^2$ that has the effect of rotating each vector by an angle $\theta$ while leaving the length of the vector unchanged (see Example 2 in Section 4.2). Similarly, $Q^{-1}$ can be thought of as a rotation by the angle $-\theta$ (see Figure 5.5.1).
In general, inner products are preserved under multiplication by an orthogonal matrix [i.e., $\langle x, y \rangle = \langle Qx, Qy \rangle$]. Indeed,
$$\langle Qx, Qy \rangle = (Qy)^T Qx = y^T Q^T Q x = y^T x = \langle x, y \rangle$$
In particular, if $x = y$, then $\|Qx\|^2 = \|x\|^2$ and hence $\|Qx\| = \|x\|$. Multiplication by an orthogonal matrix preserves the lengths of vectors.
Properties of Orthogonal Matrices
If $Q$ is an $n \times n$ orthogonal matrix, then
the column vectors of $Q$ form an orthonormal basis for $\mathbb{R}^n$
$Q^T Q = I$
$Q^T = Q^{-1}$
$\langle Qx, Qy \rangle = \langle x, y \rangle$
$\|Qx\|_2 = \|x\|_2$
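These properties are easy to check numerically. The sketch below (with an arbitrarily chosen angle and test vectors of our own) verifies each of them for the rotation matrix of Example 6:

```python
import numpy as np

theta = 0.7  # an arbitrary fixed angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([3.0, -1.0])
y = np.array([2.0,  5.0])

print(np.allclose(Q.T @ Q, np.eye(2)))                       # Q^T Q = I
print(np.allclose(np.linalg.inv(Q), Q.T))                    # Q^{-1} = Q^T
print(np.isclose((Q @ x) @ (Q @ y), x @ y))                  # inner products preserved
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # lengths preserved
```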
Permutation Matrices
A permutation matrix is a matrix formed from the identity matrix by reordering its columns. Clearly, then, permutation matrices are orthogonal matrices. If $P$ is the permutation matrix formed by reordering the columns of $I$ in the order $(k_1, \ldots, k_n)$, then $P = (e_{k_1}, \ldots, e_{k_n})$. If $A$ is an $m \times n$ matrix, then
$$AP = (Ae_{k_1}, \ldots, Ae_{k_n}) = (a_{k_1}, \ldots, a_{k_n})$$
Postmultiplication of $A$ by $P$ reorders the columns of $A$ in the order $(k_1, \ldots, k_n)$. For example, if
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{bmatrix} \quad \text{and} \quad P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$$
then
$$AP = \begin{bmatrix} 3 & 1 & 2 \\ 3 & 1 & 2 \end{bmatrix}$$
Since $P = (e_{k_1}, \ldots, e_{k_n})$ is orthogonal, it follows that
$$P^{-1} = P^T = \begin{bmatrix} e_{k_1}^T \\ \vdots \\ e_{k_n}^T \end{bmatrix}$$
Column $k_1$ of $P^T$ will be $e_1$, column $k_2$ will be $e_2$, and so on. Thus, $P^T$ is a permutation matrix. The matrix $P^T$ can be formed directly from $I$ by reordering its rows in the order $(k_1, k_2, \ldots, k_n)$. In general, a permutation matrix can be formed from $I$ by reordering either its rows or its columns.
If $Q$ is the permutation matrix formed by reordering the rows of $I$ in the order $(k_1, k_2, \ldots, k_n)$ and $B$ is an $n \times r$ matrix, then
$$QB = \begin{bmatrix} e_{k_1}^T \\ \vdots \\ e_{k_n}^T \end{bmatrix} B = \begin{bmatrix} e_{k_1}^T B \\ \vdots \\ e_{k_n}^T B \end{bmatrix}$$
Since $e_{k_i}^T B$ is the $k_i$th row of $B$, it follows that $QB$ is the matrix formed by reordering the rows of $B$ in the order $(k_1, k_2, \ldots, k_n)$. For example, if
$$Q = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 1 \\ 2 & 2 \\ 3 & 3 \end{bmatrix}$$
then
$$QB = \begin{bmatrix} 3 & 3 \\ 1 & 1 \\ 2 & 2 \end{bmatrix}$$
In general, if $P$ is an $n \times n$ permutation matrix, premultiplication of an $n \times r$ matrix $B$ by $P$ reorders the rows of $B$ and postmultiplication of an $m \times n$ matrix $A$ by $P$ reorders the columns of $A$.
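A minimal NumPy sketch of both effects, using the matrices from the two examples above:

```python
import numpy as np

# P and Q from the examples above
P = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])
Q = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])

A = np.array([[1, 2, 3],
              [1, 2, 3]])
B = np.array([[1, 1],
              [2, 2],
              [3, 3]])

print(A @ P)  # columns of A reordered: [[3 1 2], [3 1 2]]
print(Q @ B)  # rows of B reordered:    [[3 3], [1 1], [2 2]]
```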
Orthonormal Sets and Least Squares
Orthogonality plays an important role in solving least squares problems. Recall that if $A$ is an $m \times n$ matrix of rank $n$, then the least squares problem $Ax = b$ has a unique solution $\hat{x}$ that is determined by solving the normal equations $A^T A x = A^T b$. The projection $p = A\hat{x}$ is the vector in $R(A)$ that is closest to $b$. The least squares problem is especially easy to solve in the case where the column vectors of $A$ form an orthonormal set in $\mathbb{R}^m$.
Theorem 5.5.6
If the column vectors of $A$ form an orthonormal set of vectors in $\mathbb{R}^m$, then $A^T A = I$ and the solution to the least squares problem is
$$\hat{x} = A^T b$$
Proof
The $(i, j)$ entry of $A^T A$ is formed from the $i$th row of $A^T$ and the $j$th column of $A$. Thus, the $(i, j)$ entry is actually the scalar product of the $i$th and $j$th columns of $A$. Since the column vectors of $A$ are orthonormal, it follows that
$$A^T A = (\delta_{ij}) = I$$
Consequently, the normal equations simplify to
$$x = A^T b$$
∎
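The following sketch illustrates Theorem 5.5.6 on a small example of our own choosing (the matrix $A$ below has orthonormal columns by construction) and cross-checks $\hat{x} = A^T b$ against NumPy's general least squares solver:

```python
import numpy as np

# A matrix with orthonormal columns (constructed for illustration)
A = np.column_stack([
    np.array([1.0, 1.0, 0.0]) / np.sqrt(2),
    np.array([1.0, -1.0, 0.0]) / np.sqrt(2),
])
b = np.array([5.0, 3.0, 4.0])

# By Theorem 5.5.6, the least squares solution is simply x_hat = A^T b
x_hat = A.T @ b

# Cross-check against NumPy's general least squares solver
print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```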
What if the columns of A are not orthonormal? In the next section, we will learn a method for finding an orthonormal basis for R(A). From this method, we will obtain a factorization of A into a product QR, where Q has an orthonormal set of column vectors and R is upper triangular. With this factorization, the least squares problem is easily solved.
If we have an orthonormal basis for $R(A)$, the projection $p = A\hat{x}$ can be determined in terms of the basis elements. Indeed, this is a special case of the more general least squares problem of finding the element $p$ in a subspace $S$ of an inner product space $V$ that is closest to a given element $x$ in $V$. This problem is easily solved if $S$ has an orthonormal basis. We first prove the following theorem.
Theorem 5.5.7
Let $S$ be a subspace of an inner product space $V$ and let $x \in V$. Let $\{u_1, u_2, \ldots, u_n\}$ be an orthonormal basis for $S$. If
$$p = \sum_{i=1}^{n} c_i u_i \tag{3}$$
where
$$c_i = \langle x, u_i \rangle \tag{4}$$
for each $i$, then $p - x$ is orthogonal to every element of $S$.
Proof
We will show first that $p - x$ is orthogonal to each basis vector $u_i$. It follows from Theorem 5.5.2 and (3) and (4) that
$$\langle p - x, u_i \rangle = \langle p, u_i \rangle - \langle x, u_i \rangle = c_i - c_i = 0$$
So $p - x$ is orthogonal to all the $u_i$'s. If $y \in S$, then
$$y = \sum_{i=1}^{n} \alpha_i u_i$$
and hence
$$\langle p - x, y \rangle = \left\langle p - x, \sum_{i=1}^{n} \alpha_i u_i \right\rangle = \sum_{i=1}^{n} \alpha_i \langle p - x, u_i \rangle = 0$$
∎
If $x \in S$, the preceding result is trivial, since by Theorem 5.5.2, $p - x = 0$. If $x \notin S$, then $p$ is the element in $S$ closest to $x$.
Theorem 5.5.8
Under the hypothesis of Theorem 5.5.7, $p$ is the element of $S$ that is closest to $x$; that is,
$$\|y - x\| > \|p - x\|$$
for any $y \neq p$ in $S$.
Proof
If $y \in S$ and $y \neq p$, then
$$\|y - x\|^2 = \|(y - p) + (p - x)\|^2$$
Since $y - p \in S$, it follows from Theorem 5.5.7 and the Pythagorean law that
$$\|y - x\|^2 = \|y - p\|^2 + \|p - x\|^2 > \|p - x\|^2$$
Therefore, $\|y - x\| > \|p - x\|$.
∎
The vector $p$ defined by (3) and (4) is said to be the projection of $x$ onto $S$.
Corollary 5.5.9
Let $S$ be a nonzero subspace of $\mathbb{R}^m$ and let $b \in \mathbb{R}^m$. If $\{u_1, u_2, \ldots, u_k\}$ is an orthonormal basis for $S$ and $U = (u_1, u_2, \ldots, u_k)$, then the projection $p$ of $b$ onto $S$ is given by
$$p = UU^T b$$
Proof
It follows from Theorem 5.5.7 that the projection $p$ of $b$ onto $S$ is given by
$$p = c_1 u_1 + c_2 u_2 + \cdots + c_k u_k = Uc$$
where $c_i = \langle b, u_i \rangle = u_i^T b$ for each $i$; that is, $c = U^T b$. Hence $p = UU^T b$. ∎
The matrix $UU^T$ in Corollary 5.5.9 is the projection matrix corresponding to the subspace $S$ of $\mathbb{R}^m$. To project any vector $b \in \mathbb{R}^m$ onto $S$, we need only find an orthonormal basis $\{u_1, u_2, \ldots, u_k\}$ for $S$, form the matrix $UU^T$, and then multiply $UU^T$ times $b$.
If $P$ is a projection matrix corresponding to a subspace $S$ of $\mathbb{R}^m$, then, for any $b \in \mathbb{R}^m$, the projection $p$ of $b$ onto $S$ is unique. If $Q$ is also a projection matrix corresponding to $S$, then
$$Qb = p = Pb$$
It then follows that
$$q_j = Qe_j = Pe_j = p_j \quad \text{for } j = 1, \ldots, m$$
and hence $Q = P$. Thus, the projection matrix for a subspace $S$ of $\mathbb{R}^m$ is unique.
Example 7
Let $S$ be the set of all vectors in $\mathbb{R}^3$ of the form $(x, y, 0)^T$. Find the vector $p$ in $S$ that is closest to $w = (5, 3, 4)^T$ (see Figure 5.5.3).
SOLUTION
Let $u_1 = (1, 0, 0)^T$ and $u_2 = (0, 1, 0)^T$. Clearly, $u_1$ and $u_2$ form an orthonormal basis for $S$. Now
$$c_1 = w^T u_1 = 5$$
$$c_2 = w^T u_2 = 3$$
The vector p turns out to be exactly what we would expect:
$$p = 5u_1 + 3u_2 = (5, 3, 0)^T$$
Alternatively, $p$ could have been calculated using the projection matrix $UU^T$:
$$p = UU^T w = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 5 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 5 \\ 3 \\ 0 \end{bmatrix}$$
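The same computation in NumPy, forming the projection matrix $UU^T$ explicitly:

```python
import numpy as np

# Orthonormal basis for S (the xy-plane in R^3) as columns of U
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
w = np.array([5.0, 3.0, 4.0])

# Form the projection matrix U U^T, then project w onto S
print(U @ U.T @ w)  # [5. 3. 0.]
```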
Approximation of Functions
In many applications, it is necessary to approximate a continuous function in terms of functions from some special type of approximating set. Most commonly, we approximate by a polynomial of degree n or less. We can use Theorem 5.5.8 to obtain the best least squares approximation.
Example 8
Find the best least squares approximation to $e^x$ on the interval $[0, 1]$ by a linear function.
SOLUTION
Let $S$ be the subspace of all linear functions in $C[0, 1]$. Although the functions $1$ and $x$ span $S$, they are not orthogonal. We seek a function of the form $x - a$ that is orthogonal to $1$:
$$\langle 1, x - a \rangle = \int_0^1 (x - a)\, dx = \frac{1}{2} - a$$
Thus, $a = \frac{1}{2}$. Since $\left\|x - \frac{1}{2}\right\| = 1/\sqrt{12}$, it follows that the functions $u_1(x) = 1$ and $u_2(x) = \sqrt{12}\left(x - \frac{1}{2}\right)$ form an orthonormal basis for $S$.
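To finish the example numerically, one can compute the coordinates $c_i = \langle e^x, u_i \rangle$ by quadrature and form the projection. A sketch assuming SciPy is available:

```python
import numpy as np
from scipy.integrate import quad

# Orthonormal basis for the linear functions in C[0, 1]
u1 = lambda x: 1.0
u2 = lambda x: np.sqrt(12) * (x - 0.5)

# Coordinates of e^x with respect to this basis: c_i = <e^x, u_i>
c1, _ = quad(lambda x: np.exp(x) * u1(x), 0, 1)
c2, _ = quad(lambda x: np.exp(x) * u2(x), 0, 1)

# Best least squares linear approximation: p(x) = c1*u1(x) + c2*u2(x)
p = lambda x: c1 * u1(x) + c2 * u2(x)

xs = np.linspace(0.0, 1.0, 5)
print(np.round(p(xs), 4))       # the linear approximation at sample points
print(np.round(np.exp(xs), 4))  # e^x at the same points, for comparison
```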
Trigonometric polynomials are used to approximate periodic functions. By a trigonometric polynomial of degree n, we mean a function of the form
$$t_n(x) = \frac{a_0}{2} + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx)$$
We have already seen that the collection of functions
$$\frac{1}{\sqrt{2}},\ \cos x,\ \cos 2x,\ \ldots,\ \cos nx$$
forms an orthonormal set with respect to the inner product (2). We leave it to the reader to verify that if the functions
$$\sin x,\ \sin 2x,\ \ldots,\ \sin nx$$
are added to the collection, it will still be an orthonormal set. Thus, we can use Theorem 5.5.8 to find the best least squares approximation to a continuous $2\pi$-periodic function $f(x)$ by a trigonometric polynomial of degree $n$ or less. Note that if
$$a_k = \langle f, \cos kx \rangle = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos kx\, dx$$
$$b_k = \langle f, \sin kx \rangle = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin kx\, dx$$
for $k = 1, 2, \ldots, n$, then these coefficients determine the best least squares approximation to $f$. The $a_k$'s and the $b_k$'s turn out to be the well-known Fourier coefficients that occur in many applications involving trigonometric series approximations of functions.
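As an illustration, the sketch below computes these coefficients by quadrature for a sample function (here $f(x) = |x|$, our own choice) and evaluates the resulting approximation $t_n$:

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: np.abs(x)  # sample 2*pi-periodic function, defined on [-pi, pi]
n = 5

# Fourier coefficients a_k and b_k as defined above (a[0] gives a_0)
a = [quad(lambda x: f(x) * np.cos(k * x), -np.pi, np.pi)[0] / np.pi
     for k in range(n + 1)]
b = [quad(lambda x: f(x) * np.sin(k * x), -np.pi, np.pi)[0] / np.pi
     for k in range(n + 1)]

# Best least squares trigonometric approximation of degree n
def t_n(x):
    s = a[0] / 2
    for k in range(1, n + 1):
        s = s + a[k] * np.cos(k * x) + b[k] * np.sin(k * x)
    return s

xs = np.linspace(-np.pi, np.pi, 5)
print(np.round(t_n(xs) - f(xs), 3))  # residuals are small
```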
Let us think of $f(x)$ as representing the position at time $x$ of an object moving along a line, and let $t_n$ be the Fourier approximation of degree $n$ to $f$. If we set
$$r_k = \sqrt{a_k^2 + b_k^2} \quad \text{and} \quad \theta_k = \tan^{-1}\!\left(\frac{b_k}{a_k}\right)$$
then $a_k = r_k \cos\theta_k$, $b_k = r_k \sin\theta_k$, and each term of $t_n$ takes the form
$$a_k \cos kx + b_k \sin kx = r_k \cos(kx - \theta_k)$$
Thus, the motion $f(x)$ is being represented as a sum of simple harmonic motions.
For signal-processing applications, it is useful to express the trigonometric approximation in complex form. To this end, we define complex Fourier coefficients $c_k$ in terms of the real Fourier coefficients $a_k$ and $b_k$: