Chapter 16

The Singular Value Decomposition

Matrix decomposition is a fundamental tool in linear algebra for understanding the action of a matrix, assessing its suitability for solving a problem, and solving linear systems more efficiently and effectively. We have encountered an important decomposition already, the eigendecomposition for symmetric matrices (see Section 7.5). The topic of this chapter, the singular value decomposition (SVD), is a tool for more general, even nonsquare, matrices. Figure 16.1 demonstrates one application of the SVD, image compression.

Figure 16.1


Image compression: a method that uses the SVD. Far left: original image; second from left: highest compression; third from left: moderate compression; far right: method recovers original image. See Section 16.7 for details.

This chapter allows us to revisit several themes from past chapters: eigenvalues and eigenvectors, the condition number, the least squares solution to an overdetermined system, and more! It provides a good review of some important ideas in linear algebra.

16.1 The Geometry of the 2 × 2 Case

Let A be a 2 × 2 matrix, nonsingular for now. Let v1 and v2 be two unit vectors that are perpendicular to each other, thus they are orthonormal. This means that V = [v1 v2] is an orthogonal matrix: V−1 = VT. In general, A will not map two orthonormal vectors v1, v2 to two orthonormal image vectors u1, u2. However, it is possible for A to map two particular vectors v1 and v2 to two orthogonal image vectors σ1u1 and σ2u2. The two vectors u1 and u2 are assumed to be of unit length, i.e., they are orthonormal.

We formalize this as

AV = U\Sigma \qquad (16.1)

with an orthogonal matrix U = [u1 u2] and a diagonal matrix

\Sigma = \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix}.

If Avi = σiui, then each ui is parallel to Avi, and A preserves the orthogonality of the vi.

We now conclude from (16.1) that

A = U\Sigma V^T. \qquad (16.2)

This is the singular value decomposition, SVD for short, of A. The diagonal elements of Σ are called the singular values of A. Let’s now find out how to determine U, Σ, V.

In Section 7.6, we established that symmetric positive definite matrices, such as ATA, have real and positive eigenvalues and their eigenvectors are orthogonal. Considering the SVD in (16.2), we can write

A^TA = (U\Sigma V^T)^T(U\Sigma V^T) = V\Sigma^TU^TU\Sigma V^T = V\Sigma^T\Sigma V^T = V\Lambda V^T, \qquad (16.3)

where

\Lambda = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = \Sigma^T\Sigma = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}.

Equation (16.3) states the following: the symmetric positive definite matrix ATA has eigenvalues that are the diagonal entries of Λ and eigenvectors that are the columns of V, which are called the right singular vectors of A. This is the eigendecomposition of ATA.

The symmetric positive definite matrix AAT also has an eigendecomposition,

AA^T = (U\Sigma V^T)(U\Sigma V^T)^T = U\Sigma V^TV\Sigma^TU^T = U\Sigma\Sigma^TU^T = U\Lambda U^T, \qquad (16.4)

observing that ΣTΣ = ΣΣT. Equation (16.4) states that the symmetric positive definite matrix AAT has eigenvalues that are the diagonal entries of Λ and eigenvectors that are the columns of U, which are called the left singular vectors of A.

It should be no surprise that Λ appears in both (16.3) and (16.4): the matrices ATA and AAT always share the same nonzero eigenvalues.

Now we understand a bit more about the elements of the SVD in (16.2): the singular values, σi, of A are the square roots of the eigenvalues of ATA and AAT, that is

\sigma_i = \sqrt{\lambda_i}.

The columns of V are the eigenvectors of ATA and the columns of U are the eigenvectors of AAT. Observe from (16.1) that we can compute ui = Avi/||Avi||.
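As a check of these relationships, the following NumPy sketch (our own illustration, not part of the text) builds U, Σ, and V for a nonsingular 2 × 2 matrix from the eigendecomposition of ATA, using ui = Avi/||Avi||, and verifies that UΣVT reproduces A; the helper name svd_from_eig is ours.

    import numpy as np

    def svd_from_eig(A):
        """Build U, Sigma, V for a nonsingular 2 x 2 matrix A from eig(A^T A)."""
        lam, V = np.linalg.eigh(A.T @ A)       # eigenpairs of the symmetric A^T A
        order = np.argsort(lam)[::-1]          # sort so that sigma_1 >= sigma_2
        lam, V = lam[order], V[:, order]
        sigma = np.sqrt(lam)                   # singular values: sqrt of eigenvalues
        U = (A @ V) / sigma                    # u_i = A v_i / ||A v_i||
        return U, np.diag(sigma), V

    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])                 # any nonsingular 2 x 2 matrix
    U, S, V = svd_from_eig(A)
    print(np.allclose(U @ S @ V.T, A))         # True: A = U Sigma V^T
    print(np.allclose(U.T @ U, np.eye(2)))     # True: U is orthogonal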

Example 16.1

Let’s start with a very simple example:

A = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix},

a symmetric, positive definite matrix that scales in the e1-direction. Then

AA^T = A^TA = \begin{bmatrix} 9 & 0 \\ 0 & 1 \end{bmatrix},

and we can easily calculate the eigenvalues of ATA as λ1 = 9 and λ2 = 1. This means that σ1 = 3 and σ2 = 1. The eigenvectors of AAT and ATA are identical and happen to be the columns of the identity matrix for this simple example,

U = V = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.

Now we can form the SVD of A, A = UΣVT:

\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.

This says that the action of A on a vector x, that is Ax, simply amounts to a scaling in the e1-direction, which we observed to begin with!

Notice that the eigenvalues of A are identical to the singular values. In fact, because this matrix is positive definite, the SVD is identical to the eigendecomposition.

Throughout Chapter 5 we examined the action of a 2 × 2 matrix using an illustration of the circular Phoenix mapping to an elliptical Phoenix. This action ellipse can now be described more precisely: the semi-major axis has length σ1 and the semi-minor axis has length σ2. Figure 16.2 illustrates this point for Examples 16.1 and 16.2. In that figure, you see: the semi-axes, the map of [1 0]T (thick point), and the map of [0 1]T (thin point).

Figure 16.2


Action of a map: the unit circle is mapped to the action ellipse with semi-major axis length σ1 and semi-minor axis length σ2. Left: ellipse from matrix in Example 16.1; middle: circle; right: ellipse from Example 16.2.

Example 16.2

This example is a little more interesting, as the matrix is now a shear,

A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}.

We compute

A^TA = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix}, \qquad AA^T = \begin{bmatrix} 5 & 2 \\ 2 & 1 \end{bmatrix},

and we observe that these two matrices are no longer identical, but they are both symmetric. As they are 2 × 2 matrices, we can easily calculate the eigenvalues as λ1 = 5.82 and λ2 = 0.17. (Remember: ATA and AAT share the same nonzero eigenvalues.) These eigenvalues result in singular values σ1 = 2.41 and σ2 = 0.41. The eigenvectors of ATA are the orthonormal column vectors of

V = \begin{bmatrix} 0.38 & -0.92 \\ 0.92 & 0.38 \end{bmatrix}

and the eigenvectors of AAT are the orthonormal column vectors of

U = \begin{bmatrix} 0.92 & -0.38 \\ 0.38 & 0.92 \end{bmatrix}.

Now we can form the SVD of A, A = UΣVT:

\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0.92 & -0.38 \\ 0.38 & 0.92 \end{bmatrix} \begin{bmatrix} 2.41 & 0 \\ 0 & 0.41 \end{bmatrix} \begin{bmatrix} 0.38 & 0.92 \\ -0.92 & 0.38 \end{bmatrix}.

Figure 16.3 will help us break down the action of A in terms of the SVD. It is now clear that V and U are rotation or reflection matrices and Σ scales, deforming the circle into an ellipse.

Figure 16.3


SVD breakdown: shear matrix A from Example 16.2. Clockwise from top left: Initial point set forming a circle with two reference points; VTx rotates clockwise 67.5°; ΣVTx stretches in e1 and shrinks in e2; UΣVTx rotates counterclockwise 22.5°, illustrating the action of A.

Notice that the eigenvalues of A are λ1 = λ2 = 1, making the point that, in general, the singular values are not the eigenvalues!
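The numbers of Example 16.2 are easy to reproduce with NumPy's built-in SVD; the brief check below is our own addition. A library routine may flip the signs of corresponding columns of U and V, which leaves the product UΣVT unchanged.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [0.0, 1.0]])                      # the shear of Example 16.2

    U, s, VT = np.linalg.svd(A)                     # s holds the singular values
    print(np.round(s, 2))                           # [2.41 0.41]
    print(np.linalg.eigvals(A))                     # [1. 1.]: the eigenvalues differ
    print(np.allclose(U @ np.diag(s) @ VT, A))      # True: A = U Sigma V^T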

Now we come full circle and look at what we have solved in terms of our original question that was encapsulated by (16.1): What orthonormal vectors vi are mapped to orthogonal vectors σiui? The SVD provides a solution to this question by providing V, U, and Σ. Furthermore, note that for this nonsingular case, the columns of V form a basis for the row space of A and the columns of U form a basis for the column space of A.

It should be clear that the SVD is not limited to invertible 2 × 2 matrices, so let’s look at the SVD more generally.

16.2 The General Case

The SVD development of the previous section assumed that A was square and invertible. This had the effect that ATA had nonzero eigenvalues and well-defined eigenvectors. However, everything still works if A is neither square nor invertible. Just a few more aspects of the decomposition come into play.

For the general case, A will be a rectangular matrix with m rows and n columns, mapping R^n to R^m. As a result of this freedom in the dimensions, the matrices of the SVD (16.2) must be modified. Figure 16.4 illustrates each scenario: m < n, m = n, and m > n. The matrix dimensions are as follows: A is m × n, U is m × m, Σ is m × n, and VT is n × n. The matrix Λ in (16.3) is n × n and in (16.4) is m × m, but both hold the same nonzero eigenvalues because the rank of a matrix cannot exceed min{m, n}.

Figure 16.4


SVD matrix dimensions: an overview of the SVD of an m × n matrix A. Top: m > n; middle: m = n; bottom: m < n.

Again we ask: what orthonormal vectors vi are mapped by A to orthogonal vectors σiui, where the ui are orthonormal? This is encapsulated in (16.1). In the general case as well, the matrices U and V form bases; however, as we are now considering rectangular and singular matrices, the rank r of A plays a part in the interpretation of the SVD. The following are the main SVD properties; for a detailed exposition, see Strang [16].

  • The matrix Σ has the nonzero singular values σ1, ..., σr on its diagonal, and all other entries are zero.
  • The first r columns of U form an orthonormal basis for the column space of A.
  • The last m − r columns of U form an orthonormal basis for the null space of AT.
  • The first r columns of V form an orthonormal basis for the row space of A.
  • The last n − r columns of V form an orthonormal basis for the null space of A.

Examples will make this clearer.

Example 16.3

Let A be given by

A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 1 \end{bmatrix}.

The first step is to form ATA and AAT and find their eigenvalues and (normalized) eigenvectors, which make up the columns of an orthogonal matrix.

A^TA = \begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix}, \quad \lambda_1 = 5, \; \lambda_2 = 1, \quad V = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix};

AA^T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 2 \\ 0 & 2 & 1 \end{bmatrix}, \quad \lambda_1 = 5, \; \lambda_2 = 1, \; \lambda_3 = 0, \quad U = \begin{bmatrix} 0 & 1 & 0 \\ 0.89 & 0 & 0.44 \\ 0.44 & 0 & -0.89 \end{bmatrix}.

The rank of A is 2, thus there are two singular values, and

\Sigma = \begin{bmatrix} 2.23 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.

The SVD of A = UΣVT:

\begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0.89 & 0 & 0.44 \\ 0.44 & 0 & -0.89 \end{bmatrix} \begin{bmatrix} 2.23 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.

Figure 16.5 illustrates the elements of the SVD and the action of A.

Because m > n, u3 is in the null space of AT, that is ATu3 = 0.
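A quick numerical check of this example (ours, using NumPy's full SVD): the singular values come out as √5 ≈ 2.24 and 1, and the third column of U indeed satisfies ATu3 = 0.

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [0.0, 2.0],
                  [0.0, 1.0]])

    U, s, VT = np.linalg.svd(A, full_matrices=True)  # U is 3 x 3, VT is 2 x 2
    print(np.round(s, 2))                            # [2.24 1.  ]
    u3 = U[:, 2]                                     # the column beyond the rank r = 2
    print(np.round(A.T @ u3, 10))                    # [0. 0.]: u3 spans null(A^T)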

Figure 16.5


SVD of a 3 × 2 matrix A: see Example 16.3. Clockwise from top left: Initial point set forming a circle with one reference point; VTx reflects; ΣVTx stretches in e1; UΣVTx rotates counterclockwise 26.5°, illustrating the action of A.

Example 16.4

For a matrix of different dimensions, we pick

A = \begin{bmatrix} 0.8 & 0 & 0.8 \\ 1 & 1.5 & 0.3 \end{bmatrix}.

The first step is to form ATA and AAT and find their eigenvalues and (normalized) eigenvectors, which make up the columns of an orthogonal matrix.

A^TA = \begin{bmatrix} 1.64 & 1.5 & 0.94 \\ 1.5 & 2.25 & 0.45 \\ 0.94 & 0.45 & 0.73 \end{bmatrix}, \quad \lambda_1 = 3.77, \; \lambda_2 = 0.84, \; \lambda_3 = 0, \quad V = \begin{bmatrix} 0.63 & -0.38 & -0.67 \\ 0.71 & 0.62 & 0.31 \\ 0.30 & -0.68 & 0.67 \end{bmatrix};

AA^T = \begin{bmatrix} 1.28 & 1.04 \\ 1.04 & 3.34 \end{bmatrix}, \quad \lambda_1 = 3.77, \; \lambda_2 = 0.84, \quad U = \begin{bmatrix} 0.39 & -0.92 \\ 0.92 & 0.39 \end{bmatrix}.

The matrix A is rank 2, thus there are two singular values, and

\Sigma = \begin{bmatrix} 1.94 & 0 & 0 \\ 0 & 0.92 & 0 \end{bmatrix}.

The SVD of A = UΣVT:

\begin{bmatrix} 0.8 & 0 & 0.8 \\ 1 & 1.5 & 0.3 \end{bmatrix} = \begin{bmatrix} 0.39 & -0.92 \\ 0.92 & 0.39 \end{bmatrix} \begin{bmatrix} 1.94 & 0 & 0 \\ 0 & 0.92 & 0 \end{bmatrix} \begin{bmatrix} 0.63 & 0.71 & 0.30 \\ -0.38 & 0.62 & -0.68 \\ -0.67 & 0.31 & 0.67 \end{bmatrix}.

Figure 16.6 illustrates the elements of the SVD and the action of A.

Because m < n, v3 is in the null space of A, that is, Av3 = 0.

Figure 16.6


The SVD of a 2 × 3 matrix A: see Example 16.4. Clockwise from top left: Initial point set forming a circle with one reference point; VTx; ΣVTx; UΣVTx, illustrating the action of A.

Example 16.5

Let’s look at a fairly simple example, a rank deficient matrix,

A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix},

which is a projection into the [e1, e2]-plane. Because A is symmetric and idempotent, A = ATA = AAT. It is easy to see that A = U ΣVT with

\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.

Since the rank is two, the first two columns of U and V form an orthonormal basis for the column and row spaces of A, respectively. The e3 vector is projected to the zero vector by A, and thus this vector spans the null space of A and AT.
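The subspace properties listed above are easy to extract from the SVD in code. The sketch below (ours; the helper name fundamental_subspaces is hypothetical) slices U and V at the numerical rank r, shown here for the projection matrix of Example 16.5.

    import numpy as np

    def fundamental_subspaces(A, tol=1e-10):
        """Orthonormal bases of the four fundamental subspaces from the SVD of A."""
        U, s, VT = np.linalg.svd(A, full_matrices=True)
        r = int(np.sum(s > tol))                  # numerical rank of A
        return {
            "column space of A":  U[:, :r],       # first r columns of U
            "null space of A^T":  U[:, r:],       # last m - r columns of U
            "row space of A":     VT[:r, :].T,    # first r columns of V
            "null space of A":    VT[r:, :].T,    # last n - r columns of V
        }

    A = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0]])               # the projection of Example 16.5
    bases = fundamental_subspaces(A)
    print(bases["null space of A"].ravel())       # e3 (up to sign)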

Generalizing the 2 × 2 case, the σi are the lengths of the semi-axes of an ellipsoid. As before, the semi-major axis is length σ1 and the length of the semi-minor axis is equal to the smallest singular value.

The SVD is an important tool for dealing with rank deficient matrices.

16.3 SVD Steps

This section is titled “Steps” rather than “Algorithm” to emphasize that the description here is simply a review of our introduction to the SVD. A robust algorithm that is efficient in terms of computing time and storage will be found in an advanced numerical methods text.

Let’s summarize the steps for finding the SVD of a matrix A.

  • Input: an m × n matrix A.
  • Output: U, V, Σ such that A = UΣVT.
  • Find the eigenvalues λ1, ..., λn of ATA.
    • Order the λi so that λ1 ≥ λ2 ≥ ... ≥ λn.
    • Suppose λ1, ..., λr > 0, then the rank of A is r.
  • Create an m × n diagonal matrix Σ with σi,i = √λi, i = 1, ..., r.
  • Find the corresponding (normalized) eigenvectors vi of ATA.
  • Create an n × n matrix V with column vectors vi.
  • Find the (normalized) eigenvectors ui of AAT.
  • Create an m × m matrix U with column vectors ui.

You have now found the singular value decomposition of A, which is A = UΣVT.

A note on U: instead of finding its columns as the eigenvectors of AAT, one can compute ui, i = 1, ..., r, as ui = Avi/||Avi||. If m > n, then the remaining ui are found from the null space of AT.
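For illustration, the steps above translate into a short NumPy routine. It forms ATA explicitly, which, as noted below, is numerically naive, so treat it as a teaching sketch rather than a robust algorithm; the columns completing U when m > n are taken from a QR factorization, one convenient way to obtain an orthonormal basis of the null space of AT.

    import numpy as np

    def svd_steps(A, tol=1e-12):
        """SVD following the steps above (illustrative; it forms A^T A explicitly)."""
        m, n = A.shape
        lam, V = np.linalg.eigh(A.T @ A)          # eigenpairs of A^T A
        order = np.argsort(lam)[::-1]             # lambda_1 >= lambda_2 >= ...
        lam, V = lam[order], V[:, order]
        sigma = np.sqrt(np.maximum(lam, 0.0))
        r = int(np.sum(sigma > tol))              # rank of A
        Sigma = np.zeros((m, n))
        Sigma[:r, :r] = np.diag(sigma[:r])
        Ur = (A @ V[:, :r]) / sigma[:r]           # u_i = A v_i / ||A v_i||, i = 1..r
        if r < m:                                 # complete U with a basis of null(A^T)
            rng = np.random.default_rng(0)
            Q, _ = np.linalg.qr(np.hstack([Ur, rng.standard_normal((m, m - r))]))
            U = np.hstack([Ur, Q[:, r:]])
        else:
            U = Ur
        return U, Sigma, V

    A = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 1.0]])   # the matrix of Example 16.3
    U, Sigma, V = svd_steps(A)
    print(np.allclose(U @ Sigma @ V.T, A))               # True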

The only “hard” task in this is finding the λi. But there are several highly efficient algorithms for this task, taking advantage of the fact that ATA is symmetric. Many of these algorithms will return the corresponding eigenvectors as well.

As we discussed in the context of least squares in Section 13.1, forming ATA can result in an ill-posed problem because the condition number of this matrix is the square of the condition number of A. Thus, numerical methods will avoid direct computation of this matrix by employing a method such as Householder.

16.4 Singular Values and Volumes

As a practical application, we will use the SVD to compute the determinant of a matrix A. We observe that in (16.2), the matrices U and V, being orthogonal, have determinants equal to ±1. Thus,

|\det A| = \det \Sigma = \sigma_1 \cdots \sigma_n. \qquad (16.5)

If a 2D triangle has area φ, it will have area σ1σ2φ after being transformed by a 2D linear map with singular values σ1, σ2. Similarly, if a 3D object has volume φ, it will have volume σ1σ2σ3φ after being transformed by a linear map with singular values σ1, σ2, σ3.
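A small numerical illustration of (16.5), contrasted with the product of the eigenvalues discussed next (our own check, assuming NumPy):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 1.0]])                         # det A = 1 - 6 = -5

    s = np.linalg.svd(A, compute_uv=False)             # singular values of A
    print(np.round(np.prod(s), 6))                     # 5.0: product of singular values = |det A|
    print(np.round(np.prod(np.linalg.eigvals(A)), 6))  # -5.0: product of eigenvalues = det A
    print(np.round(np.linalg.det(A), 6))               # -5.0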

Of course one can compute determinants without using singular values. Recall from Section 15.1, the characteristic polynomial of A,

p(\lambda) = \det[A - \lambda I] = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda).

Evaluating p at λ = 0, we have

\det A = \lambda_1 \cdots \lambda_n, \qquad (16.6)

where the λi are A’s eigenvalues.

16.5 The Pseudoinverse

The inverse of a matrix, introduced in Sections 5.9 and 12.4, is mainly a theoretical tool for analyzing the solution to a linear system. Additionally, the inverse is limited to square, nonsingular matrices. What might the inverse of a more general type of matrix be? The answer is in the form of a generalized inverse, or the so-called pseudoinverse, and it is denoted as A† (“A dagger”). The SVD is a very nice tool for finding the pseudoinverse, and it is suited for practical use as we shall see in Section 16.6.

Let’s start with a special matrix, an m × n diagonal matrix Σ with diagonal elements σi. The pseudoinverse, Σ†, has diagonal elements σi† given by

\sigma_i^\dagger = \begin{cases} 1/\sigma_i & \text{if } \sigma_i > 0, \\ 0 & \text{else}, \end{cases}

and its dimension is n × m. If the rank of Σ is r, then the product ΣΣ† holds the r × r identity matrix, and all other elements are zero.

We can use this very simple expression for the pseudoinverse of Σ to express the pseudoinverse for a general m × n matrix A using its SVD,

A^\dagger = (U\Sigma V^T)^\dagger = V\Sigma^\dagger U^T, \qquad (16.7)

recalling that U and V are orthogonal matrices.

If A is square and invertible, then A† = A−1. Otherwise, A† still has some properties of an inverse:

A^\dagger AA^\dagger = A^\dagger, \qquad (16.8)

AA^\dagger A = A. \qquad (16.9)
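The sketch below (ours) builds A† according to (16.7), inverting only the nonzero singular values, and verifies the two properties above; NumPy's np.linalg.pinv does essentially the same, with a relative tolerance. The test matrix is the one from Example 16.3.

    import numpy as np

    def pinv_svd(A, tol=1e-12):
        """Pseudoinverse via the SVD: A_dagger = V Sigma_dagger U^T, see (16.7)."""
        m, n = A.shape
        U, s, VT = np.linalg.svd(A, full_matrices=True)
        Sd = np.zeros((n, m))                     # Sigma_dagger is n x m
        r = int(np.sum(s > tol))
        Sd[:r, :r] = np.diag(1.0 / s[:r])         # 1/sigma_i where sigma_i > 0, else 0
        return VT.T @ Sd @ U.T

    A = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 1.0]])
    Ad = pinv_svd(A)
    print(np.round(Ad, 2))                        # [[1. 0. 0.] [0. 0.4 0.2]]
    print(np.allclose(Ad @ A @ Ad, Ad))           # property (16.8)
    print(np.allclose(A @ Ad @ A, A))             # property (16.9)
    print(np.allclose(Ad, np.linalg.pinv(A)))     # agrees with NumPy's pinv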

Example 16.6

Let’s find the pseudoinverse of the matrix from Example 16.3,

A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 1 \end{bmatrix}.

We find

\Sigma^\dagger = \begin{bmatrix} 1/2.23 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix},

then (16.7) results in

A^\dagger = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2/5 & 1/5 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1/2.23 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0.89 & 0.44 \\ 1 & 0 & 0 \\ 0 & 0.44 & -0.89 \end{bmatrix}.

And we check that (16.8) holds:

A^\dagger = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2/5 & 1/5 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2/5 & 1/5 \end{bmatrix},

and also (16.9):

A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2/5 & 1/5 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 1 \end{bmatrix}.

Example 16.7

The matrix from Example 16.1 is square and nonsingular, therefore the pseudoinverse is equal to the inverse. Just by visual inspection:

A = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}, \qquad A^{-1} = \begin{bmatrix} 1/3 & 0 \\ 0 & 1 \end{bmatrix},

and the pseudoinverse is

A^\dagger = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1/3 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1/3 & 0 \\ 0 & 1 \end{bmatrix}.

This generalization of the inverse of a matrix is often called the Moore-Penrose generalized inverse. Least squares approximation is a primary application of the pseudoinverse.

16.6 Least Squares

The pseudoinverse allows for a concise approximate solution, specifically the least squares solution, to an overdetermined linear system. (A detailed introduction to least squares may be found in Section 12.7.)

In an overdetermined linear system

Ax=b,

A is a rectangular matrix with dimension m × n, m > n. This means that the linear system has m equations in n unknowns and it is inconsistent because it is unlikely that b lives in the subspace V defined by the columns of A. The least squares solution finds the orthogonal projection of b into V, which we will call b′. Thus the solution to Ax = b′ produces the vector closest to b that lives in V. This leads us to the normal equations

A^TAx = A^Tb, \qquad (16.10)

whose solution minimizes

\|Ax - b\|. \qquad (16.11)

In our introduction to condition numbers in Section 13.4, we discussed that this system can be ill-posed because the condition number of ATA is the square of that of A. To avoid forming ATA, in Section 13.1 we proposed the Householder algorithm as a method to find the least squares solution.

As announced at the onset of this section, the SVD and pseudoinverse provide a numerically stable method for finding the least squares solution to an overdetermined linear system. We find an approximate solution rather easily:

x = A^\dagger b. \qquad (16.12)

But why is this the least squares solution?

Again, we want to find x to minimize (16.11). Let’s frame the linear system in terms of the SVD of A and take advantage of the orthogonality of U,

Ax - b = U\Sigma V^Tx - b = U\Sigma V^Tx - UU^Tb = U(\Sigma y - z),

where we have set y = VTx and z = UTb.

This new framing of the problem exposes that

\|Ax - b\| = \|\Sigma y - z\|,

and leaves us with an easier diagonal least squares problem to solve. The steps involved are as follows.

  1. Compute the SVD A = UΣVT.
  2. Compute the m × 1 vector z = UTb.
  3. Compute the n × 1 vector y = Σ†z.

    This is the least squares solution to the m × n problem Σy = z. The least squares solution requires minimizing ‖v‖, where v = Σy − z, which has the simple form:

    v = \begin{bmatrix} \sigma_1y_1 - z_1 \\ \sigma_2y_2 - z_2 \\ \vdots \\ \sigma_ry_r - z_r \\ -z_{r+1} \\ \vdots \\ -z_m \end{bmatrix},

    where the rank of Σ is r.

    It is easy to see that the y that will minimize ‖v‖ is yi = zi/σi for i = 1, ..., r, hence y = Σ†z.

  4. Compute the n × 1 solution vector x = Vy.

To summarize, the least squares solution is reduced to a diagonal least squares problem (Σy = z), which requires only simple matrix-vector multiplications. The calculations in reverse order include

x = Vy \;\Rightarrow\; x = V(\Sigma^\dagger z) \;\Rightarrow\; x = V\Sigma^\dagger(U^Tb).

We have rediscovered the pseudoinverse, and we have come back to (16.12), while verifying that this is indeed the least squares solution.

Example 16.8

Let’s revisit the least squares problem that we solved using the normal equations in Example 12.13 and the Householder method in Example 13.3. The 7 × 2 overdetermined linear system, Ax = b, is

\begin{bmatrix} 0 & 1 \\ 10 & 1 \\ 20 & 1 \\ 30 & 1 \\ 40 & 1 \\ 50 & 1 \\ 60 & 1 \end{bmatrix} x = \begin{bmatrix} 30 \\ 25 \\ 40 \\ 40 \\ 30 \\ 5 \\ 25 \end{bmatrix}.

The best fit line coefficients are found in four steps.

  1. Compute the SVD, A = UΣVT. (The matrix dimensions are as follows: U is 7 × 7, Σ is 7 × 2, V is 2 × 2.)

    \Sigma \approx \begin{bmatrix} 95.42 & 0 \\ 0 & 1.47 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \qquad \Sigma^\dagger \approx \begin{bmatrix} 0.01 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0.68 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.

  2. Compute

    z = U^Tb = \begin{bmatrix} 54.5 \\ 51.1 \\ 3.2 \\ 15.6 \\ 9.6 \\ 15.2 \\ 10.8 \end{bmatrix}.

  3. Compute

    y = \Sigma^\dagger z = \begin{bmatrix} 0.57 \\ 34.8 \end{bmatrix}.

  4. Compute

    x = Vy = \begin{bmatrix} -0.23 \\ 34.8 \end{bmatrix},

    resulting in the same best fit line, x2 = −0.23x1 + 34.8, as we found via the normal equations and the Householder method.
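The four steps of Example 16.8 can be reproduced in a few lines of NumPy (our own check); the same routine works for any overdetermined system and agrees with np.linalg.lstsq.

    import numpy as np

    def lstsq_svd(A, b, tol=1e-12):
        """Least squares solution x = V Sigma_dagger U^T b, following steps 1-4."""
        U, s, VT = np.linalg.svd(A, full_matrices=True)   # step 1: the SVD
        z = U.T @ b                                       # step 2: z = U^T b
        r = int(np.sum(s > tol))
        y = np.zeros(A.shape[1])
        y[:r] = z[:r] / s[:r]                             # step 3: y = Sigma_dagger z
        return VT.T @ y                                   # step 4: x = V y

    A = np.column_stack([np.arange(0.0, 70.0, 10.0), np.ones(7)])  # the 7 x 2 matrix
    b = np.array([30.0, 25.0, 40.0, 40.0, 30.0, 5.0, 25.0])
    x = lstsq_svd(A, b)
    print(np.round(x, 2))                                 # [-0.23 34.82] (34.8 in the text's rounding)
    print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))    # True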

The normal equations (16.10) give a best approximation

x = (A^TA)^{-1}A^Tb \qquad (16.13)

to the original problem Ax = b. This approximation was developed by considering a new right-hand side vector b′ in the subspace of A, called V. If we substitute the expression for x in (16.13) into Ax = b′, we have an expression for b′ in relation to b,

b' = A(A^TA)^{-1}A^Tb = AA^\dagger b = \mathrm{proj}_V\, b.

Geometrically, we can see that AA† is a projection because the goal is to project b into the subspace V. A projection must be idempotent as well, and a property of the pseudoinverse in (16.8) ensures this.

This application of the SVD gave us the opportunity to bring together several important linear algebra topics!

16.7 Application: Image Compression

Suppose the m × n matrix A has k = min(m, n) singular values σi and, as before, σ1 ≥ σ2 ≥ ... ≥ σk. Using the SVD, it is possible to write A as a sum of k rank one matrices:

A = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T + \dots + \sigma_ku_kv_k^T. \qquad (16.14)

This is analogous to what we did in (7.13) for the eigendecomposition.

We can use (16.14) for image compression. An image is comprised of a grid of colored pixels.1 Figure 16.7 (left) is a very simple example; it has only 4 × 4 pixels. Each grayscale is associated with a number, thus this grid can be thought of as a matrix. The singular values for this matrix are σi = 7.1, 3.8, 1.3, 0.3. Let’s refer to the images from left to right as I0, I1, I2, I3. The original image is I0. The matrix

A_1 = \sigma_1u_1v_1^T

results in image I1 and the matrix

A_2 = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T

results in image I2. Notice that the original image is nearly replicated incorporating only half the singular values. This is due to the fact that σ1 and σ2 are large in comparison to σ3 and σ4. Image I3 is created from A3=A2+σ3u3v3T. Image I4, which is not displayed, is identical to I0.

Figure 16.7


Image compression: a method that uses SVD. The input matrix has singular values σi = 7.1, 3.8, 1.3, 0.3. Far left: original image; from left to right: recovering the image by adding projection terms.

The change in an image from adding only those terms corresponding to small σi can be hardly noticeable. Thus omitting the terms corresponding to small σk amounts to compressing the original image; there is no severe quality loss for small σk. Furthermore, if some σi are zero, compression can clearly be achieved. This is the case in Figure 16.1, an 8 × 8 matrix with σi = 6.2, 1.7, 1.49, 0, ..., 0. The figure illustrates images corresponding to each nonzero σi. The last image is identical to the input, making it clear that the five remaining σi = 0 are unimportant to image quality.
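A sketch of the compression idea (ours): keep only the first k rank-one terms of (16.14). A real image would be loaded as a matrix of grayscale values; here a small synthetic matrix stands in for the 4 × 4 grid of Figure 16.7.

    import numpy as np

    def rank_k_approx(A, k):
        """Sum of the first k rank-one terms sigma_i u_i v_i^T of the SVD of A."""
        U, s, VT = np.linalg.svd(A, full_matrices=False)
        return U[:, :k] @ np.diag(s[:k]) @ VT[:k, :]

    # a small synthetic "image": one grayscale value per pixel
    A = np.array([[0.0, 1.0, 1.0, 0.0],
                  [1.0, 3.0, 3.0, 1.0],
                  [1.0, 3.0, 2.0, 1.0],
                  [0.0, 1.0, 1.0, 0.0]])

    for k in range(1, 5):
        err = np.linalg.norm(A - rank_k_approx(A, k))
        print(k, round(err, 3))     # the error shrinks as terms are added; 0 at k = 4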

16.8 Principal Components Analysis

Figure 16.8 illustrates a 2D scatter plot: each circle represents a coordinate pair (point) in the [e1, e2]-system. For example, we might want to determine if gross domestic product (GDP) and the poverty rate (PR) for countries in the World Trade Organization are related. We would then record [GDP PR]T as coordinate pairs. How might we reveal trends in this data set?

Figure 16.8


Scatter plot: data pairs recorded in Cartesian coordinates.

Let our data set be given as x1, ..., xn, a set of 2D points such that x1 + ... + xn = 0. This condition simply means that the origin is the centroid of the points. The data set in Figure 16.8 has been translated to produce Figure 16.9 (left). Let d be a unit vector. Then the projection of xi onto a line through the origin containing d is a vector with (signed) length xi · d. Let l(d) be the sum of the squares of all these lengths:

l(d) = [x_1 \cdot d]^2 + \dots + [x_n \cdot d]^2.

Figure 16.9


PCA: Analysis of a data set. Left: given data with centroid translated to the origin. Thick lines are coincident with the eigenvectors scaled by their corresponding eigenvalue. Right: points, eigenvector lines, and quadratic form over the unit circle.

Imagine rotating d around the origin. For each position of d, we compute the value l(d). For the longest thick line in the left part of Figure 16.9, the value of l(d) is large in comparison to that for the direction orthogonal to it, demonstrating that l(d) measures the variation of the point set along the line generated by d. Higher variation results in larger l(d).

Let’s arrange the data xi in a matrix

X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix}.

Then we rewrite l(d) as

l(d) = \|Xd\|^2 = (Xd) \cdot (Xd) = d^TX^TXd. \qquad (16.15)

We further abbreviate C = XTX and note that C is a symmetric, positive definite 2 × 2 matrix. Hence (16.15) describes a quadratic form just as we discussed in Section 7.6.

For which d is l(d) maximal? The answer is simple: for d being the eigenvector corresponding to C’s largest eigenvalue. Similarly, l(d) is minimal for d being the eigenvector corresponding to C’s smallest eigenvalue. Recall that these eigenvectors form the major and minor axis of the action ellipse of C, and the thick lines in Figure 16.9 (left) are precisely these axes. In the right part of Figure 16.9, the quadratic form for the data set is shown along with the data and action ellipse axes. We see that the dominant eigenvector corresponds to highest variance in the data and this is reflected in the quadratic form as well. If λ1 = λ2, then there is no preferred direction in the point set and the quadratic form is spherical. We are guaranteed that the eigenvectors will be orthogonal because C is a symmetric matrix.

Looking more closely at C, we see its very simple form,

c_{1,1} = x_{1,1}^2 + x_{2,1}^2 + \dots + x_{n,1}^2,
c_{1,2} = c_{2,1} = x_{1,1}x_{1,2} + x_{2,1}x_{2,2} + \dots + x_{n,1}x_{n,2},
c_{2,2} = x_{1,2}^2 + x_{2,2}^2 + \dots + x_{n,2}^2.

If each element of C is divided by n, then this is called the covariance matrix. This matrix is a summary of the variation in each coordinate and between coordinates. Dividing by n will result in scaled eigenvalues; the eigenvectors will not change.

The eigenvectors provide a convenient local coordinate frame for the data set. This isn’t a new idea: it is exactly the principle of the eigendecomposition. This frame is commonly called the principal axes. Now we can construct an orthogonal transformation of the data, expressing them in terms of a new, more meaningful coordinate system. Let the matrix V = [v1 v2] hold the normalized eigenvectors as column vectors, where v1 is the dominant eigenvector. The orthogonal transformation of the data X that aligns v1 with e1 and v2 with e2 is simply

\hat{X} = XV.

This results in

\hat{x}_i = \begin{bmatrix} x_i \cdot v_1 \\ x_i \cdot v_2 \end{bmatrix}.

Figure 16.10 (top) illustrates the result of this transformation. Revisit Example 7.6: this is precisely the transformation we used to align the contour ellipse to the coordinate axes. (The transformation is written a bit differently here to accommodate the point set organized as transposed vectors.)

Figure 16.10


PCA data transformations: three possible data transformations based on PCA analysis. Top: data points transformed to the principal components coordinate system. This set appears in all images. Middle: data compression by keeping dominant eigenvector component. Bottom: data compression by keeping the nondominant eigenvector.

In summary: the data coordinates are now in terms of the trend lines, defined by the eigenvectors of the covariance matrix, and the coordinates directly measure the distance from each trend line. The greatest variance corresponds to the first coordinate in this principal components coordinate system. This leads us to the name of this method: principal components analysis (PCA).

So far, PCA has worked with all components of the given data. However, it can also be used for data compression by reducing dimensionality. Instead of constructing V to hold all eigenvectors, we may use only the most significant, so suppose V = v1. This transformation produces the middle image shown in Figure 16.10. If instead, V = v2, then the result is the bottom image, and for clarity, we chose to display these points on the e2-axis, but this is arbitrary. Comparing these results: there is greater spread of the data in the middle image, which corresponds to a trend line with higher variance.
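A compact PCA sketch in NumPy (our own illustration): translate the centroid to the origin, form the covariance matrix, take its eigenvectors with the dominant one first, and rotate the data into the principal components frame. Keeping only the first column of V gives the dimension reduction just described.

    import numpy as np

    def pca(X):
        """Eigenvalues/eigenvectors of the covariance matrix of the rows of X."""
        Xc = X - X.mean(axis=0)                 # translate the centroid to the origin
        C = Xc.T @ Xc / len(Xc)                 # covariance matrix
        lam, V = np.linalg.eigh(C)              # eigenpairs of the symmetric C
        order = np.argsort(lam)[::-1]           # dominant eigenvector first
        return lam[order], V[:, order], Xc

    # example data: 2D points scattered about the line x2 = x1
    rng = np.random.default_rng(1)
    t = rng.standard_normal(50)
    X = np.column_stack([t, t + 0.2 * rng.standard_normal(50)])

    lam, V, Xc = pca(X)
    X_hat = Xc @ V                              # data in the principal components frame
    X_1d = Xc @ V[:, :1]                        # keep only the dominant component
    print(np.round(lam, 3))                     # variances along the principal axes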

Here we focused on 2D data, but the real power of PCA comes with higher dimensional data for which it is very difficult to visualize and understand relationships between dimensions. PCA makes it possible to identify insignificant dimensions and eliminate them.


  • singular value decomposition (SVD)
  • singular values
  • right singular vector
  • left singular vector
  • SVD matrix dimensions
  • SVD column, row, and null spaces
  • SVD steps
  • volume in terms of singular values
  • eigendecomposition
  • matrix decomposition
  • action ellipse axes length
  • pseudoinverse
  • generalized inverse
  • least squares solution via the pseudoinverse
  • quadratic form
  • contour ellipse
  • principal components analysis (PCA)
  • covariance matrix

16.9 Exercises

  1. Find the SVD for

    A = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}.

  2. What is the eigendecomposition of the matrix A in Exercise 1?

  3. For what type of matrix are the eigenvalues the same as the singular values?

  4. The action of a 2 × 2 linear map can be described by the mapping of the unit circle to an ellipse. Figure 4.3 illustrates such an ellipse. What are the lengths of the semi-axes? What are the singular values of the corresponding matrix,

    A = \begin{bmatrix} 1/2 & 0 \\ 0 & 2 \end{bmatrix}?

  5. Find the SVD for the matrix

    A = \begin{bmatrix} 0 & 2 \\ 1 & 0 \\ 0 & 0 \end{bmatrix}.

  6. Let

    A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 2 \end{bmatrix},

    and C = ATA. Is one of the eigenvalues of C negative?

  7. For the matrix

    A = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},

    show that both (16.5) and (16.6) yield the same result for the absolute value of the determinant of A.

  8. For the matrix

    A = \begin{bmatrix} 2 & 0 \\ 2 & 0.5 \end{bmatrix},

    show that both (16.5) and (16.6) yield the same result for the determinant of A.

  9. For the matrix

    A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{bmatrix},

    show that both (16.5) and (16.6) yield the same result for the absolute value of the determinant of A.

  10. What is the pseudoinverse of the matrix from Exercise 5?

  11. What is the pseudoinverse of the matrix

    \begin{bmatrix} 0 & 1 \\ 3 & 0 \\ 0 & 0 \end{bmatrix}?

  12. What is the pseudoinverse of the matrix

    A = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}?

    Note that this matrix is actually a vector.

  13. What is the pseudoinverse of

    A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}?

  14. What is the least squares solution to the linear system Av = b given by:

    \begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}.

    Use the pseudoinverse and the enumerated steps in Section 16.6. The SVD of A may be found in Example 16.3.

  15. What is the least squares solution to the linear system Av = b given by:

    \begin{bmatrix} 4 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.

    Use the pseudoinverse and the enumerated steps in Section 16.6.

  16. What is the least squares solution to the linear system Av = b given by:

    \begin{bmatrix} 3 & 0 & 0 \\ 0 & 1 & 2 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \\ 1 \end{bmatrix}.

    Use the pseudoinverse and the enumerated steps in Section 16.6.

  17. For the following data set X, apply PCA using all eigenvectors. Give the covariance matrix and the final components X^ in the principal components coordinate system.

    X = \begin{bmatrix} 2 & 2 \\ 1 & 1 \\ 0 & 0 \\ -1 & -1 \\ -2 & -2 \\ 1 & 0 \\ -1 & 0 \end{bmatrix}

    Make a sketch of the data; this will help with finding the solution.

  18. For the data set in Exercise 17, apply PCA using the dominant eigenvector only.

1We use grayscales here.
