6.5 The Singular Value Decomposition

In many applications, it is necessary either to determine the rank of a matrix or to determine whether the matrix is deficient in rank. Theoretically, we can use Gaussian elimination to reduce the matrix to row echelon form and then count the number of nonzero rows. However, this approach is not practical in finite-precision arithmetic. If A is rank deficient and U is the computed echelon form, then, because of rounding errors in the elimination process, it is unlikely that U will have the proper number of nonzero rows. In practice, the coefficient matrix A usually involves some error. This may be due to errors in the data or to the finite number system. Thus, it is generally more practical to ask whether A is “close” to a rank-deficient matrix. However, it may well turn out that A is close to being rank deficient and the computed row echelon form U is not.

In this section, we assume throughout that A is an $m \times n$ matrix with $m \geq n$. (This assumption is made for convenience only; all the results will also hold if $m < n$.) We will present a method for determining how close A is to a matrix of smaller rank. The method involves factoring A into a product $U Σ V^{T}$, where U is an $m \times m$ orthogonal matrix, V is an $n \times n$ orthogonal matrix, and Σ is an $m \times n$ matrix whose off-diagonal entries are all 0’s and whose diagonal elements satisfy

$σ_{1} \geq σ_{2} \geq \dots \geq σ_{n} \geq 0$

$Σ = [\begin{matrix} σ_{1} & & & \\ & σ_{2} & & \\ & & ⋱ & \\ & & & σ_{n} \\ 0 & 0 & \cdots & 0 \\ ⋮ & ⋮ & & ⋮ \\ 0 & 0 & \cdots & 0 \end{matrix}]$

The $σ_{i}$’s determined by this factorization are unique and are called the singular values of A. The factorization $U Σ V^{T}$ is called the singular value decomposition of A, or, for short, the svd of A. We will show that the rank of A equals the number of nonzero singular values, and that the magnitudes of the nonzero singular values provide a measure of how close A is to a matrix of lower rank.

We begin by showing that such a decomposition is always possible.

Theorem 6.5.1 The SVD Theorem

If A is an $m \times n$ matrix, then A has a singular value decomposition.

Proof

$A^{T} A$ is a symmetric $n \times n$ matrix. Therefore, its eigenvalues are all real and it has an orthogonal diagonalizing matrix V. Furthermore, its eigenvalues must all be nonnegative. To see this, let λ be an eigenvalue of $A^{T} A$ and x be an eigenvector belonging to λ. It follows that

$‖ A x ‖^{2} = x^{T} A^{T} A x = λ x^{T} x = λ ‖ x ‖^{2}$

Hence,

$λ = \frac{‖ A x ‖^{2}}{‖ x ‖^{2}} \geq 0$

We may assume that the columns of V have been ordered so that the corresponding eigenvalues satisfy

$λ_{1} \geq λ_{2} \geq \dots \geq λ_{n} \geq 0$

The singular values of A are given by

$σ_{j} = \sqrt{λ_{j}}, \qquad j = 1, \dots, n$

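Let r denote the rank of A. The matrix $A^{T} A$ will also have rank r, because the two matrices have the same null space: if $A^{T} A x = 0$, then $‖ A x ‖^{2} = x^{T} A^{T} A x = 0$. Since $A^{T} A$ is symmetric, its rank equals the number of nonzero eigenvalues. Thus,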

$λ_{1} \geq λ_{2} \geq \dots \geq λ_{r} > 0 \quad \text{and} \quad λ_{r + 1} = λ_{r + 2} = \dots = λ_{n} = 0$

The same relation holds for the singular values:

$σ_{1} \geq σ_{2} \geq \dots \geq σ_{r} > 0 \quad \text{and} \quad σ_{r + 1} = σ_{r + 2} = \dots = σ_{n} = 0$

Now let

$V_{1} = (v_{1}, \dots, v_{r}), \qquad V_{2} = (v_{r + 1}, \dots, v_{n})$

and

$Σ_{1} = [\begin{matrix} σ_{1} & & & \\ & σ_{2} & & \\ & & ⋱ & \\ & & & σ_{r} \end{matrix}]$ (1)

Hence, $Σ_{1}$ is an $r \times r$ diagonal matrix whose diagonal entries are the nonzero singular values $σ_{1}, \dots, σ_{r}$. The $m \times n$ matrix Σ is then given by

$Σ = [\begin{matrix} Σ_{1} & O \\ O & O \end{matrix}]$

The column vectors of $V_{2}$ are eigenvectors of $A^{T} A$ belonging to $λ = 0$. Thus,

$A^{T} A v_{j} = 0, \qquad j = r + 1, \dots, n$

and, consequently, the column vectors of $V_{2}$ form an orthonormal basis for $N(A^{T} A) = N(A)$. Therefore,

$A V_{2} = O$

and since V is an orthogonal matrix, it follows that

$\begin{aligned} I &= V V^{T} = V_{1} V_{1}^{T} + V_{2} V_{2}^{T} \\ A &= A I = A V_{1} V_{1}^{T} + A V_{2} V_{2}^{T} = A V_{1} V_{1}^{T} \end{aligned}$ (2)

So far we have shown how to construct the matrices V and Σ of the singular value decomposition. To complete the proof, we must show how to construct an $m \times m$ orthogonal matrix U such that

$A = U Σ V^{T}$

or, equivalently,

$A V = U Σ$ (3)

Comparing the first r columns of each side of (3), we see that

$A v_{j} = σ_{j} u_{j}, \qquad j = 1, \dots, r$

Thus, if we define

$u_{j} = \frac{1}{σ_{j}} A v_{j}, \qquad j = 1, \dots, r$ (4)

and

$U_{1} = (u_{1}, \dots, u_{r})$

then it follows that

$A V_{1} = U_{1} Σ_{1}$ (5)

The column vectors of $U_{1}$ form an orthonormal set since

$\begin{aligned} u_{i}^{T} u_{j} &= \left(\frac{1}{σ_{i}} v_{i}^{T} A^{T}\right) \left(\frac{1}{σ_{j}} A v_{j}\right) \\ &= \frac{1}{σ_{i} σ_{j}} v_{i}^{T} (A^{T} A v_{j}) \\ &= \frac{1}{σ_{i} σ_{j}} v_{i}^{T} (σ_{j}^{2} v_{j}) \\ &= \frac{σ_{j}}{σ_{i}} v_{i}^{T} v_{j} = δ_{i j}, \qquad 1 \leq i \leq r, \; 1 \leq j \leq r \end{aligned}$

It follows from (4) that each $u_{j}$, $1 \leq j \leq r$, is in the column space of A. The dimension of the column space is r, so $u_{1}, \dots, u_{r}$ form an orthonormal basis for $R(A)$. The vector space $R(A)^{⊥} = N(A^{T})$ has dimension $m - r$. Let $\{u_{r + 1}, u_{r + 2}, \dots, u_{m}\}$ be an orthonormal basis for $N(A^{T})$ and set

$U_{2} = (u_{r + 1}, u_{r + 2}, \dots, u_{m}), \qquad U = [\begin{matrix} U_{1} & U_{2} \end{matrix}]$

It follows from Theorem 5.2.2 that $u_{1}, \dots, u_{m}$ form an orthonormal basis for $ℝ^{m}$. Hence, U is an orthogonal matrix. We still must show that $U Σ V^{T}$ actually equals A. This follows from (5) and (2) since

$\begin{aligned} U Σ V^{T} &= [\begin{matrix} U_{1} & U_{2} \end{matrix}] [\begin{matrix} Σ_{1} & O \\ O & O \end{matrix}] [\begin{matrix} V_{1}^{T} \\ V_{2}^{T} \end{matrix}] \\ &= U_{1} Σ_{1} V_{1}^{T} \\ &= A V_{1} V_{1}^{T} \\ &= A \end{aligned}$

∎

Observations

Let A be an $m \times n$ matrix with a singular value decomposition $U Σ V^{T}$.

1. The singular values $σ_{1}, \dots, σ_{n}$ of A are unique; however, the matrices U and V are not unique.
2. Since V diagonalizes $A^{T} A$, it follows that the $v_{j}$’s are eigenvectors of $A^{T} A$.
3. Since $A A^{T} = U Σ Σ^{T} U^{T}$, it follows that U diagonalizes $A A^{T}$ and that the $u_{j}$’s are eigenvectors of $A A^{T}$.
4. Comparing the jth columns of each side of the equation

   $A V = U Σ$

   we get

   $A v_{j} = σ_{j} u_{j}, \qquad j = 1, \dots, n$

   Similarly,

   $A^{T} U = V Σ^{T}$

   and hence

   $\begin{aligned} A^{T} u_{j} &= σ_{j} v_{j} && \text{for } j = 1, \dots, n \\ A^{T} u_{j} &= 0 && \text{for } j = n + 1, \dots, m \end{aligned}$

   The $v_{j}$’s are called the right singular vectors of A, and the $u_{j}$’s are called the left singular vectors of A.
5. If A has rank r, then
   1. $v_{1}, \dots, v_{r}$ form an orthonormal basis for $R(A^{T})$.
   2. $v_{r + 1}, \dots, v_{n}$ form an orthonormal basis for $N(A)$.
   3. $u_{1}, \dots, u_{r}$ form an orthonormal basis for $R(A)$.
   4. $u_{r + 1}, \dots, u_{m}$ form an orthonormal basis for $N(A^{T})$.
6. The rank of the matrix A is equal to the number of its nonzero singular values (where singular values are counted according to multiplicity). The reader should be careful not to make a similar assumption about eigenvalues. The matrix

   $M = [\begin{matrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{matrix}]$

   for example, has rank 3 even though all of its eigenvalues are 0.
7. In the case that A has rank $r < n$, if we set

   $U_{1} = (u_{1}, u_{2}, \dots, u_{r}), \qquad V_{1} = (v_{1}, v_{2}, \dots, v_{r})$

   and define $Σ_{1}$ as in equation (1), then

   $A = U_{1} Σ_{1} V_{1}^{T}$ (6)

   The factorization (6) is called the compact form of the singular value decomposition of A. This form is useful in many applications.
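The contrast drawn above between singular values and eigenvalues can be checked directly for the matrix M. The following is a minimal pure-Python sketch; the helper names `transpose` and `matmul` are ours and not from the text. Because M is triangular, its eigenvalues are its diagonal entries. For this particular M, $M^{T} M$ also happens to be diagonal, so the singular values can be read off as square roots of its diagonal.

```python
import math

# The 4 x 4 matrix M from the observations (ones on the superdiagonal).
M = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)  # rows of Bt are the columns of B
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

# M is triangular, so its eigenvalues are its diagonal entries: all 0.
eigenvalues = [M[i][i] for i in range(4)]

# Here M^T M is diagonal, so its eigenvalues are its diagonal entries and the
# singular values of M are their square roots.
MtM = matmul(transpose(M), M)
is_diagonal = all(MtM[i][j] == 0 for i in range(4) for j in range(4) if i != j)
singular_values = sorted((math.sqrt(MtM[i][i]) for i in range(4)), reverse=True)
rank = sum(1 for s in singular_values if s > 0)

print(eigenvalues)      # [0, 0, 0, 0]
print(singular_values)  # [1.0, 1.0, 1.0, 0.0]
print(rank)             # 3
```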

Example 1

Let

$A = [\begin{matrix} 1 & 1 \\ 1 & 1 \\ 0 & 0 \end{matrix}]$

Compute the singular values and the singular value decomposition of A.

SOLUTION

The matrix

$A^{T} A = [\begin{matrix} 2 & 2 \\ 2 & 2 \end{matrix}]$

has eigenvalues $λ_{1} = 4$ and $λ_{2} = 0$. Consequently, the singular values of A are $σ_{1} = \sqrt{4} = 2$ and $σ_{2} = 0$. The eigenvalue $λ_{1}$ has eigenvectors of the form $α (1, 1)^{T}$, and $λ_{2}$ has eigenvectors $β (1, -1)^{T}$. Therefore, the orthogonal matrix

$V = \frac{1}{\sqrt{2}} [\begin{matrix} 1 & 1 \\ 1 & -1 \end{matrix}]$

diagonalizes $A^{T} A$. From observation 4, it follows that

$u_{1} = \frac{1}{σ_{1}} A v_{1} = \frac{1}{2} [\begin{matrix} 1 & 1 \\ 1 & 1 \\ 0 & 0 \end{matrix}] [\begin{matrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{matrix}] = [\begin{matrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \end{matrix}]$

The remaining column vectors of U must form an orthonormal basis for $N(A^{T})$. We can compute a basis $\{x_{2}, x_{3}\}$ for $N(A^{T})$ in the usual way:

$x_{2} = (1, -1, 0)^{T} \quad \text{and} \quad x_{3} = (0, 0, 1)^{T}$

Since these vectors are already orthogonal, it is not necessary to use the Gram–Schmidt process to obtain an orthonormal basis. We need only set

$\begin{aligned} u_{2} &= \frac{1}{‖ x_{2} ‖} x_{2} = \left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}, 0\right)^{T} \\ u_{3} &= x_{3} = (0, 0, 1)^{T} \end{aligned}$

It then follows that

$A = U Σ V^{T} = [\begin{matrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{matrix}] [\begin{matrix} 2 & 0 \\ 0 & 0 \\ 0 & 0 \end{matrix}] [\begin{matrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{matrix}]$

∎
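The construction in the proof of Theorem 6.5.1 can be retraced numerically on Example 1. The pure-Python sketch below works through four steps:

- it forms $A^{T} A$;
- it gets the eigenvalues from the closed form for a symmetric $2 \times 2$ matrix;
- it sets $u_{1} = A v_{1} / σ_{1}$;
- it completes U with the basis of $N(A^{T})$ found above, then checks that $U Σ V^{T}$ reproduces A up to rounding.

The helper names are illustrative, and V is taken from the text rather than computed.

```python
import math

A = [[1, 1], [1, 1], [0, 0]]

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

# Step 1: A^T A and its eigenvalues (closed form for a symmetric 2 x 2 matrix).
AtA = matmul(transpose(A), A)            # [[2, 2], [2, 2]]
(a, b), (_, d) = AtA
mid, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
lam1, lam2 = mid + rad, mid - rad        # 4.0 and 0.0
sigma = [math.sqrt(lam1), math.sqrt(max(lam2, 0.0))]  # guard against tiny negative rounding

# Step 2: orthonormal eigenvectors of A^T A, as given in the text (columns v1, v2).
s = 1 / math.sqrt(2)
V = [[s, s], [s, -s]]

# Step 3: u1 = A v1 / sigma1; only sigma1 is nonzero, so r = 1.
v1 = [V[0][0], V[1][0]]
u1 = [sum(A[i][k] * v1[k] for k in range(2)) / sigma[0] for i in range(3)]

# Step 4: complete U with the orthonormal basis of N(A^T) from the text.
u2, u3 = [s, -s, 0.0], [0.0, 0.0, 1.0]
U = transpose([u1, u2, u3])              # columns u1, u2, u3
Sigma = [[sigma[0], 0.0], [0.0, sigma[1]], [0.0, 0.0]]

reconstructed = matmul(matmul(U, Sigma), transpose(V))
max_err = max(abs(reconstructed[i][j] - A[i][j]) for i in range(3) for j in range(2))
print(sigma, max_err < 1e-12)            # [2.0, 0.0] True
```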

Visualizing the SVD

If we view an $m \times n$ matrix A with rank r as a mapping from the row space of A to the column space of A, then in light of observations 4 and 5 made earlier, it seems natural to choose $v_{1}, v_{2}, \dots, v_{r}$ as an orthonormal basis for the row space, since the image vectors

$A v_{1} = σ_{1} u_{1}, \quad A v_{2} = σ_{2} u_{2}, \quad \dots, \quad A v_{r} = σ_{r} u_{r}$

are mutually orthogonal and the corresponding unit vectors $u_{1}, u_{2}, \dots, u_{r}$ will form an orthonormal basis for the column space of A. In the case of a $2 \times 2$ matrix, the following example illustrates geometrically how one could search for the right singular vectors by moving around the unit circle.

Example 2

Let

$A = [\begin{matrix} 0.4 & -0.3 \\ 0.9 & 1.2 \end{matrix}]$

To find a pair of right singular vectors of A, we must find a pair of orthonormal vectors x and y for which the image vectors Ax and Ay are orthogonal. Choosing the standard basis vectors for $ℝ^{2}$ does not work, for if $x = e_{1}$ and $y = e_{2}$, then the image vectors

$A e_{1} = a_{1} = [\begin{matrix} 0.4 \\ 0.9 \end{matrix}] \quad \text{and} \quad A e_{2} = a_{2} = [\begin{matrix} -0.3 \\ 1.2 \end{matrix}]$

are not orthogonal. See Figure 6.5.1.

Figure 6.5.1. The vectors x, y, Ax, and Ay drawn from the origin; both axes run from -2 to 2 in increments of 0.5.

One way to search for the right singular vectors is to simultaneously rotate this initial pair of vectors around the unit circle and for each rotated pair

$x = [\begin{matrix} \cos t \\ \sin t \end{matrix}], \qquad y = [\begin{matrix} -\sin t \\ \cos t \end{matrix}]$

check to see if Ax and Ay are orthogonal. For the given matrix A, this will happen when the tip of our initial x vector gets rotated to the point (0.6,0.8). It follows that the right singular vectors are

$v_{1} = [\begin{matrix} 0.6 \\ 0.8 \end{matrix}], \qquad v_{2} = [\begin{matrix} -0.8 \\ 0.6 \end{matrix}]$

Since

$A v_{1} = [\begin{matrix} 0 \\ 1.5 \end{matrix}] = 1.5 e_{2} \quad \text{and} \quad A v_{2} = [\begin{matrix} -0.5 \\ 0 \end{matrix}] = -0.5 e_{1}$

it follows that the singular values are $σ_{1} = 1.5$ and $σ_{2} = 0.5$, and the left singular vectors are $u_{1} = e_{2}$ and $u_{2} = -e_{1}$. See Figure 6.5.2.

∎

Figure 6.5.2. The vectors $v_{1}$, $A v_{1}$, $v_{2}$, and $A v_{2}$ drawn from the origin; both axes run from -2 to 2 in increments of 0.5.
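The search described in Example 2 can be automated. The sketch below (function names are ours) rotates the pair and measures the dot product $(A x)^{T} (A y)$ at each angle. To find the angle where the images become orthogonal it uses bisection, which is our choice for simplicity and not a method from the text. It assumes, as holds for this A, that the dot product is positive at $t = 0$ and negative at $t = π/2$.

```python
import math

A = [[0.4, -0.3], [0.9, 1.2]]

def apply(M, x):
    """Multiply a 2 x 2 matrix by a 2-vector."""
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]

def orthogonality_gap(t):
    """(Ax)^T (Ay) for the rotated pair x = (cos t, sin t), y = (-sin t, cos t)."""
    x = [math.cos(t), math.sin(t)]
    y = [-math.sin(t), math.cos(t)]
    ax, ay = apply(A, x), apply(A, y)
    return ax[0] * ay[0] + ax[1] * ay[1]

# The gap changes sign on [0, pi/2] for this A, so bisection brackets the root.
lo, hi = 0.0, math.pi / 2
for _ in range(60):
    mid = (lo + hi) / 2
    if orthogonality_gap(lo) * orthogonality_gap(mid) <= 0:
        hi = mid
    else:
        lo = mid

t = (lo + hi) / 2
v1 = [math.cos(t), math.sin(t)]
v2 = [-math.sin(t), math.cos(t)]
sigma1 = math.hypot(*apply(A, v1))   # length of A v1
sigma2 = math.hypot(*apply(A, v2))   # length of A v2
print([round(c, 6) for c in v1], round(sigma1, 6), round(sigma2, 6))  # [0.6, 0.8] 1.5 0.5
```

For a general $2 \times 2$ matrix the sign change is not guaranteed on this particular interval. A finer sweep over $[0, π/2]$ before bisecting would be a more robust variant.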

Numerical Rank and Lower Rank Approximations

If A is an $m \times n$ matrix of rank r and $0 < k < r$, we can use the singular value decomposition to find a matrix in $ℝ^{m \times n}$ of rank k that is closest to A with respect to the Frobenius norm. Let $\mathcal{M}$ be the set of all $m \times n$ matrices of rank k or less. It can be shown that there is a matrix X in $\mathcal{M}$ such that

$‖ A - X ‖_{F} = \min_{S \in \mathcal{M}} ‖ A - S ‖_{F}$ (7)

We will not prove this result, since the proof is beyond the scope of this text. Assuming that the minimum is achieved, we will show how such a matrix X can be derived from the singular value decomposition of A. The following lemma will be useful.

Lemma 6.5.2

If A is an $m \times n$ matrix and Q is an $m \times m$ orthogonal matrix, then

$‖ Q A ‖_{F} = ‖ A ‖_{F}$

Proof

$\begin{aligned} ‖ Q A ‖_{F}^{2} &= ‖ (Q a_{1}, Q a_{2}, \dots, Q a_{n}) ‖_{F}^{2} \\ &= \sum_{i = 1}^{n} ‖ Q a_{i} ‖_{2}^{2} \\ &= \sum_{i = 1}^{n} ‖ a_{i} ‖_{2}^{2} \\ &= ‖ A ‖_{F}^{2} \end{aligned}$

∎

If A has singular value decomposition $U Σ V^{T}$, then it follows from the lemma that

$‖ A ‖_{F} = ‖ Σ V^{T} ‖_{F}$

Since

$‖ Σ V^{T} ‖_{F} = ‖ (Σ V^{T})^{T} ‖_{F} = ‖ V Σ^{T} ‖_{F} = ‖ Σ^{T} ‖_{F}$

it follows that

$‖ A ‖_{F} = (σ_{1}^{2} + σ_{2}^{2} + \dots + σ_{n}^{2})^{1/2}$
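Both facts are easy to check numerically. The sketch below uses the matrix A and the singular values $σ_{1} = 1.5$, $σ_{2} = 0.5$ from Example 2. To illustrate Lemma 6.5.2 it uses a rotation Q by an arbitrarily chosen angle. The helper names are ours.

```python
import math

A = [[0.4, -0.3], [0.9, 1.2]]      # the matrix from Example 2
singular_values = [1.5, 0.5]       # its singular values, found in Example 2

def frobenius(M):
    """Square root of the sum of the squares of all entries."""
    return math.sqrt(sum(x * x for row in M for x in row))

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

# Any orthogonal Q will do for Lemma 6.5.2; a rotation by an arbitrary angle is used here.
theta = 0.7
Q = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]

norm_A = frobenius(A)
norm_QA = frobenius(matmul(Q, A))
norm_from_sv = math.sqrt(sum(s * s for s in singular_values))
print(norm_A, norm_QA, norm_from_sv)   # all three agree up to rounding (about 1.5811)
```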

Theorem 6.5.3

Let $A = U Σ V^{T}$ be an $m \times n$ matrix, and let $\mathcal{M}$ denote the set of all $m \times n$ matrices of rank k or less, where $0 < k < \operatorname{rank}(A)$. If X is a matrix in $\mathcal{M}$ satisfying (7), then

$‖ A - X ‖_{F} = (σ_{k + 1}^{2} + σ_{k + 2}^{2} + \dots + σ_{n}^{2})^{1/2}$

In particular, if $A^{'} = U Σ^{'} V^{T}$, where $Σ^{'}$ is the $m \times n$ matrix obtained from Σ by setting $σ_{k + 1} = \dots = σ_{n} = 0$, then

$‖ A - A^{'} ‖_{F} = (σ_{k + 1}^{2} + \dots + σ_{n}^{2})^{1/2} = \min_{S \in \mathcal{M}} ‖ A - S ‖_{F}$

Proof

Let X be a matrix in $\mathcal{M}$ satisfying (7). Since $A^{'} \in \mathcal{M}$, it follows that

$‖ A - X ‖_{F} \leq {‖ A - A^{'} ‖}_{F} = (σ_{k + 1}^{2} + \dots + σ_{n}^{2})^{1 / 2}$ (8)

We will show that

$‖ A - X ‖_{F} \geq (σ_{k + 1}^{2} + \dots + σ_{n}^{2})^{1/2}$

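and hence that equality holds in (8). Let $Q Ω P^{T}$ be the singular value decomposition of X, where $Ω = [\begin{matrix} Ω_{k} & O \\ O & O \end{matrix}]$ and $Ω_{k}$ is a $k \times k$ diagonal matrix (this block form is possible because X has rank at most k).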

If we set $B = Q^{T} A P$, then $A = Q B P^{T}$, and it follows that

$‖ A - X ‖_{F} = ‖ Q (B - Ω) P^{T} ‖_{F} = ‖ B - Ω ‖_{F}$

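Let us partition B in the same manner as Ω:

$B = [\begin{matrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{matrix}]$, where $B_{11}$ is $k \times k$.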

It follows that

$‖ A - X ‖_{F}^{2} = ‖ B_{11} - Ω_{k} ‖_{F}^{2} + ‖ B_{12} ‖_{F}^{2} + ‖ B_{21} ‖_{F}^{2} + ‖ B_{22} ‖_{F}^{2}$

We claim that $B_{12} = O$ . If not, then define

$Y = Q [\begin{matrix} B_{11} & B_{12} \\ O & O \end{matrix}] P^{T}$

The matrix Y is in $\mathcal{M}$ (its middle factor has at most k nonzero rows) and

$‖ A - Y ‖_{F}^{2} = {‖ B_{21} ‖}_{F}^{2} + {‖ B_{22} ‖}_{F}^{2} < ‖ A - X ‖_{F}^{2}$

But this contradicts the definition of X. Therefore, $B_{12} = O$ . In a similar manner, it can be shown that $B_{21}$ must equal O. If we set

$Z = Q [\begin{matrix} B_{11} & O \\ O & O \end{matrix}] P^{T}$

then $Z \in \mathcal{M}$ and

$‖ A - Z ‖_{F}^{2} = ‖ B_{22} ‖_{F}^{2} \leq ‖ B_{11} - Ω_{k} ‖_{F}^{2} + ‖ B_{22} ‖_{F}^{2} = ‖ A - X ‖_{F}^{2}$

It follows from the definition of X that $B_{11}$ must equal $Ω_{k}$. If $B_{22}$ has singular value decomposition $U_{1} Λ V_{1}^{T}$, then

$‖ A - X ‖_{F} = {‖ B_{22} ‖}_{F} = ‖ Λ ‖_{F}$

Let

$\begin{matrix} U_{2} = [\begin{matrix} I_{k} & O \\ O & U_{1} \end{matrix}] & and & V_{2} = [\begin{matrix} I_{k} & O \\ O & V_{1} \end{matrix}] \end{matrix}$

Now,

$\begin{matrix} U_{2}^{T} Q^{T} A P V_{2} & = [\begin{matrix} Ω_{k} & O \\ O & Λ \end{matrix}] \\ A & = (Q U_{2}) [\begin{matrix} Ω_{k} & O \\ O & Λ \end{matrix}] (P V_{2})^{T} \end{matrix}$

and hence it follows that the diagonal elements of Λ are singular values of A. Thus,

$‖ A - X ‖_{F} = ‖ Λ ‖_{F} \geq (σ_{k + 1}^{2} + \dots + σ_{n}^{2})^{1 / 2}$

It then follows from (8) that

$‖ A - X ‖_{F} = (σ_{k + 1}^{2} + \dots + σ_{n}^{2})^{1 / 2} = ‖ A - A^{'} ‖_{F}$

∎

If A has singular value decomposition $U Σ V^{T}$ , then we can think of A as the product of $U Σ$ times $V^{T}$ . If we partition $U Σ$ into columns and $V^{T}$ into rows, then

$U Σ = (σ_{1} u_{1}, σ_{2} u_{2}, \dots, σ_{n} u_{n})$

and we can represent A by an outer product expansion

$A = σ_{1} u_{1} v_{1}^{T} + σ_{2} u_{2} v_{2}^{T} + \dots + σ_{n} u_{n} v_{n}^{T}$ (9)

If A is of rank n, then

$\begin{aligned} A^{'} &= U [\begin{matrix} σ_{1} & & & & \\ & σ_{2} & & & \\ & & ⋱ & & \\ & & & σ_{n - 1} & \\ & & & & 0 \end{matrix}] V^{T} \\ &= σ_{1} u_{1} v_{1}^{T} + σ_{2} u_{2} v_{2}^{T} + \dots + σ_{n - 1} u_{n - 1} v_{n - 1}^{T} \end{aligned}$

will be the matrix of rank $n - 1$ that is closest to A with respect to the Frobenius norm. Similarly,

$A^{″} = σ_{1} u_{1} v_{1}^{T} + σ_{2} u_{2} v_{2}^{T} + \dots + σ_{n - 2} u_{n - 2} v_{n - 2}^{T}$

will be the nearest matrix of rank $n - 2$ , and so on. In particular, if A is a nonsingular $n \times n$ matrix, then $A^{'}$ is singular and $‖ A - A^{'} ‖_{F} = σ_{n}$ . Thus, $σ_{n}$ may be taken as a measure of how close a square matrix is to being singular.
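As a concrete check of (9) and Theorem 6.5.3, the sketch below rebuilds the $2 \times 2$ matrix of Example 2 from its outer product expansion. It then truncates the expansion after one term. The resulting rank 1 matrix $σ_{1} u_{1} v_{1}^{T}$ should lie at Frobenius distance $σ_{2} = 0.5$ from A. The function names are illustrative.

```python
import math

# Singular triples (sigma_j, u_j, v_j) of the matrix in Example 2.
terms = [
    (1.5, [0.0, 1.0], [0.6, 0.8]),
    (0.5, [-1.0, 0.0], [-0.8, 0.6]),
]

def truncated_sum(terms, k):
    """Sum of the first k outer products sigma_j * u_j v_j^T."""
    m, n = len(terms[0][1]), len(terms[0][2])
    S = [[0.0] * n for _ in range(m)]
    for sigma, u, v in terms[:k]:
        for i in range(m):
            for j in range(n):
                S[i][j] += sigma * u[i] * v[j]
    return S

def frobenius_diff(X, Y):
    return math.sqrt(sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry)))

A_full = truncated_sum(terms, 2)   # the full expansion (9) reproduces A
A1 = truncated_sum(terms, 1)       # nearest matrix of rank 1 (here n - 1 = 1)
print(A_full)
print(A1, frobenius_diff(A_full, A1))   # the distance should equal sigma_2 = 0.5
```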

The reader should be careful not to use the value of $\det(A)$ as a measure of how close A is to being singular. If, for example, A is the $100 \times 100$ diagonal matrix whose diagonal entries are all $\frac{1}{2}$, then $\det(A) = 2^{-100}$; however, $σ_{100} = \frac{1}{2}$. By contrast, the matrix in the next example is very close to being singular even though its determinant is 1 and all its eigenvalues are equal to 1.

Example 3

Let A be an $n \times n$ upper triangular matrix whose diagonal elements are all 1 and whose entries above the main diagonal are all −1:

$A = [\begin{matrix} 1 & -1 & -1 & \cdots & -1 & -1 \\ 0 & 1 & -1 & \cdots & -1 & -1 \\ 0 & 0 & 1 & \cdots & -1 & -1 \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ & ⋮ \\ 0 & 0 & 0 & \cdots & 1 & -1 \\ 0 & 0 & 0 & \cdots & 0 & 1 \end{matrix}]$

Notice that $\det(A) = \det(A^{-1}) = 1$ and all the eigenvalues of A are 1. However, if n is large, then A is close to being singular. To see this, let

$B = [\begin{matrix} 1 & -1 & -1 & \cdots & -1 & -1 \\ 0 & 1 & -1 & \cdots & -1 & -1 \\ 0 & 0 & 1 & \cdots & -1 & -1 \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ & ⋮ \\ 0 & 0 & 0 & \cdots & 1 & -1 \\ -\frac{1}{2^{n - 2}} & 0 & 0 & \cdots & 0 & 1 \end{matrix}]$

The matrix B must be singular, since the system $B x = 0$ has a nontrivial solution $x = (2^{n - 2}, 2^{n - 3}, \dots, 2^{0}, 1)^{T}$ . Since the matrices A and B differ only in the $(n, 1)$ position, we have

$‖ A - B ‖_{F} = \frac{1}{2^{n - 2}}$

It follows from Theorem 6.5.3 that

$σ_{n} = \min_{X \text{ singular}} ‖ A - X ‖_{F} \leq ‖ A - B ‖_{F} = \frac{1}{2^{n - 2}}$

Thus, if $n = 100$ , then $σ_{n} \leq 1 / 2^{98}$ and, consequently, A is very close to singular.

∎
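The argument in Example 3 rests on two facts. First, $B x = 0$. Second, A and B differ by $1/2^{n - 2}$ in a single entry. Both can be confirmed in exact rational arithmetic. The sketch uses Python's `fractions.Fraction` so that no rounding enters. The choice $n = 10$ and the function name are ours.

```python
from fractions import Fraction

def example3_matrices(n):
    """A: 1 on the diagonal, -1 above it; B: A with entry (n, 1) set to -1/2^(n-2)."""
    A = [[Fraction(1) if i == j else Fraction(-1) if j > i else Fraction(0)
          for j in range(n)] for i in range(n)]
    B = [row[:] for row in A]
    B[n - 1][0] = Fraction(-1, 2 ** (n - 2))
    return A, B

n = 10
A, B = example3_matrices(n)

# The null vector x = (2^(n-2), 2^(n-3), ..., 2^0, 1)^T from the text.
x = [Fraction(2 ** (n - 2 - i)) for i in range(n - 1)] + [Fraction(1)]
Bx = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]

# A and B differ in one entry, so ||A - B||_F^2 is the square of that entry.
diff_sq = sum((A[i][j] - B[i][j]) ** 2 for i in range(n) for j in range(n))
print(all(v == 0 for v in Bx), diff_sq == Fraction(1, 2 ** (n - 2)) ** 2)  # True True
```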

Application 1 Numerical Rank

In most practical applications, matrix computations are carried out by computers using finite-precision arithmetic. If the computations involve a nonsingular matrix that is very close to being singular, then the matrix will behave computationally exactly like a singular matrix. In this case, computed solutions of linear systems may have no digits of accuracy whatsoever. More generally, if an $m \times n$ matrix A is close enough to a matrix of rank r, where $r < \min(m, n)$, then A will behave like a rank r matrix in finite-precision arithmetic. The singular values provide a way of measuring how close a matrix is to matrices of lower rank; however, we must clarify what we mean by “very close.” We must decide how close is close enough. The answer depends on the machine precision of the computer that is being used.

Machine precision can be measured in terms of the unit roundoff error for the machine. Another name for unit roundoff is machine epsilon. To understand this concept, we need to know how computers represent numbers. If the computer uses the number base $β$ and keeps track of n digits, then it will represent a real number x by a floating-point number, denoted $fl(x)$, of the form $\pm 0.d_{1} d_{2} \dots d_{n} \times β^{k}$, where the digits $d_{i}$ are integers with $0 \leq d_{i} < β$. For example, $-0.54321469 \times 10^{25}$ is an 8-digit, base 10 floating-point number, and $0.110100111001 \times 2^{-9}$ is a 12-digit, base 2 floating-point number. In Section 1 of Chapter 7, we will discuss floating-point numbers in more detail and give a precise definition of the machine epsilon. It turns out that the machine epsilon, ε, is the smallest floating-point number that will serve as a bound for the relative error whenever we approximate a real number by a floating-point number; that is, for any nonzero real number x,

$\left| \frac{fl(x) - x}{x} \right| < ε$ (10)

For 8-digit, base 10 floating-point arithmetic, the machine epsilon is $5 \times 10^{-8}$. For 12-digit, base 2 floating-point arithmetic, the machine epsilon is $(\frac{1}{2})^{12}$, and, in general, for n-digit base $β$ arithmetic, the machine epsilon is $\frac{1}{2} \times β^{-n + 1}$.
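The formula $\frac{1}{2} \times β^{-n + 1}$ can be evaluated for the floating-point numbers a language actually uses. The sketch below reads β and n from Python's `sys.float_info` (on IEEE 754 double-precision hardware, β = 2 and n = 53). It then checks inequality (10) exactly for a few rational numbers using `fractions.Fraction`. Note that `sys.float_info.epsilon` follows a different convention: it is the gap between 1 and the next float, which is twice the machine epsilon defined here.

```python
import sys
from fractions import Fraction

beta = sys.float_info.radix        # number base of Python floats (2 on IEEE 754 systems)
digits = sys.float_info.mant_dig   # significant base-beta digits (53 for IEEE double)

# Machine epsilon as defined in the text: (1/2) * beta^(-n+1), kept exact.
eps = Fraction(1, 2) * Fraction(beta) ** (1 - digits)

# Check the bound (10) exactly for reals that are not floating-point numbers.
checks = []
for x in (Fraction(1, 3), Fraction(2, 7), Fraction(10, 3)):
    fl_x = Fraction(float(x))      # exact value of the nearest float
    rel_err = abs((fl_x - x) / x)
    checks.append(rel_err < eps)

print(float(eps), sys.float_info.epsilon / 2, all(checks))
```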

In light of (10), the machine epsilon is the natural choice as a basic unit for measuring rounding errors. Suppose that A is a matrix of rank n, but k of its singular values are less than a “small” multiple of the machine epsilon. Then A is close enough to matrices of rank $n - k$ that, in floating-point computations, it is impossible to tell the difference. In this case, we would say that A has numerical rank $n - k$. The multiple of the machine epsilon that we use to determine numerical rank depends on the dimensions of the matrix and on its largest singular value. A commonly used definition is the following: the numerical rank of an $m \times n$ matrix is the number of its singular values that are greater than $σ_{1} \max(m, n)\, ε$, where $σ_{1}$ is the largest singular value of the matrix and ε is the machine epsilon.

Often in the context of finite-precision computations, the term rank will be used with the understanding that it actually refers to the numerical rank. For example, the MATLAB command rank(A) will compute the numerical rank of A, rather than the exact rank.

Example 4

Suppose that A is a $5 \times 5$ matrix with singular values

$σ_{1} = 4, σ_{2} = 1, σ_{3} = 10^{- 12}, σ_{4} = 3.1 \times 10^{- 14}, σ_{5} = 2.6 \times 10^{- 15}$

and suppose that the machine epsilon is $5 \times 10^{- 15}$ . To determine the numerical rank, we compare the singular values to

$σ_{1} \max(m, n)\, ε = 4 \cdot 5 \cdot 5 \times 10^{-15} = 10^{-13}$

Since three of the singular values are greater than $10^{- 13}$ , the matrix has numerical rank 3.

∎
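The comparison in Example 4 amounts to counting singular values above a tolerance. The sketch below applies the threshold $σ_{1} \max(m, n)\, ε$ used in the example; the function name is ours. It assumes the singular values are already known. Computing them is a job for a numerical library and is not shown.

```python
def numerical_rank(singular_values, m, n, eps):
    """Number of singular values exceeding sigma_1 * max(m, n) * eps."""
    tol = max(singular_values) * max(m, n) * eps
    return sum(1 for s in singular_values if s > tol), tol

# Data of Example 4: a 5 x 5 matrix and machine epsilon 5e-15.
sv = [4, 1, 1e-12, 3.1e-14, 2.6e-15]
rank, tol = numerical_rank(sv, 5, 5, 5e-15)
print(rank)   # 3
```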

Application 2 Digital Image Processing

A video image or photograph can be digitized by breaking it up into a rectangular array of cells (or pixels) and measuring the gray level of each cell. This information can be stored and transmitted as an $m \times n$ matrix A. The entries of A are nonnegative numbers corresponding to the measures of the gray levels. Because the gray levels of any one cell generally turn out to be close to the gray levels of its neighboring cells, it is possible to reduce the amount of storage necessary from mn to a relatively small multiple of $m + n + 1$ . Generally, the matrix A will have many small singular values. Consequently, A can be approximated by a matrix of much lower rank.

If A has singular value decomposition $U Σ V^{T}$ , then A can be represented by the outer product expansion

$A = σ_{1} u_{1} v_{1}^{T} + σ_{2} u_{2} v_{2}^{T} + \dots + σ_{n} u_{n} v_{n}^{T}$

The closest matrix of rank k is obtained by truncating this sum after the first k terms:

$A_{k} = σ_{1} u_{1} v_{1}^{T} + σ_{2} u_{2} v_{2}^{T} + \dots + σ_{k} u_{k} v_{k}^{T}$

The total storage for $A_{k}$ is $k (m + n + 1)$ . We can choose k to be considerably less than n and still have the digital image corresponding to $A_{k}$ very close to the original. For typical choices of k, the storage required for $A_{k}$ will be less than 20 percent of the amount of storage necessary for the entire matrix A.

Figure 6.5.3 shows an image corresponding to a $176 \times 260$ matrix A and three images corresponding to lower rank approximations of A. The gentlemen in the picture are (left to right) James H. Wilkinson, Wallace Givens, and George Forsythe (three pioneers in the field of numerical linear algebra).

Figure 6.5.3. Courtesy Oak Ridge National Laboratory, U.S. Dept. of Energy
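To see what the storage count $k(m + n + 1)$ means for the $176 \times 260$ image matrix of Figure 6.5.3, the short calculation below finds the largest k for which $A_{k}$ needs less than 20 percent of the storage of A. It is only arithmetic on the stated sizes. It says nothing about which k gives an acceptable image. The function name is ours.

```python
def storage(m, n, k):
    """Numbers stored for A_k: k singular values plus k vectors u_j (m each) and v_j (n each)."""
    return k * (m + n + 1)

m, n = 176, 260                # image matrix size from Figure 6.5.3
full = m * n                   # entries needed to store A itself

# Largest k with storage below 20 percent of m*n (5 * storage < m*n keeps it in integers).
k_max = max(k for k in range(1, n + 1) if 5 * storage(m, n, k) < full)
print(full, k_max, storage(m, n, k_max))   # 45760 20 8740
```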

Application 3 Information Retrieval-Latent Semantic Indexing

We return again to the information retrieval application discussed in Sections 1.3 and 5.1. In this application, a database of documents is represented by a database matrix Q. To search the database, we form a unit search vector x and set $y = Q^{T} x$ . The documents that best match the search criteria are those corresponding to the entries of y that are closest to 1.

Because of the problems of polysemy and synonymy, we can think of our database as an approximation. Some of the entries of the database matrix may contain extraneous components due to multiple meanings of words, and some may miss including components because of synonymy. Suppose that it were possible to correct for these problems and come up with a perfect database matrix P. If we set $E = Q - P$ , then, since $Q = P + E$ , we can think of E as a matrix representing the errors in our database matrix Q. Unfortunately, E is unknown, so we cannot determine P exactly. However, if we can find a simpler approximation $Q_{1}$ for Q, then $Q_{1}$ will also be an approximation for P. Thus, $Q_{1} = P + E_{1}$ for some error matrix $E_{1}$ . In the method of latent semantic indexing (LSI), the database matrix Q is approximated by a matrix $Q_{1}$ with lower rank. The idea behind the method is that the lower rank matrix may still provide a good approximation to P and, because of its simpler structure, may actually involve less error; that is, $∥ E_{1} ∥ < ∥ E ∥$ .

The lower rank approximation can be obtained by truncating the outer product expansion of the singular value decomposition of Q. This is equivalent to setting

$σ_{r + 1} = σ_{r + 2} = \cdot \cdot \cdot = σ_{n} = 0$

and then setting $Q_{1} = U_{1} Σ_{1} V_{1}^{T}$, the compact form of the singular value decomposition of the rank r matrix. Furthermore, if $r < \min(m, n)/2$, then this factorization is computationally more efficient to use and the searches will be sped up. The speed of computation is roughly proportional to the amount of arithmetic involved. The matrix vector multiplication $Q^{T} x$ requires a total of mn scalar multiplications (m multiplications for each of the n entries of the product). In contrast, $Q_{1}^{T} = V_{1} Σ_{1} U_{1}^{T}$, and the multiplication $Q_{1}^{T} x = V_{1} (Σ_{1} (U_{1}^{T} x))$ requires a total of $rm + r + nr = r (m + n + 1)$ scalar multiplications. For example, if $m = n = 1000$ and $r = 200$, then

$\begin{matrix} m n = 10^{6} & and & r (m + n + 1) = 200 \cdot 2001 = 400, 200 \end{matrix}$

The search with the lower rank matrix should be more than twice as fast.
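The operation count above can be checked by instrumenting the products. In the sketch below, the sizes and matrix entries are small hypothetical integers, not an actual SVD and not a real database. Neither the identity $Q_{1}^{T} x = V_{1} (Σ_{1} (U_{1}^{T} x))$ nor the multiplication count depends on orthogonality. Integers keep the comparison between the two results exact. The function names are ours.

```python
def matvec_count(M, x):
    """Return (M x, number of scalar multiplications used)."""
    result = [sum(row[j] * x[j] for j in range(len(x))) for row in M]
    return result, len(M) * len(x)

def transpose(M):
    return [list(col) for col in zip(*M)]

# Hypothetical sizes: m = 6 terms, n = 5 documents, rank r = 2.
m, n, r = 6, 5, 2
U1 = [[(i + 1) * (j + 2) % 7 for j in range(r)] for i in range(m)]   # m x r
S1 = [3, 1]                                                         # diagonal of Sigma_1
V1 = [[(i + 2 * j) % 5 for j in range(r)] for i in range(n)]        # n x r
x = [1, 0, 2, 1, 0, 1]                                              # search vector in R^m

# Direct: form Q1 = U1 Sigma_1 V1^T (m x n), then compute Q1^T x.
Q1 = [[sum(U1[i][k] * S1[k] * V1[j][k] for k in range(r)) for j in range(n)] for i in range(m)]
direct, cost_direct = matvec_count(transpose(Q1), x)

# Factored: V1 (Sigma_1 (U1^T x)).
w, c1 = matvec_count(transpose(U1), x)     # r*m multiplications
w = [S1[k] * w[k] for k in range(r)]       # r multiplications
factored, c2 = matvec_count(V1, w)         # n*r multiplications
cost_factored = c1 + r + c2

print(direct == factored, cost_direct, cost_factored)   # True 30 24
```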

Application 4 Psychology—Principal Component Analysis

In Section 5.1, we saw how psychologist Charles Spearman used a correlation matrix to compare scores on a series of aptitude tests. On the basis of the observed correlations, Spearman concluded that the test results provided evidence of common basic underlying functions. Further work by psychologists to identify the common factors that make up intelligence has led to development of an area of study known as factor analysis.

Predating Spearman’s work by a few years is a 1901 paper by Karl Pearson analyzing a correlation matrix derived from measuring seven physical variables for each of 3000 criminals. This study contains the roots of a method popularized by Harold Hotelling in a well-known paper published in 1933. The method is known as principal component analysis.

To see the basic idea of this method, assume that a series of n aptitude tests is administered to a group of m individuals and that the deviations from the mean for the tests form the columns of an $m \times n$ matrix X. Although, in practice, column vectors of X are positively correlated, the hypothetical factors that account for the scores should be uncorrelated. Thus, we wish to introduce mutually orthogonal vectors $y_{1}, y_{2}, . . ., y_{r}$ corresponding to the hypothetical factors. We require that the vectors span $R (X)$ , and hence the number of vectors, r, should be equal to the rank of X. Furthermore, we wish to number these vectors in decreasing order of variance.

The first principal component vector, $y_{1}$ , should account for the most variance. Since $y_{1}$ is in the column space of X, we can represent it as a product $X v_{1}$ for some $v_{1} \in ℝ^{n}$ . The covariance matrix is

$S = \frac{1}{m - 1} X^{T} X$

and the variance of $y_{1}$ is given by

$\operatorname{var}(y_{1}) = \frac{(X v_{1})^{T} X v_{1}}{m - 1} = v_{1}^{T} S v_{1}$

The vector $v_{1}$ is chosen to maximize $v^{T} S v$ over all unit vectors v. This can be accomplished by choosing $v_{1}$ to be a unit eigenvector of $X^{T} X$ belonging to its maximum eigenvalue $λ_{1}$ . (See Exercise 28 of Section 6.4.) The eigenvectors of $X^{T} X$ are the right singular vectors of X. Thus, $v_{1}$ is the right singular vector of X corresponding to the largest singular value $σ_{1} = \sqrt{λ_{1}}$ . If $u_{1}$ is the corresponding left singular vector, then

$y_{1} = X v_{1} = σ_{1} u_{1}$

The second principal component vector must be of the form $y_{2} = X v_{2}$ . It can be shown that the vector which maximizes $v^{T} S v$ over all unit vectors that are orthogonal to $v_{1}$ is just the second right singular vector $v_{2}$ of X. If we choose $v_{2}$ in this way and $u_{2}$ is the corresponding left singular vector, then

$y_{2} = X v_{2} = σ_{2} u_{2}$

and since

$y_{1}^{T} y_{2} = σ_{1} σ_{2} u_{1}^{T} u_{2} = 0$

it follows that $y_{1}$ and $y_{2}$ are orthogonal. The remaining $y_{i}$ ’s are determined in a similar manner.

In general, the singular value decomposition solves the principal component problem. If X has rank r and singular value decomposition $X = U_{1} Σ_{1} V_{1}^{T}$ (in compact form), then the principal component vectors are given by

$y_{1} = σ_{1} u_{1}, y_{2} = σ_{2} u_{2}, . . ., y_{r} = σ_{r} u_{r}$

The left singular vectors $u_{1}, \dots, u_{r}$ are the normalized principal component vectors. If we set $W = Σ_{1} V_{1}^{T}$, then

$X = U_{1} Σ_{1} V_{1}^{T} = U_{1} W$

The columns of the matrix $U_{1}$ correspond to the hypothetical intelligence factors. The entries in each column measure how well the individual students exhibit that particular intellectual ability. The matrix W measures to what extent each test depends on the hypothetical factors.
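The first two principal components can be computed from scratch for a toy data set. The sketch below uses made-up scores for four individuals on two tests and centers the columns to form X. It then finds $v_{1}$ by power iteration on $X^{T} X$, a method we chose for brevity; production code would call a library SVD routine instead. Finally, it checks that $u_{1} = X v_{1} / σ_{1}$ is a unit vector and that $y_{1}$ and $y_{2}$ are orthogonal. All data and names are hypothetical.

```python
import math

# Hypothetical scores of m = 4 individuals (rows) on n = 2 tests (columns).
scores = [[2, 1], [4, 3], [6, 2], [8, 6]]
m, n = len(scores), len(scores[0])

# Center each column: X holds deviations from the test means.
means = [sum(row[j] for row in scores) / m for j in range(n)]
X = [[row[j] - means[j] for j in range(n)] for row in scores]

# X^T X, a constant multiple of the covariance matrix S.
XtX = [[sum(X[i][a] * X[i][b] for i in range(m)) for b in range(n)] for a in range(n)]

# Power iteration for a unit eigenvector v1 belonging to the largest eigenvalue of X^T X.
v = [1.0, 0.0]
for _ in range(200):
    w = [sum(XtX[a][b] * v[b] for b in range(n)) for a in range(n)]
    norm = math.sqrt(sum(c * c for c in w))
    v = [c / norm for c in w]
v1 = v
lam1 = sum(v1[a] * XtX[a][b] * v1[b] for a in range(n) for b in range(n))
sigma1 = math.sqrt(lam1)

# First principal component y1 = X v1 = sigma1 * u1.
y1 = [sum(X[i][j] * v1[j] for j in range(n)) for i in range(m)]
u1 = [c / sigma1 for c in y1]

# With n = 2, the unit vector orthogonal to v1 is (up to sign) the second right singular vector.
v2 = [-v1[1], v1[0]]
y2 = [sum(X[i][j] * v2[j] for j in range(n)) for i in range(m)]

norm_u1 = math.sqrt(sum(c * c for c in u1))
print(round(lam1, 6), round(norm_u1, 6), round(sum(a * b for a, b in zip(y1, y2)), 9))
```

For these scores $X^{T} X = [\begin{matrix} 20 & 14 \\ 14 & 14 \end{matrix}]$, whose largest eigenvalue is $17 + \sqrt{205}$. The printed $λ_{1}$ should match that value.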
