Matrix decomposition

Matrix decomposition is the matrix analog of algebraic factorization. In this section, we will discuss methods that break a single matrix into the product of two or more other matrices.

QR decomposition

QR decomposition breaks a matrix M into two matrices, Q and R, such that M is equal to QR. Q is an orthogonal matrix (a matrix whose inverse is equal to its transpose), and R is an upper triangular matrix. Here, we demonstrate QR factorization in R on R's built-in trees dataset using the qr() function:

> data(trees)
> head(trees)
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2
4  10.5     72   16.4
5  10.7     81   18.8
6  10.8     83   19.7
> trees.qr <- qr(trees[,c(2:3)])

We recover the Q and R matrices with the qr.Q and qr.R commands, respectively. If we multiply these together, we see that we get our starting dataset as follows:

> Q <- qr.Q(trees.qr)
> R <- qr.R(trees.qr)
> head(Q%*%R)
     Height Volume
[1,]     70   10.3
[2,]     65   10.3
[3,]     63   10.2
[4,]     72   16.4
[5,]     81   18.8
[6,]     83   19.7
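
We can also check the orthogonality property of Q mentioned earlier: because qr.Q() returns a Q whose columns are orthonormal, multiplying its transpose by itself should give the identity matrix (up to floating-point rounding). This is a minimal check using the Q recovered previously:

> round(t(Q) %*% Q, 10)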

The most common use of QR factorization is to find numerically stable solutions to linear least squares regression. Let's say that we have a matrix of linear predictors X and a vector of outcomes Y, to which we wish to fit a linear regression with the regression coefficients stored in a vector B. If we factor X = QR, then because Q is orthogonal, the normal equations XᵀXB = XᵀY reduce to RB = QᵀY, which is easy to solve because R is triangular.

In fact, R's lm function uses QR factorization, as we demonstrate in the following code, and stores the results in an object called qr from which we can retrieve an R and a Q matrix as mentioned previously:

> trees.lm <- lm(trees$Volume ~ trees$Girth + trees$Height)
> names(trees.lm)
 [1] "coefficients"  "residuals"     "effects"       "rank"          "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"        
> Q.2 <- qr.Q(trees.lm$qr)
> R.2 <- qr.R(trees.lm$qr)

Notably, this QR decomposition was performed on the model's predictor (design) matrix, which contains trees$Girth and trees$Height, whereas our earlier decomposition was done on the Height and Volume columns of the data. To get our regression coefficients, we can apply the formula:

B = R⁻¹QᵀY

In R, using the previous formula, we can develop the following code:

> solve(R.2) %*% t(Q.2) %*% trees$Volume
                    [,1]
(Intercept)  -57.9876589
trees$Girth    4.7081605
trees$Height   0.3392512
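
Because R.2 is upper triangular, we do not actually need to invert it; a triangular back-substitution is cheaper and more numerically stable. The following sketch, using base R's backsolve() with the R.2 and Q.2 matrices obtained above, should reproduce the same coefficients:

> backsolve(R.2, t(Q.2) %*% trees$Volume)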

Calling the regression coefficients directly can be achieved as follows:

> trees.lm
Call:
lm(formula = trees$Volume ~ trees$Girth + trees$Height)
Coefficients:
 (Intercept)   trees$Girth  trees$Height  
    -57.9877        4.7082        0.3393  

As we can see, they match. It might be interesting to look at the matrix that we get when multiplying our Q and R matrices, as follows:

> head(Q.2%*%R.2)
     (Intercept) trees$Girth trees$Height
[1,]           1         8.3           70
[2,]           1         8.6           65
[3,]           1         8.8           63
[4,]           1        10.5           72
[5,]           1        10.7           81
[6,]           1        10.8           83

We see that instead of having the expected two columns, it has three. This is because the first column filled with 1s was added by R prior to doing the QR decomposition. The coefficient for this column of 1s is the intercept term in the regression.
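
We can confirm this by looking at the design matrix that lm() built for the regression; the model.matrix() function returns it, with the column of 1s labeled (Intercept):

> head(model.matrix(trees.lm))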

Eigenvalue decomposition

An eigenvalue decomposition can only be done on a square matrix. This decomposition yields a matrix in which each column is termed an eigenvector and each eigenvector has a corresponding eigenvalue.

The eigenvalue decomposition breaks a matrix into the product of three other matrices:

  • A matrix of so-called "eigenvectors", a matrix defined as V, in which each column is one eigenvector
  • A diagonal matrix of the eigenvalues corresponding to each eigenvector, denoted Λ
  • The transpose of the matrix of eigenvectors, denoted Vᵀ

Expressed mathematically, for a symmetric matrix A (such as a correlation matrix), this is as follows:

A = VΛVᵀ

These eigenvectors have important properties that make them relevant:

  • The eigenvectors of a symmetric matrix are orthogonal to one another (that is, at 90 degrees, or uncorrelated).
  • The original matrix postmultiplied by one of its eigenvectors is equal to that eigenvector multiplied by its corresponding eigenvalue. Stated another way, postmultiplying a matrix by one of its eigenvectors rescales that eigenvector, changing its length but not its direction; the magnitude of this rescaling is the eigenvalue. Because of this directional invariance, eigenvalues are often referred to as "characteristic roots" and eigenvectors as "characteristic vectors".

To find eigenvectors and eigenvalues in R, we can use the eigen() function. This function returns a list containing a vector of eigenvalues, ordered from highest to lowest, and a matrix whose columns are the corresponding eigenvectors. For example, we can find the eigenvalues and eigenvectors of the correlation matrix for the physical functioning data (cor.mat):

eigen(cor.mat)$values
eigen(cor.mat)$vectors

We can demonstrate the property that a matrix postmultiplied by one of its eigenvectors yields a scalar rescaling of that eigenvector. Here, we use the largest eigenvalue and its corresponding eigenvector:

cor.mat %*% eigen(cor.mat)$vectors[,1]
eigen(cor.mat)$values[1] * eigen(cor.mat)$vectors[,1]

These two vectors should have the same values. Note that the second command uses scalar (element-wise) multiplication rather than matrix multiplication, because an eigenvalue is a single number.
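
We can also verify the full decomposition A = VΛVᵀ by rebuilding the correlation matrix from its eigenvectors and eigenvalues. This is a minimal sketch, again assuming cor.mat is the symmetric correlation matrix used previously:

e <- eigen(cor.mat)
# V %*% diag(eigenvalues) %*% t(V) should reproduce cor.mat up to rounding error
e$vectors %*% diag(e$values) %*% t(e$vectors)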

Lower upper decomposition

The lower-upper decomposition is usually just termed LU decomposition. The idea is to factor a matrix A into the product of two triangular matrices: a lower triangular matrix (L) and an upper triangular matrix (U):

A = LU

By breaking a matrix into two triangular matrices, we can reduce the computational demands of many numerical problems. However, finding the LU decomposition itself comes at a computational cost. As such, it is most useful in iterative or repeated computational procedures in which A does not change, such as finding the inverse of a matrix or repeatedly solving a system of linear equations whose coefficients do not change between repetitions. In such cases, we incur the cost of the LU decomposition only once, but reap the benefit of the computationally cheaper operations on the decomposed factors repeatedly.

To perform LU decomposition in R, we can use the lu() function in the Matrix package. This does not just perform a simple LU decomposition; it may also rearrange (permute) the rows of the matrix to avoid zero pivots and improve numerical stability. Applying lu() to a matrix returns an object that is not very usable by itself, but it can be passed to the expand() function to obtain the lower matrix, the upper matrix, and the permutation matrix.

An example with matrix C is given in the following code:

library(Matrix)
lu.mat <- expand(lu(C))
lower.mat <- lu.mat$L
upper.mat <- lu.mat$U
p.mat <- lu.mat$P

To get the original matrix C, we perform matrix multiplication on the P, L, and U matrices, where P is the permutation matrix:

p.mat %*% lower.mat %*% upper.mat

If we do not include the permutation matrix, we arrive at a matrix with the same rows as our original matrix but in a different order.
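
As noted earlier, the payoff of the LU decomposition comes when we reuse the factors, for example, to solve C %*% x = b for many different right-hand sides b without refactoring C. The following is a minimal sketch; b here is a hypothetical right-hand-side vector whose length must match the number of rows of C:

b <- rep(1, nrow(lower.mat))                  # hypothetical right-hand side
Pb <- as.matrix(t(p.mat) %*% b)               # undo the permutation: P'b
y <- forwardsolve(as.matrix(lower.mat), Pb)   # triangular solve: L y = P'b
x <- backsolve(as.matrix(upper.mat), y)       # triangular solve: U x = y
x                                             # same answer as solve(C, b)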

Cholesky decomposition

Cholesky decomposition is another matrix factorization; it can only be computed on symmetric, positive-definite matrices (a correlation matrix is one example). The basic idea is once again to decompose a matrix M into the product of two triangular matrices: an upper triangular matrix U and its transpose Uᵀ, so that M = UᵀU. As with the LU decomposition, this method is typically used to reduce the computational burden of solving large systems of equations.

In R, this is done with the chol() function, which returns only the upper triangular matrix U:

chol(cor.mat)

To reconstruct the original matrix, simply premultiply the result of the chol() function by its transpose:

t(chol(cor.mat)) %*% chol(cor.mat)
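
One example of the computational saving mentioned above is matrix inversion: base R's chol2inv() function computes the inverse of a symmetric, positive-definite matrix from its Cholesky factor. A minimal sketch, assuming the same cor.mat as before:

U <- chol(cor.mat)
cor.inv <- chol2inv(U)          # inverse of cor.mat, computed from its Cholesky factor
round(cor.inv %*% cor.mat, 10)  # should print (approximately) the identity matrix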

Singular value decomposition

Singular value decomposition can be performed on square or rectangular matrices. The idea is to decompose a matrix M with n columns into n "blocks" of information. Each block is composed of one column of a matrix U, the transpose of the corresponding column of a matrix V, and one of the n singular values in d:

M = UDVᵀ = d₁u₁v₁ᵀ + d₂u₂v₂ᵀ + … + dₙuₙvₙᵀ

When the singular value decomposition is done in R with the svd() function, the result is a list containing the matrix u, the matrix v, and the vector d of singular values.

The practical use of this is that we can look at the size of each singular value; if a particular singular value is negligible, then the data can be summarized leaving out the block containing that singular value. This can be used as a data compression method or as a noise reduction method. In Chapter 6, Principal Component Analysis and the Common Factor Model, we will discuss singular value decomposition of covariance matrices as a data reduction tool.

Let's see how singular value decomposition can be used as a data compression method. We will create a matrix that represents a picture of a cross composed of zeros and ones. A picture really can be stored as a matrix in this way: the "portable bitmap" file format, for example, represents black-and-white images as exactly this kind of text layout:

cross.mat <- matrix(
  c(
    0, 0, 0, 1, 1, 0, 0, 0,
    0, 0, 0, 1, 1, 0, 0, 0,
    0, 0, 0, 1, 1, 0, 0, 0,
    1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1,
    0, 0, 0, 1, 1, 0, 0, 0,
    0, 0, 0, 1, 1, 0, 0, 0,
    0, 0, 0, 1, 1, 0, 0, 0
  ), 
byrow = TRUE, nrow = 10)

We can then apply the singular value decomposition, which will give us eight "blocks" represented as eight column vectors in the U and V matrices and eight values in the d vector:

cross.svd <- svd(cross.mat)

To inspect the size of each singular value, we look at d:

cross.svd$d

As we can see, the last six singular values are negligible, so we will recreate the image from the first two singular values and first two column vectors of U and V:

cross.svd$u[,c(1,2)] %*% diag(cross.svd$d[c(1,2)]) %*% t(cross.svd$v[, c(1,2)])

This gives us the recreated cross from the compressed data. The previous code produces something that looks like a mess because the reconstruction is numerical: floating-point arithmetic leaves values that are very close to, but not exactly, 0 and 1. However, if we round to the nearest integer, we reproduce the original matrix:

round(cross.svd$u[,c(1,2)] %*% diag(cross.svd$d[c(1,2)]) %*% t(cross.svd$v[, c(1,2)]), 0)

The original cross.mat had 80 (10*8) data points. Our compressed representation based on the singular value decomposition contains 20 values from matrix u (two columns of length 10), 16 values from matrix v (two columns of length 8), and 2 values from vector d, for a total of 38 values ((2*10)+(2*8)+2). Thus, we were able to compress the data to roughly half its original size.
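
We can let R do this bookkeeping for us; for a rank-k approximation of an m-by-n matrix, the compressed representation needs k*m + k*n + k values:

k <- 2
prod(dim(cross.mat))                           # 80 values in the original matrix
k * nrow(cross.mat) + k * ncol(cross.mat) + k  # 38 values in the rank-2 representation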

Tip

If we are working with an image with only two colors, we can often recreate the original image exactly by rounding to an integer value. However, if our matrix represented a grayscale image, where the numeric value represents the intensity of gray, then we would get a close but not completely faithful recreation of the original image.

What if we wanted to clean up the image? Let's create a new cross with a speck of dust on the picture as follows:

cross.mat <- matrix(
  c(
    0, 0, 0, 1, 1, 0, 0, 0,
    1, 0, 0, 1, 1, 0, 0, 0,
    0, 0, 0, 1, 1, 0, 0, 0,
    1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1,
    0, 0, 0, 1, 1, 0, 0, 0,
    0, 0, 0, 1, 1, 0, 0, 0,
    0, 0, 0, 1, 1, 0, 0, 0
  )
,byrow = TRUE, nrow = 10)

Notice the value of 1 at position (2, 1) in the previous code, which is our speck of dust. Next, we recreate our cross based on the same idea:

cross.svd.2 <- svd(cross.mat)
round(cross.svd.2$u[,c(1,2)] %*% diag(cross.svd.2$d[c(1,2)]) %*% t(cross.svd.2$v[, c(1,2)]), 0)

Voila! The speck of dust is gone.
