Singular value decomposition

In a real-world recommender system, the rating matrix will eventually become very large as more users are added to the system and the list of items being offered grows. As a result, we may want to apply a dimensionality reduction technique to this matrix, ideally retaining as much information as possible from the original matrix in the process. One such method, with applications across a wide range of disciplines, uses singular value decomposition, or SVD, as it is commonly abbreviated.

SVD is a matrix factorization technique that has a number of useful applications, one of which is dimensionality reduction. It is related to the PCA method of dimensionality reduction that we saw in Chapter 1, Gearing Up for Predictive Modeling, and many people confuse the two. Strictly speaking, SVD describes just a mathematical method for factorizing matrices; in fact, some implementations of PCA use SVD to compute the principal components.
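
To make this connection concrete, here is a minimal sketch of the relationship as it appears in R (an illustration only, not part of our recommender workflow). R's prcomp() function computes principal components via the SVD of the centered data matrix, so the component standard deviations it reports are simply the singular values of that matrix divided by the square root of the number of observations minus one. Using the ratingMatrix object that we work with later in this section:

> X <- scale(ratingMatrix, center = TRUE, scale = FALSE) # center the columns, as PCA does
> pca <- prcomp(ratingMatrix)
> all.equal(pca$sdev, svd(X)$d / sqrt(nrow(X) - 1))

This comparison should return TRUE.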

Let's begin by looking at how this process works. Since SVD is a factorization, we start with an original matrix representing our data and express it as a product of matrices. In a dimensionality reduction scenario, our input data matrix would be the matrix whose rows are data points and whose columns are the features; thus, in R, this would just be a data frame. In our recommender systems scenario, the matrix we use is our rating matrix. Suppose that we call our rating matrix D and we have m users (rows) rating n items (columns). The SVD factorization of this matrix is given by:

$$D = U \Sigma V^{T}$$

In the previous equation, U and V are square matrices of order m and n, respectively, and Σ is a matrix with the same dimensions as our input matrix, D. In addition, Σ is a (rectangular) diagonal matrix, meaning that all of its elements are zero except those on the leading diagonal. These elements are conventionally ordered from largest to smallest and are known as the singular values of the matrix D, giving rise to the name singular value decomposition.

Note

Readers familiar with linear algebra will know that the eigenvalues of a matrix are often also described as containing information about the important dimensions of that matrix. It turns out that the eigenvalues of a matrix are related to the singular values through the following relationship: the singular values of a matrix, D, are the square roots of the eigenvalues of the matrix product $D D^{T}$.
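
We can verify this relationship numerically with a quick sketch, reusing the ratingMatrix object from the following example. Since both eigen() and svd() return their values in decreasing order, the two vectors line up directly:

> sv <- svd(ratingMatrix)$d # singular values of D
> ev <- eigen(ratingMatrix %*% t(ratingMatrix))$values # eigenvalues of D D^T
> all.equal(sv, sqrt(ev))

This check should return TRUE.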

We can easily perform SVD on a matrix in R via the svd() function, which is available in R's base package. Let's see this using our existing ratingMatrix:

> options(digits = 2)
> (rm_svd <- svd(ratingMatrix))
$d
[1] 35.6 10.6  7.5  5.7  4.7  1.3
$u
      [,1]  [,2]   [,3]   [,4]   [,5]   [,6]
[1,] -0.44  0.48 -0.043 -0.401  0.315  0.564
[2,] -0.41 -0.56  0.703 -0.061  0.114  0.099
[3,] -0.38  0.24  0.062  0.689 -0.494  0.273
[4,] -0.43 -0.40 -0.521 -0.387 -0.483 -0.033
[5,] -0.44  0.42  0.170 -0.108 -0.003 -0.764
[6,] -0.33 -0.26 -0.447  0.447  0.641 -0.114
$v
      [,1]   [,2]  [,3]    [,4]   [,5]   [,6]
[1,] -0.13 -0.255  0.30 -0.0790  0.013  0.301
[2,] -0.33 -0.591  0.16  0.3234  0.065 -0.486
[3,] -0.25 -0.382 -0.36 -0.0625 -0.017 -0.200
[4,] -0.27  0.199 -0.36  0.5796  0.578  0.284
[5,] -0.38  0.460 -0.30  0.1412 -0.556 -0.325
[6,] -0.39  0.401  0.68  0.0073  0.239 -0.226
[7,] -0.42  0.044 -0.26 -0.7270  0.369 -0.047
[8,] -0.52 -0.161  0.11  0.0279 -0.398  0.628

The singular values are returned as the vector d, from which we can easily construct the diagonal matrix using the diag() function. Note that svd() returns the compact form of the decomposition by default: since our rating matrix has six rows and eight columns, u is a 6 × 6 matrix, but v is 8 × 6 rather than the full 8 × 8 square matrix described earlier. To verify that this factorization really is the one that we expected, we can reconstruct our original rating matrix by simply multiplying the matrix factors that we have obtained:

> reconstructed_rm <- rm_svd$u %*% diag(rm_svd$d) %*% t(rm_svd$v)
> reconstructed_rm
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    1    2    5    7    8    9    7
[2,]    5    9    4    1    1    7    5    9
[3,]    1    4    2    5    8    6    2    8
[4,]    2    6    7    2    6    1    8    9
[5,]    1    3    2    4    8    9    7    7
[6,]    1    6    5    7    3    2    5    5

One thing to note here is that, if we were to attempt a direct equality check against our original matrix, we would most likely fail. This is due to floating-point rounding errors introduced when the factorized matrices are computed and stored. We can check that our two matrices are nearly equal using the all.equal() function:

> all.equal(ratingMatrix, reconstructed_rm, tolerance = 0.000001,
+ check.attributes = F)
[1] TRUE

The reader is encouraged to decrease the size of the tolerance and note that, beyond a certain number of decimal places, the equality check fails. Though the two matrices are not exactly equal, the difference is so small that it will not impact us in any significant way. Now that we have this factorization, let's investigate our singular values. The first singular value, 35.6, is much larger than the smallest singular value, 1.3.
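
For example, tightening the tolerance toward machine precision should cause the check to report a mean relative difference rather than TRUE (the exact output depends on your platform's floating-point arithmetic, so it is omitted here):

> all.equal(ratingMatrix, reconstructed_rm, tolerance = 1e-14,
+ check.attributes = F)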

We can perform dimensionality reduction by keeping the top singular values and throwing the rest away. To do this, we'd like to know how many singular values we should keep and how many we should discard. One approach to this problem is to compute the squares of the singular values, which can be thought of as the energy of the matrix, and then pick the top singular values that preserve at least 90% of the overall energy of the original matrix. This is easy to do in R: the singular values are already ordered from largest to smallest, so we can use the cumsum() function to create a cumulative sum:

> energy <- rm_svd$d ^ 2
> cumsum(energy) / sum(energy)
[1] 0.85 0.92 0.96 0.98 1.00 1.00

Keeping the first two singular values will retain 92% of the energy of our original matrix. Using just these two values, we can reconstruct our rating matrix and observe the differences:

> d92 <- c(rm_svd$d[1:2], rep(0, length(rm_svd$d) - 2))
> reconstructed92_rm <- rm_svd$u %*% diag(d92) %*% t(rm_svd$v)
> reconstructed92_rm
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 0.68  2.0  1.9  5.1  8.3  8.0  6.7  7.2
[2,] 3.37  8.3  5.9  2.7  2.9  3.3  5.9  8.6
[3,] 1.10  3.0  2.4  4.1  6.4  6.3  5.9  6.7
[4,] 3.02  7.5  5.4  3.2  3.9  4.2  6.2  8.6
[5,] 0.87  2.5  2.2  5.1  8.1  7.9  6.8  7.5
[6,] 2.20  5.5  4.0  2.6  3.3  3.5  4.9  6.6

As we can see, there are a few differences in the absolute values, but the rating patterns of the different users have largely been retained. Discarding singular values effectively introduces zeros on the leading diagonal of the matrix Σ in the factorization, so that this matrix ends up with entire rows and columns that contain only zeros. Consequently, we can truncate not only Σ itself, but also the corresponding columns of the matrices U and V, thus reducing the size of the data that we have to store.
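
To make this concrete, here is a minimal sketch of the truncated factorization, keeping only the first two columns of u and v along with the first two singular values. The names k, u_k, d_k, v_k, and truncated_rm are illustrative; the result should match reconstructed92_rm up to floating-point error, while requiring far fewer numbers to store than the full 6 × 8 matrix:

> k <- 2
> u_k <- rm_svd$u[, 1:k] # 6 x 2 instead of 6 x 6
> d_k <- rm_svd$d[1:k] # 2 singular values instead of 6
> v_k <- rm_svd$v[, 1:k] # 8 x 2 instead of 8 x 6
> truncated_rm <- u_k %*% diag(d_k) %*% t(v_k)
> all.equal(truncated_rm, reconstructed92_rm)

This final check should return TRUE.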
