Construct an n × ℓ, with ℓ = O(k/ε), structured random projection matrix Ω, for
example, uniformly sample a few rows from a randomized Hadamard transform (see,
e.g., [31] for a precise definition of the randomized Hadamard transform).
Return B = AΩ.
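To make this step concrete, the following minimal Python sketch illustrates one way to form B = AΩ with a subsampled randomized Hadamard transform. The function name, the use of SciPy's hadamard routine, the scaling convention, and the power-of-two restriction on n are our illustrative assumptions, not part of the algorithm as stated in [63].

```python
import numpy as np
from scipy.linalg import hadamard

def srht_sketch(A, ell, rng=None):
    """Form B = A @ Omega, where Omega is an n x ell subsampled randomized
    Hadamard transform; assumes n is a power of two so that hadamard() applies."""
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    signs = rng.choice([-1.0, 1.0], size=n)            # random diagonal sign flips D
    H = hadamard(n) / np.sqrt(n)                       # normalized n x n Hadamard matrix
    cols = rng.choice(n, size=ell, replace=False)      # uniform sample of ell coordinates
    Omega = np.sqrt(n / ell) * (signs[:, None] * H)[:, cols]   # n x ell SRHT matrix
    return A @ Omega

# Example: a 10-column sketch of a 100 x 64 matrix.
A = np.random.default_rng(0).standard_normal((100, 64))
B = srht_sketch(A, ell=10)
```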
This algorithm, which amounts to choosing uniformly at random a small number of
columns in a randomly rotated basis, was introduced in [63], where it is proven that
‖A − P_{B_k} A‖_F ≤ (1 + ε) ‖A − P_{U_k} A‖_F                    (9.6)
where P_{B_k} A is the projection of A onto the best rank-k approximation of B, holds with
high probability. This bound, which is the random projection analog of the relative-error
CUR matrix approximations of [30,50], provides a bound only on the reconstruction error
of the top part of the spectrum of the input matrix. Additionally, it necessitates sampling
a relatively large number of columns ℓ = O(k/ε).
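To make the quantities in Equation 9.6 concrete, the following illustrative sketch (the function names are ours) computes ‖A − P_{B_k} A‖_F by projecting A onto the column span of the best rank-k approximation of the sketch B, as well as the optimal error ‖A − P_{U_k} A‖_F = ‖A − A_k‖_F against which it is compared.

```python
import numpy as np

def sketch_projection_error(A, B, k):
    """||A - P_{B_k} A||_F: project A onto the span of the top-k left singular
    vectors of the sketch B, i.e., onto the best rank-k approximation of B."""
    U_B, _, _ = np.linalg.svd(B, full_matrices=False)
    U_k = U_B[:, :k]
    return np.linalg.norm(A - U_k @ (U_k.T @ A), ord='fro')

def optimal_rank_k_error(A, k):
    """||A - P_{U_k} A||_F = ||A - A_k||_F, the best possible rank-k error."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return np.sqrt(np.sum(sigma[k:] ** 2))

# Equation 9.6 states that, with high probability,
# sketch_projection_error(A, B, k) <= (1 + eps) * optimal_rank_k_error(A, k).
```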
In many practical applications, for example, when providing high-quality numerical
implementations, it is preferable to parameterize the problem in order to choose some
number ℓ = k + p columns, where p is a modest additive oversampling factor, for example,
p equal to 10 or 20 or k. When attempting to be this aggressive at minimizing the size
of the sample, the choice of the oversampling factor p is quite sensitive to the input. That
is, whereas the bound of Equation 9.6 holds for any worst-case input, here the proper
choice for the oversampling factor p could depend on the matrix dimensions, the decay
properties of the spectrum, and the particular choice made for the random projection
matrix [43,44,47,52,61,69].
To deal with these issues, the best numerical implementations of RandNLA algorithms
for low-rank matrix approximation, and those that obtain the strongest results in terms
of minimizing p, take advantage of Lemma 9.2 in a somewhat different way than was
originally used in the analysis of the CSSP. For example, rather than choosing O(k log k)
dimensions and then filtering them through exactly k dimensions, one can choose some
number of dimensions and project onto a k′-dimensional subspace, where k < k′, while
exploiting Lemma 9.2 to bound the error, as appropriate for the computational environment
at hand [44].
Consider, for example, the following random projection algorithm. Given A ∈ R^{m×n}, a
rank parameter k, and an oversampling factor p:
Set ℓ = k + p.
Construct an n × ℓ random projection matrix Ω, either with i.i.d. Gaussian entries or
in the form of a structured random projection such as uniformly sampling a few rows
from a randomized Hadamard transform.
Return B = AΩ.
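A minimal sketch of this parameterization, for the Gaussian case only (the function name and the default p = 10 are our illustrative choices):

```python
import numpy as np

def gaussian_sketch(A, k, p=10, rng=None):
    """Return B = A @ Omega for an n x ell Gaussian random projection,
    parameterized by a rank k and a modest additive oversampling factor p."""
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    ell = k + p
    Omega = rng.standard_normal((n, ell))    # i.i.d. Gaussian entries
    return A @ Omega
```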
Although this approach is quite similar to the algorithms of [58,63], algorithms parameter-
ized in this form were first introduced in [47,52,69], where a suite of bounds of the form
‖A − Z‖_2 ≤ 10 √(min{m, n}) ‖A − A_k‖_2
were shown to hold with high probability. Here, Z is a rank-k-or-greater matrix, easily
constructed from B. Such results can be used to obtain the so-called interpolative
decomposition, a variant of the basic CSSP with explicit numerical conditioning properties,
and [47,52,69] also provided a posteriori error estimates that are useful in situations where
one wants to choose the rank parameter k to be the numerical rank, as opposed to a priori
specifying k as part of the input. Such a priori choices were more common in TCS algorithms
for the same problem that predated the aforementioned approach.
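The precise construction of Z and the a posteriori estimates differ across [47,52,69]; as one common construction in this spirit, a rank-ℓ (hence rank-k-or-greater) approximation can be obtained by orthonormalizing B and projecting A onto its range, and the spectral error can be estimated a posteriori with a few Gaussian probe vectors. The following sketch is illustrative only and is not the specific estimator of those references.

```python
import numpy as np

def low_rank_from_sketch(A, B):
    """One common rank-ell (hence rank-k-or-greater) approximation built from
    the sketch B: orthonormalize B and project A onto its range."""
    Q, _ = np.linalg.qr(B)                 # m x ell orthonormal basis for range(B)
    return Q, Q @ (Q.T @ A)                # Z = Q Q^T A

def estimate_spectral_error(A, Q, num_probes=5, rng=None):
    """Randomized a posteriori estimate related to ||A - Q Q^T A||_2, useful when
    choosing k to match the numerical rank: with high probability, a small constant
    multiple of the returned maximum upper-bounds the true spectral error."""
    rng = np.random.default_rng(rng)
    G = rng.standard_normal((A.shape[1], num_probes))   # Gaussian probe vectors
    R = A @ G - Q @ (Q.T @ (A @ G))                     # residual applied to probes
    return np.max(np.linalg.norm(R, axis=0))
```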
Consider, in addition, how the following random projection algorithm addresses the
issue that the decay properties of the spectrum can be important when it is of interest to
aggressively minimize the oversampling parameter p. Given a matrix A ∈ R^{m×n}, a rank
parameter k, an oversampling factor p, and an iteration parameter q:
Set ℓ = k + p.
Construct an n × ℓ random projection matrix Ω, either with i.i.d. Gaussian entries or
in the form of a structured random projection such as uniformly sampling a few rows
from a randomized Hadamard transform.
Return B = (AA^T)^q AΩ.
This algorithm, as well as a numerically stable variant of it, was introduced in [61], where
it was shown that bounds of the form
‖A − Z‖_2 ≤ 10 (min{m, n})^{1/(4q+2)} ‖A − A_k‖_2
hold with high probability. Again, Z is a rank-k-or-greater matrix easily constructed from
B; this bound should be compared with the bound of the previous algorithm. Basically,
this random projection algorithm modifies the previous algorithm by coupling a form of the
power iteration method within the random projection step and, in many cases, it leads to
improved performance [44,61].
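A minimal sketch of the power-iteration variant follows; re-orthonormalizing between multiplications is a common way to obtain a numerically stable variant, and our formulation may differ in details from the scheme of [61] as reviewed in [44].

```python
import numpy as np

def power_iteration_sketch(A, k, p=10, q=2, stabilize=True, rng=None):
    """Sketch of range((A A^T)^q A) with ell = k + p columns.

    stabilize=False literally forms B = (A A^T)^q A Omega, which can lose accuracy
    in floating point; stabilize=True re-orthonormalizes after each multiplication,
    a common way to obtain a numerically stable variant."""
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    Omega = rng.standard_normal((n, k + p))
    B = A @ Omega
    for _ in range(q):
        if stabilize:
            B, _ = np.linalg.qr(B)          # keep an orthonormal basis between steps
        B = A @ (A.T @ B)                   # one application of A A^T
    return B
```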
In their review, [44] used Lemma 9.2 to clarify and simplify these and other prior
random projection methods. (Subsequent work, e.g., that of [40] which develops RandNLA
algorithms within the subspace iteration framework, has continued to use Lemma 9.2 in
somewhat different ways.) Lemma 9.2 was explicitly reproven (with squares in the norms)
in [44], using a proof based on the perturbation theory of orthogonal projectors, thus
providing an elegant alternative to the original proof of the inequality. Our inequality in
Lemma 9.2 was an essential ingredient of their work, allowing the authors of [44] to bound
the performance of their algorithms based on the relationship between the singular vectors
corresponding to the large singular values of A and their counterparts corresponding to the
small singular values of A. As the authors of [44] observe, “when a substantial proportion
of the mass of A appears in the small singular values, the constructed basis may have low
accuracy. Conversely, when the large singular values dominate, it is much easier to identify
a good low-rank basis.” Our main inequality, originally developed within the context of the
CSSP, precisely quantifies this tradeoff in a strong sense, and it serves as a starting point
and foundation for the RandNLA theory reviewed in [44].
9.4.4 Improved Results for Nyström-Based Machine Learning
Symmetric positive semi-definite (SPSD) matrices are of interest in many applications, in
particular for the so-called kernel-based machine learning methods [64]. In many situations,
matrices of interest are moderately well approximated by low-rank matrices, and in
many of these cases, one is interested in the so-called Nyström-based low-rank matrix
approximation [27,36,67]. These are low-rank matrix approximations that are expressed
in terms of actual columns and rows, that is, they are essentially CSSP methods for
SPSD matrices that preserve the SPSD property. A challenge here is that, while CSSP
methods provide high-quality bounds for general matrices, it is difficult to preserve the SPSD
property and thus extend these to provide high-quality SPSD low-rank approximation of
SPSD matrices. Indeed, early work on Nyström methods was either heuristic [67] or provided
rigorous but weak worst-case theory [27].
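As a point of reference, the basic Nyström construction from c sampled columns can be sketched as follows; uniform column sampling and the pseudoinverse of the sampled block are one standard formulation, not the specific estimators analyzed in [36] or [1].

```python
import numpy as np

def nystrom_approximation(K, c, rng=None):
    """Basic Nystrom approximation of an SPSD matrix K from c uniformly sampled
    columns: K ~ C W^+ C^T with C = K[:, S] and W = K[S, S].  The result is SPSD
    by construction, since W^+ inherits positive semi-definiteness from K."""
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    S = rng.choice(n, size=c, replace=False)   # uniformly sampled column indices
    C = K[:, S]                                # n x c block of actual columns
    W = K[np.ix_(S, S)]                        # c x c intersection block
    return C @ np.linalg.pinv(W) @ C.T
```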
A qualitative improvement in this area occurred with Gittens and Mahoney [36], which
used a result from Gittens [35] to preserve the SPSD property, while working with leverage-
based column sampling and related random projection methods. A critical component of
the analysis of [36] involved providing structural decompositions, which are variants of
Lemma 9.2 for SPSD matrices for the spectral, Frobenius, and trace norms. Subsequent
to this, Anderson et al. [1] introduced the so-called spectral gap error bound method to
provide still finer results in a common case: when one performs a very modest amount of
oversampling for input kernel matrices that do not have a large spectral gap, but that do
have a spectrum that decays rapidly. The analysis of [1] used a result from Gu [40] that
extended Lemma 9.2 by providing an analogous structural statement when one is interested
in splitting the matrix into three parts: the top, middle, and bottom (rather than just top
and bottom) parts of the spectrum. In each of these cases, increasingly finer results are
derived for several related problems by exploiting structural properties having to do with
the interaction of sampling/projection operators in the RandNLA algorithms with various
parts of the vector space defined by the input matrix.
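For completeness, the rank-k leverage scores that drive leverage-based column sampling can be computed exactly from a truncated SVD as sketched below (the fast approximation algorithms of [26] avoid this cost); this is a generic illustration rather than the specific sampling scheme of [36].

```python
import numpy as np

def rank_k_leverage_scores(A, k):
    """Leverage scores of the columns of A relative to its best rank-k subspace:
    squared column norms of the top-k right singular vectors (they sum to k)."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return np.sum(Vt[:k, :] ** 2, axis=0)

def sample_columns_by_leverage(A, k, c, rng=None):
    """Sample c column indices with probability proportional to the rank-k
    leverage scores (with replacement, as in many RandNLA analyses)."""
    rng = np.random.default_rng(rng)
    scores = rank_k_leverage_scores(A, k)
    return rng.choice(A.shape[1], size=c, replace=True, p=scores / scores.sum())
```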
9.5 Conclusions
The interdisciplinary history of RandNLA has seen a gradual movement toward providing
increasingly finer bounds for a range of low-rank (and other) matrix problems. In this
chapter, we have highlighted, described, and extended a deterministic structural result
underlying many state-of-the-art RandNLA algorithms for low-rank matrix approximation
problems. A general theme in this development is that this is accomplished by using general
algorithmic and statistical tools and specializing them to account for the fine-scale structure
of the Euclidean vector space defined by the data matrix. For example, while a vanilla
application of the Johnson–Lindenstrauss lemma, which is applicable to vectors in general
metric spaces, leads to interesting results (e.g., additive-error bounds on the top part of
the spectrum of the matrix being approximated), much stronger results (e.g., relative-error
bounds, the CSSP results that first introduced the predecessor of Lemma 9.1, as well as the
other results we have reviewed here) can be obtained by exploiting the vector space structure
of the Euclidean spaces defined by the top and bottom parts of the spectrum of A.
A challenge in interdisciplinary research areas such as RandNLA is that algorithms
solving seemingly different problems use similar structural results in various ways. At the
same time, diverse research areas study those problems from many different perspectives.
As a result, highlighting structural commonalities is rare and such structural results usually
get buried deep inside the technical analysis of the proposed methods. Highlighting the
central role of such structural results is important, especially as RandNLA methods are
increasingly being applied to data analysis tasks in applications ranging from genetics [46,
59,60] to astronomy [73] and mass spectrometry imaging [72], and as RandNLA algorithms
are increasingly being implemented in large-scale parallel and distributed computational
environments [54,55,70,71].
Acknowledgments
MWM acknowledges the Army Research Office, the Defense Advanced Research Projects
Agency XDATA and GRAPHS programs, and the Department of Energy for providing
partial support for this work. PD acknowledges the National Science Foundation for
providing partial support for this work via IIS-1319280, CCF-1016501, and DMS-1008983.
References
1. D. G. Anderson, S. S. Du, M. W. Mahoney, C. Melgaard, K. Wu, and M. Gu. Spectral
gap error bounds for improving CUR matrix decomposition and the Nyström method. In
Proceedings of the 18th International Workshop on Artificial Intelligence and Statistics,
pp. 19–27, 2015.
2. H. Avron, P. Maymounkov, and S. Toledo. Blendenpik: Supercharging LAPACK’s least-
squares solver. SIAM Journal on Scientific Computing, 32:1217–1236, 2010.
3. J. D. Batson, D. A. Spielman, and N. Srivastava. Twice-Ramanujan sparsifiers. In
Proceedings of the 41st Annual ACM Symposium on Theory of Computing, pp. 255–262,
2009.
4. C. H. Bischof and G. Quintana-Ortí. Algorithm 782: Codes for rank-revealing QR fac-
torizations of dense matrices. ACM Transactions on Mathematical Software, 24(2):254–
257, 1998.
5. C. H. Bischof and G. Quintana-Ortí. Computing rank-revealing QR factorizations of
dense matrices. ACM Transactions on Mathematical Software, 24(2):226–253, 1998.
6. C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Near-optimal column-based matrix
reconstruction. In Proceedings of the 52nd Annual IEEE Symposium on Foundations of
Computer Science, pp. 305–314, 2011.
7. C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Near-optimal column-based matrix
reconstruction. SIAM Journal on Computing, 43(2):687–717, 2014.
8. C. Boutsidis, M. W. Mahoney, and P. Drineas. An improved approximation algorithm
for the column subset selection problem. Technical report. Preprint: arXiv:0812.4293,
2008.
9. C. Boutsidis, M. W. Mahoney, and P. Drineas. Unsupervised feature selection for
principal components analysis. In Proceedings of the 14th Annual ACM SIGKDD
Conference, pp. 61–69, 2008.
10. C. Boutsidis, M. W. Mahoney, and P. Drineas. An improved approximation algorithm
for the column subset selection problem. In Proceedings of the 20th Annual ACM-SIAM
Symposium on Discrete Algorithms, pp. 968–977, 2009.
11. C. Boutsidis, M. W. Mahoney, and P. Drineas. Unsupervised feature selection for the
k-means clustering problem. In Annual Advances in Neural Information Processing
Systems 22: Proceedings of the 2009 Conference, 2009.
12. C. Boutsidis, A. Zouzias, M. W. Mahoney, and P. Drineas. Randomized dimension-
ality reduction for k-means clustering. IEEE Transactions on Information Theory,
61(2):1045–1062, 2015.
13. T. F. Chan and P. C. Hansen. Low-rank revealing QR factorizations. Numerical Linear
Algebra with Applications, 1:33–44, 1994.
14. S. Chatterjee and A. S. Hadi. Influential observations, high leverage points, and outliers
in linear regression. Statistical Science, 1(3):379–393, 1986.
15. S. Chatterjee and A. S. Hadi. Sensitivity Analysis in Linear Regression. John Wiley &
Sons, New York, 1988.
16. K. L. Clarkson, P. Drineas, M. Magdon-Ismail, M. W. Mahoney, X. Meng, and D. P.
Woodruff. The fast Cauchy transform and faster robust linear regression. In Proceedings
of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 466–477, 2013.
17. K. L. Clarkson and D. P. Woodruff. Low rank approximation and regression in input
sparsity time. In Proceedings of the 45th Annual ACM Symposium on Theory of
Computing, pp. 81–90, 2013.
18. E. S. Coakley, V. Rokhlin, and M. Tygert. A fast randomized algorithm for orthogonal
projection. SIAM Journal on Scientific Computing, 33(2):849–868, 2011.
19. National Research Council. Frontiers in Massive Data Analysis. National Academies
Press, Washington, DC, 2013.
20. A. Deshpande and L. Rademacher. Efficient volume sampling for row/column subset
selection. In Proceedings of the 51st Annual IEEE Symposium on Foundations of
Computer Science, pp. 329–338, 2010.
21. A. Deshpande, L. Rademacher, S. Vempala, and G. Wang. Matrix approximation and
projective clustering via volume sampling. In Proceedings of the 17th Annual ACM-
SIAM Symposium on Discrete Algorithms, pp. 1117–1126, 2006.
22. A. Deshpande and S. Vempala. Adaptive sampling and fast low-rank matrix approx-
imation. In Proceedings of the 10th International Workshop on Randomization and
Computation, pp. 292–303, 2006.
23. P. Drineas, R. Kannan, and M. W. Mahoney. Fast Monte Carlo algorithms for matrices I:
Approximating matrix multiplication. SIAM Journal on Computing, 36:132–157, 2006.
24. P. Drineas, R. Kannan, and M. W. Mahoney. Fast Monte Carlo algorithms for matrices
II: Computing a low-rank approximation to a matrix. SIAM Journal on Computing,
36:158–183, 2006.
25. P. Drineas, R. Kannan, and M. W. Mahoney. Fast Monte Carlo algorithms for matrices
III: Computing a compressed approximate matrix decomposition. SIAM Journal on
Computing, 36:184–206, 2006.
26. P. Drineas, M. Magdon-Ismail, M. W. Mahoney, and D. P. Woodruff. Fast approxima-
tion of matrix coherence and statistical leverage. Journal of Machine Learning Research,
13:3475–3506, 2012.
27. P. Drineas and M. W. Mahoney. On the Nyström method for approximating a Gram
matrix for improved kernel-based learning. Journal of Machine Learning Research,
6:2153–2175, 2005.
28. P. Drineas, M. W. Mahoney, and S. Muthukrishnan. Sampling algorithms for ℓ2 regression
and applications. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete
Algorithms, pp. 1127–1136, 2006.