Construct an n × ℓ, with ℓ = O(k/ε), structured random projection matrix Ω, for
example, uniformly sample a few rows from a randomized Hadamard transform (see,
e.g., [31] for a precise definition of the randomized Hadamard transform).
Return B = AΩ.
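To make this step concrete, the following minimal Python sketch illustrates one way to form B = AΩ with a subsampled randomized Hadamard transform. The function name, the use of SciPy's hadamard routine, the scaling convention, and the power-of-two restriction on n are our illustrative assumptions, not part of the algorithm as stated in [63].

```python
import numpy as np
from scipy.linalg import hadamard

def srht_sketch(A, ell, rng=None):
    """Form B = A @ Omega, where Omega is an n x ell subsampled randomized
    Hadamard transform; assumes n is a power of two so that hadamard() applies."""
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    signs = rng.choice([-1.0, 1.0], size=n)            # random diagonal sign flips D
    H = hadamard(n) / np.sqrt(n)                       # normalized n x n Hadamard matrix
    cols = rng.choice(n, size=ell, replace=False)      # uniform sample of ell coordinates
    Omega = np.sqrt(n / ell) * (signs[:, None] * H)[:, cols]   # n x ell SRHT matrix
    return A @ Omega

# Example: a 10-column sketch of a 100 x 64 matrix.
A = np.random.default_rng(0).standard_normal((100, 64))
B = srht_sketch(A, ell=10)
```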
This algorithm, which amounts to choosing uniformly at random a small number of
columns in a randomly rotated basis, was introduced in [63], where it is proven that
‖A − P_{B_k} A‖_F ≤ (1 + ε) ‖A − P_{U_k} A‖_F                    (9.6)
where P_{B_k} A is the projection of A onto the best rank-k approximation of B, holds with
high probability. This bound, which is the random projection analog of the relative-error
CUR matrix approximations of [30,50], provides a bound only on the reconstruction error
of the top part of the spectrum of the input matrix. Additionally, it necessitates sampling
a relatively large number of columns ℓ = O(k/ε).
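To make the quantities in Equation 9.6 concrete, the following illustrative sketch (the function names are ours) computes ‖A − P_{B_k} A‖_F by projecting A onto the column span of the best rank-k approximation of the sketch B, as well as the optimal error ‖A − P_{U_k} A‖_F = ‖A − A_k‖_F against which it is compared.

```python
import numpy as np

def sketch_projection_error(A, B, k):
    """||A - P_{B_k} A||_F: project A onto the span of the top-k left singular
    vectors of the sketch B, i.e., onto the best rank-k approximation of B."""
    U_B, _, _ = np.linalg.svd(B, full_matrices=False)
    U_k = U_B[:, :k]
    return np.linalg.norm(A - U_k @ (U_k.T @ A), ord='fro')

def optimal_rank_k_error(A, k):
    """||A - P_{U_k} A||_F = ||A - A_k||_F, the best possible rank-k error."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return np.sqrt(np.sum(sigma[k:] ** 2))

# Equation 9.6 states that, with high probability,
# sketch_projection_error(A, B, k) <= (1 + eps) * optimal_rank_k_error(A, k).
```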
In many practical applications, for example, when providing high-quality numerical
implementations, it is preferable to parameterize the problem in order to choose some
number ℓ = k + p columns, where p is a modest additive oversampling factor, for example,
p equal to 10 or 20 or k. When attempting to be this aggressive at minimizing the size
of the sample, the choice of the oversampling factor p is quite sensitive to the input. That
is, whereas the bound of Equation 9.6 holds for any worst-case input, here the proper
choice for the oversampling factor p could depend on the matrix dimensions, the decay
properties of the spectrum, and the particular choice made for the random projection
matrix [43,44,47,52,61,69].
To deal with these issues, the best numerical implementations of RandNLA algorithms
for low-rank matrix approximation, and those that obtain the strongest results in terms
of minimizing p, take advantage of Lemma 9.2 in a somewhat different way than was
originally used in the analysis of the CSSP. For example, rather than choosing O(k log k)
dimensions and then filtering them through exactly k dimensions, one can choose some
number of dimensions and project onto a k′-dimensional subspace, where k < k′, while
exploiting Lemma 9.2 to bound the error, as appropriate for the computational environment
at hand [44].
Consider, for example, the following random projection algorithm. Given A ∈ R^{m×n}, a
rank parameter k, and an oversampling factor p:
Set ℓ = k + p.
Construct an n × ℓ random projection matrix Ω, either with i.i.d. Gaussian entries or
in the form of a structured random projection such as uniformly sampling a few rows
from a randomized Hadamard transform.
Return B = AΩ.
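A minimal sketch of this parameterization, for the Gaussian case only (the function name and the default p = 10 are our illustrative choices):

```python
import numpy as np

def gaussian_sketch(A, k, p=10, rng=None):
    """Return B = A @ Omega for an n x ell Gaussian random projection,
    parameterized by a rank k and a modest additive oversampling factor p."""
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    ell = k + p
    Omega = rng.standard_normal((n, ell))    # i.i.d. Gaussian entries
    return A @ Omega
```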
Although this approach is quite similar to the algorithms of [58,63], algorithms parameter-
ized in this form were first introduced in [47,52,69], where a suite of bounds of the form
‖A − Z‖_2 ≤ 10 √(min{m, n}) ‖A − A_k‖_2
were shown to hold with high probability. Here, Z is a rank-k-or-greater matrix, easily
constructed from B. Such results can be used to obtain the so-called interpolative
decomposition, a variant of the basic CSSP with explicit numerical conditioning properties,
and [47,52,69] also provided a posteriori error estimates that are useful in situations where
one wants to choose the rank parameter k to be the numerical rank, as opposed to a priori
specifying k as part of the input. Such a priori choices were more common in TCS algorithms
for the same problem that predated the aforementioned approach.
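The precise construction of Z and the a posteriori estimates differ across [47,52,69]; as one common construction in this spirit, a rank-ℓ (hence rank-k-or-greater) approximation can be obtained by orthonormalizing B and projecting A onto its range, and the spectral error can be estimated a posteriori with a few Gaussian probe vectors. The following sketch is illustrative only and is not the specific estimator of those references.

```python
import numpy as np

def low_rank_from_sketch(A, B):
    """One common rank-ell (hence rank-k-or-greater) approximation built from
    the sketch B: orthonormalize B and project A onto its range."""
    Q, _ = np.linalg.qr(B)                 # m x ell orthonormal basis for range(B)
    return Q, Q @ (Q.T @ A)                # Z = Q Q^T A

def estimate_spectral_error(A, Q, num_probes=5, rng=None):
    """Randomized a posteriori estimate related to ||A - Q Q^T A||_2, useful when
    choosing k to match the numerical rank: with high probability, a small constant
    multiple of the returned maximum upper-bounds the true spectral error."""
    rng = np.random.default_rng(rng)
    G = rng.standard_normal((A.shape[1], num_probes))   # Gaussian probe vectors
    R = A @ G - Q @ (Q.T @ (A @ G))                     # residual applied to probes
    return np.max(np.linalg.norm(R, axis=0))
```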
Consider, in addition, how the following random projection algorithm addresses the
issue that the decay properties of the spectrum can be important when it is of interest to
aggressively minimize the oversampling parameter p. Given a matrix A ∈ R^{m×n}, a rank
parameter k, an oversampling factor p, and an iteration parameter q:
Set ℓ = k + p.
Construct an n × ℓ random projection matrix Ω, either with i.i.d. Gaussian entries or
in the form of a structured random projection such as uniformly sampling a few rows
from a randomized Hadamard transform.
Return B = (AA^T)^q AΩ.
This algorithm, as well as a numerically stable variant of it, was introduced in [61], where
it was shown that bounds of the form
‖A − Z‖_2 ≤ 10 (min{m, n})^{1/(4q+2)} ‖A − A_k‖_2
hold with high probability. Again, Z is a rank-k-or-greater matrix easily constructed from
B; this bound should be compared with the bound of the previous algorithm. Basically,
this random projection algorithm modifies the previous algorithm by coupling a form of the
power iteration method within the random projection step and, in many cases, it leads to
improved performance [44,61].
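A minimal sketch of the power-iteration variant follows; re-orthonormalizing between multiplications is a common way to obtain a numerically stable variant, and our formulation may differ in details from the scheme of [61] as reviewed in [44].

```python
import numpy as np

def power_iteration_sketch(A, k, p=10, q=2, stabilize=True, rng=None):
    """Sketch of range((A A^T)^q A) with ell = k + p columns.

    stabilize=False literally forms B = (A A^T)^q A Omega, which can lose accuracy
    in floating point; stabilize=True re-orthonormalizes after each multiplication,
    a common way to obtain a numerically stable variant."""
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    Omega = rng.standard_normal((n, k + p))
    B = A @ Omega
    for _ in range(q):
        if stabilize:
            B, _ = np.linalg.qr(B)          # keep an orthonormal basis between steps
        B = A @ (A.T @ B)                   # one application of A A^T
    return B
```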
In their review, [44] used Lemma 9.2 to clarify and simplify these and other prior
random projection methods. (Subsequent work, e.g., that of [40] which develops RandNLA
algorithms within the subspace iteration framework, has continued to use Lemma 9.2 in
somewhat different ways.) Lemma 9.2 was explicitly reproven (with squares in the norms)
in [44], using a proof based on the perturbation theory of orthogonal projectors, thus
providing an elegant alternative to the original proof of the inequality. Our inequality in
Lemma 9.2 was an essential ingredient of their work, allowing the authors of [44] to bound
the performance of their algorithms based on the relationship between the singular vectors
corresponding to the large singular values of A and their counterparts corresponding to the
small singular values of A. As the authors of [44] observe, “when a substantial proportion
of the mass of A appears in the small singular values, the constructed basis may have low
accuracy. Conversely, when the large singular values dominate, it is much easier to identify
a good low-rank basis.” Our main inequality, originally developed within the context of the
CSSP, precisely quantifies this tradeoff in a strong sense, and it serves as a starting point
and foundation for the RandNLA theory reviewed in [44].
9.4.4 Improved Results for Nyström-Based Machine Learning
Symmetric positive semi-definite (SPSD) matrices are of interest in many applications, in
particular for the so-called kernel-based machine learning methods [64]. In many situations,
matrices of interest are moderately well approximated by low-rank matrices, and in
many of these cases, one is interested in the so-called Nyström-based low-rank matrix
approximation [27,36,67]. These are low-rank matrix approximations that are expressed
in terms of actual columns and rows, that is, they are essentially CSSP methods for
SPSD matrices that preserve the SPSD property. A challenge here is that, while CSSP
methods provide high-quality bounds for general matrices, it is difficult to preserve the SPSD
property and thus extend these to provide high-quality SPSD low-rank approximation of
SPSD matrices. Indeed, early work on Nyström methods was either heuristic [67] or provided
rigorous but weak worst-case theory [27].
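As a point of reference, the basic Nyström construction from c sampled columns can be sketched as follows; uniform column sampling and the pseudoinverse of the sampled block are one standard formulation, not the specific estimators analyzed in [36] or [1].

```python
import numpy as np

def nystrom_approximation(K, c, rng=None):
    """Basic Nystrom approximation of an SPSD matrix K from c uniformly sampled
    columns: K ~ C W^+ C^T with C = K[:, S] and W = K[S, S].  The result is SPSD
    by construction, since W^+ inherits positive semi-definiteness from K."""
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    S = rng.choice(n, size=c, replace=False)   # uniformly sampled column indices
    C = K[:, S]                                # n x c block of actual columns
    W = K[np.ix_(S, S)]                        # c x c intersection block
    return C @ np.linalg.pinv(W) @ C.T
```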
A qualitative improvement in this area occurred with Gittens and Mahoney [36], which
used a result from Gittens [35] to preserve the SPSD property, while working with leverage-
based column sampling and related random projection methods. A critical component of
the analysis of [36] involved providing structural decompositions, which are variants of
Lemma 9.2 for SPSD matrices for the spectral, Frobenius, and trace norms. Subsequent
to this, Anderson et al. [1] introduced the so-called spectral gap error bound method to
provide still finer results in a common case: when one performs a very modest amount of
oversampling for input kernel matrices that do not have a large spectral gap, but that do
have a spectrum that decays rapidly. The analysis of [1] used a result from Gu [40] that
extended Lemma 9.2 by providing an analogous structural statement when one is interested
in splitting the matrix into three parts: the top, middle, and bottom (rather than just top
and bottom) parts of the spectrum. In each of these cases, increasingly finer results are
derived for several related problems by exploiting structural properties having to do with
the interaction of sampling/projection operators in the RandNLA algorithms with various
parts of the vector space defined by the input matrix.
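For completeness, the rank-k leverage scores that drive leverage-based column sampling can be computed exactly from a truncated SVD as sketched below (the fast approximation algorithms of [26] avoid this cost); this is a generic illustration rather than the specific sampling scheme of [36].

```python
import numpy as np

def rank_k_leverage_scores(A, k):
    """Leverage scores of the columns of A relative to its best rank-k subspace:
    squared column norms of the top-k right singular vectors (they sum to k)."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return np.sum(Vt[:k, :] ** 2, axis=0)

def sample_columns_by_leverage(A, k, c, rng=None):
    """Sample c column indices with probability proportional to the rank-k
    leverage scores (with replacement, as in many RandNLA analyses)."""
    rng = np.random.default_rng(rng)
    scores = rank_k_leverage_scores(A, k)
    return rng.choice(A.shape[1], size=c, replace=True, p=scores / scores.sum())
```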
9.5 Conclusions
The interdisciplinary history of RandNLA has seen a gradual movement toward providing
increasingly finer bounds for a range of low-rank (and other) matrix problems. In this
chapter, we have highlighted, described, and extended a deterministic structural result
underlying many state-of-the-art RandNLA algorithms for low-rank matrix approximation
problems. A general theme in this development is that this is accomplished by using general
algorithmic and statistical tools and specializing them to account for the fine-scale structure
of the Euclidean vector space defined by the data matrix. For example, while a vanilla
application of the Johnson–Lindenstrauss lemma, which is applicable to vectors in general
metric spaces, leads to interesting results (e.g., additive-error bounds on the top part of
the spectrum of the matrix being approximated), much stronger results (e.g., relative-error
bounds, the CSSP results that first introduced the predecessor of Lemma 9.1, as well as the
other results we have reviewed here) can be obtained by exploiting the vector space structure
of the Euclidean spaces defined by the top and bottom parts of the spectrum of A.
A challenge in interdisciplinary research areas such as RandNLA is that algorithms
solving seemingly different problems use similar structural results in various ways. At the
same time, diverse research areas study those problems from many different perspectives.
As a result, highlighting structural commonalities is rare and such structural results usually
get buried deep inside the technical analysis of the proposed methods. Highlighting the
central role of such structural results is important, especially as RandNLA methods are
increasingly being applied to data analysis tasks in applications ranging from genetics [46,
59,60] to astronomy [73] and mass spectrometry imaging [72], and as RandNLA algorithms
are increasingly being implemented in large-scale parallel and distributed computational
environments [54,55,70,71].
Acknowledgments
MWM acknowledges the Army Research Office, the Defense Advanced Research Projects
Agency XDATA and GRAPHS programs, and the Department of Energy for providing
partial support for this work. PD acknowledges the National Science Foundation for
providing partial support for this work via IIS-1319280, CCF-1016501, and DMS-1008983.
References
1. D. G. Anderson, S. S. Du, M. W. Mahoney, C. Melgaard, K. Wu, and M. Gu. Spectral
gap error bounds for improving CUR matrix decomposition and the Nyström method. In
Proceedings of the 18th International Workshop on Artificial Intelligence and Statistics,
pp. 19–27, 2015.
2. H. Avron, P. Maymounkov, and S. Toledo. Blendenpik: Supercharging LAPACK’s least-
squares solver. SIAM Journal on Scientific Computing, 32:1217–1236, 2010.
3. J. D. Batson, D. A. Spielman, and N. Srivastava. Twice-Ramanujan sparsifiers. In
Proceedings of the 41st Annual ACM Symposium on Theory of Computing, pp. 255–262,
2009.
4. C. H. Bischof and G. Quintana-Ortí. Algorithm 782: Codes for rank-revealing QR fac-
torizations of dense matrices. ACM Transactions on Mathematical Software, 24(2):254–
257, 1998.
5. C. H. Bischof and G. Quintana-Ortí. Computing rank-revealing QR factorizations of
dense matrices. ACM Transactions on Mathematical Software, 24(2):226–253, 1998.
6. C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Near-optimal column-based matrix
reconstruction. In Proceedings of the 52nd Annual IEEE Symposium on Foundations of
Computer Science, pp. 305–314, 2011.
7. C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Near-optimal column-based matrix
reconstruction. SIAM Journal on Computing, 43(2):687–717, 2014.
8. C. Boutsidis, M. W. Mahoney, and P. Drineas. An improved approximation algorithm
for the column subset selection problem. Technical report. Preprint: arXiv:0812.4293,
2008.
9. C. Boutsidis, M. W. Mahoney, and P. Drineas. Unsupervised feature selection for
principal components analysis. In Proceedings of the 14th Annual ACM SIGKDD
Conference, pp. 61–69, 2008.
10. C. Boutsidis, M. W. Mahoney, and P. Drineas. An improved approximation algorithm
for the column subset selection problem. In Proceedings of the 20th Annual ACM-SIAM
Symposium on Discrete Algorithms, pp. 968–977, 2009.
11. C. Boutsidis, M. W. Mahoney, and P. Drineas. Unsupervised feature selection for the
k-means clustering problem. In Annual Advances in Neural Information Processing
Systems 22: Proceedings of the 2009 Conference, 2009.
12. C. Boutsidis, A. Zouzias, M. W. Mahoney, and P. Drineas. Randomized dimension-
ality reduction for k-means clustering. IEEE Transactions on Information Theory,
61(2):1045–1062, 2015.
13. T. F. Chan and P. C. Hansen. Low-rank revealing QR factorizations. Numerical Linear
Algebra with Applications, 1:33–44, 1994.
14. S. Chatterjee and A. S. Hadi. Influential observations, high leverage points, and outliers
in linear regression. Statistical Science, 1(3):379–393, 1986.
15. S. Chatterjee and A. S. Hadi. Sensitivity Analysis in Linear Regression. John Wiley &
Sons, New York, 1988.
16. K. L. Clarkson, P. Drineas, M. Magdon-Ismail, M. W. Mahoney, X. Meng, and D. P.
Woodruff. The fast Cauchy transform and faster robust linear regression. In Proceedings
of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 466–477, 2013.
17. K. L. Clarkson and D. P. Woodruff. Low rank approximation and regression in input
sparsity time. In Proceedings of the 45th Annual ACM Symposium on Theory of
Computing, pp. 81–90, 2013.
18. E. S. Coakley, V. Rokhlin, and M. Tygert. A fast randomized algorithm for orthogonal
projection. SIAM Journal on Scientific Computing, 33(2):849–868, 2011.
19. National Research Council. Frontiers in Massive Data Analysis. National Academies
Press, Washington, DC, 2013.
20. A. Deshpande and L. Rademacher. Efficient volume sampling for row/column subset
selection. In Proceedings of the 51st Annual IEEE Symposium on Foundations of
Computer Science, pp. 329–338, 2010.
21. A. Deshpande, L. Rademacher, S. Vempala, and G. Wang. Matrix approximation and
projective clustering via volume sampling. In Proceedings of the 17th Annual ACM-
SIAM Symposium on Discrete Algorithms, pp. 1117–1126, 2006.
22. A. Deshpande and S. Vempala. Adaptive sampling and fast low-rank matrix approx-
imation. In Proceedings of the 10th International Workshop on Randomization and
Computation, pp. 292–303, 2006.
23. P. Drineas, R. Kannan, and M. W. Mahoney. Fast Monte Carlo algorithms for matrices I:
Approximating matrix multiplication. SIAM Journal on Computing, 36:132–157, 2006.
24. P. Drineas, R. Kannan, and M. W. Mahoney. Fast Monte Carlo algorithms for matrices
II: Computing a low-rank approximation to a matrix. SIAM Journal on Computing,
36:158–183, 2006.
25. P. Drineas, R. Kannan, and M. W. Mahoney. Fast Monte Carlo algorithms for matrices
III: Computing a compressed approximate matrix decomposition. SIAM Journal on
Computing, 36:184–206, 2006.
26. P. Drineas, M. Magdon-Ismail, M. W. Mahoney, and D. P. Woodruff. Fast approxima-
tion of matrix coherence and statistical leverage. Journal of Machine Learning Research,
13:3475–3506, 2012.
27. P. Drineas and M. W. Mahoney. On the Nyström method for approximating a Gram
matrix for improved kernel-based learning. Journal of Machine Learning Research,
6:2153–2175, 2005.
28. P. Drineas, M. W. Mahoney, and S. Muthukrishnan. Sampling algorithms for ℓ2 regression
and applications. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete
Algorithms, pp. 1127–1136, 2006.