9
Structural Properties Underlying High-Quality
Randomized Numerical Linear Algebra Algorithms
Michael W. Mahoney and Petros Drineas
CONTENTS
9.1 Introduction
9.2 Overview
9.3 Our Main Technical Result
    9.3.1 Statement of the Main Technical Result
    9.3.2 Popular Special Case
    9.3.3 Proof of the Main Technical Result
9.4 Applications of Our Main Technical Result
    9.4.1 CSSP: Theory
    9.4.2 CSSP: Data Analysis and Machine Learning
    9.4.3 Random Projections for Low-Rank Matrix Approximations
    9.4.4 Improved Results for Nyström-Based Machine Learning
9.5 Conclusions
Acknowledgments
References
9.1 Introduction
In recent years, the amount of data that has been generated and recorded has grown
enormously, and data are now seen to be at the heart of modern economic activity,
innovation, and growth. See, for example, the report by the McKinsey Global Institute [51],
which identifies ways in which Big Data have transformed the modern world, as well as
the report by the National Research Council [19], which discusses reasons for and technical
challenges in massive data analysis. In many cases, these so-called Big Data are modeled as
matrices, basically since an m×n matrix A provides a natural mathematical structure with
which to encode information about m objects, each of which is described by n features.
As a result, while linear algebra algorithms have been of interest for decades in areas
such as numerical linear algebra (NLA) and scientific computing, in recent years, there
has been renewed interest in developing matrix algorithms that are appropriate for the
analysis of large datasets that are represented in the form of matrices. For example, tools
such as the singular value decomposition (SVD) and the related principal components
analysis (PCA) [38] permit the low-rank approximation of a matrix, and they have
had a profound impact in diverse areas of science and engineering. They have also been
studied extensively in large-scale machine learning and data analysis applications, in settings
ranging from web search engines and social network analysis to the analysis of astronomical
and biological data.
Importantly, the structural and noise properties of matrices that arise in machine
learning and data analysis applications are typically very different than those of matrices
that arise in scientific computing and NLA. This has led researchers to revisit traditional
problems in light of new requirements and to consider novel algorithmic approaches to
many traditional matrix problems. One of the more remarkable trends in recent years is
a new paradigm that arose in theoretical computer science (TCS) and that involves the
use of randomization as a computational resource for the design and analysis of algorithms
for fundamental matrix problems. Randomized NLA (RandNLA) is the interdisciplinary
research area that exploits randomization as a computational resource to develop improved
algorithms for large-scale linear algebra problems, for example, matrix multiplication,
linear regression, and low-rank matrix approximation [49]. In this chapter, we will discuss
RandNLA, highlighting how many of the most interesting RandNLA developments for
problems related to improved low-rank matrix approximation boil down to exploiting a
particular structural property of Euclidean vector spaces. This structural property is of
interest in and of itself (for researchers interested in linear algebra per se), but it is also of
interest more generally (for researchers interested in using linear algebra) since it highlights
strong connections between algorithms for many seemingly unrelated matrix problems.
9.2 Overview
As background, we note that early work in RandNLA focused on low-rank approximation
problems and led to results that were primarily of theoretical interest in idealized models of
data access [23–25,34,57,63]. An overview of RandNLA for readers not familiar with the area
has recently been provided [49]. Subsequent work on very overdetermined linear regression
problems, for example, least-squares regression problems with an input matrix $A \in \mathbb{R}^{m \times n}$,
with $m \gg n$, led to several remarkable successes for RandNLA: theoretical results for worst-case
inputs for the running time in the RAM model that improve upon the 200-year-old
Gaussian elimination [17,26,31,53,56]; high-quality implementations that are competitive
with or better than traditional deterministic implementations, for example, as provided by
LAPACK, on a single machine [2,18,62]; and high-quality implementations in parallel and
distributed environments on up to terabyte-sized input matrices [16,54,55,70,71].
As has been described in detail, for example, in [49], both the more theoretical and
the more applied successes of RandNLA for these very overdetermined linear regression
problems were achieved by using, implicitly or explicitly, the so-called statistical leverage
scores of the tall input matrix A. (The statistical leverage scores of a tall matrix $A \in \mathbb{R}^{m \times n}$
with $m \gg n$ are equal to the diagonal elements of the projection matrix onto the column span
of $A$ [14,48,50]. Thus, they capture a subtle but important structural property of the Euclidean
vector space from which the data were drawn.) In some cases, the use of leverage scores was explicit, in
that one used exact or approximate leverage scores to construct a nonuniform importance
sampling probability distribution with respect to which to sample rows from the input
matrix, thereby constructing a data-aware subspace embedding [26,28]. In other cases, the
use of leverage scores was implicit, in that one performed a random projection, thereby
implementing a data-oblivious subspace embedding [31,68]. (Random projections can be applied
in more general metric spaces, but in a Euclidean vector space a random projection essentially
amounts to rotating to a random basis, where the leverage scores are uniform and thus where
uniform sampling can be applied [49].) In both cases, the improved theoretical and practical
results for overdetermined linear regression problems were obtained by coupling the original,
rather theoretical, RandNLA ideas more closely with structural properties of the input data [49].
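To make the definition of the leverage scores concrete, the following short Python sketch (ours, not taken from the references above; it assumes NumPy and a tall matrix small enough to factor exactly) computes them as the squared row norms of an orthonormal basis for the column span of A, which equal the diagonal entries of the projection matrix onto that span.

```python
import numpy as np

def leverage_scores(A):
    """Exact statistical leverage scores of a tall matrix A (m x n, m >> n).

    They are the diagonal entries of the projection matrix onto the column
    span of A, i.e., the squared Euclidean row norms of an orthonormal
    basis U for that span.
    """
    U, _, _ = np.linalg.svd(A, full_matrices=False)  # thin SVD: U spans col(A)
    return np.sum(U**2, axis=1)

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 20))
scores = leverage_scores(A)
print(scores.sum())  # sums to rank(A), here 20: a standard sanity check
```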
In parallel with these successes on overdetermined regression problems, there have
also been several impressive successes on applying RandNLA methods to a wide range
of seemingly different low-rank matrix approximation problems. (By low-rank matrix approximation
problems, we informally mean problems where the input is a general matrix $A \in \mathbb{R}^{m \times n}$,
where both $m$ and $n$ are large, and a rank parameter $k \ll \min\{m, n\}$; the output is a
low-rank approximation to $A$, not necessarily the optimal one that is computed via the SVD.)
For example, consider the
following problems (which are described in more detail in Section 9.4):
• The column subset selection problem (CSSP), in which one seeks to select the most
informative subset of exactly k columns from a matrix.
• The problem of using random projections to compute low-rank matrix approximations
faster than traditional SVD-based or QR-based deterministic methods for worst-case
input matrices and/or for inputs that are typical in scientific computing applications.
• The problem of developing improved Nyström-based low-rank matrix approximations
of symmetric positive definite matrices.
• The problem of developing improved machine learning and data analysis methods to
identify interesting features in the data (feature selection).
These problems often arise in very different research areas, and they are—at least
superficially—quite different. Relatedly, it can be difficult to tell what—if anything—improved
algorithms for one problem mean for the possibility of improved algorithms for the others.
In this chapter, we highlight and discuss a particular deterministic structural property of
Euclidean vector spaces that underlies the recent improvements in RandNLA algorithms for
all of the above low-rank matrix approximation problems; see Lemma 9.1 for a statement
of this result. (Although this structural property is central to all of the above problems,
its role is typically obscured, since it is often secondary to the main result of interest in a
given paper, and thus it is hidden deep within the analysis of each of the superficially
different methods that use it. This property was first introduced by Boutsidis et al. [10] in
the context of the CSSP; it was subsequently used by Halko et al. [44] to simplify the
description of several related random projection algorithms; and it was then used—typically
knowingly, but sometimes unknowingly—by many researchers working on these and other
problems.) This structural property characterizes the interaction between the singular
subspaces of the input matrix A and any (deterministic or randomized) sketching matrix.
In particular, this structural property is deterministic, in the sense that it is a statement
about the (fixed) input data A and not the (randomized) algorithm. Moreover, it holds for
arbitrary matrices A, that is, matrices that have an arbitrarily large number of columns and
rows and not necessarily just tall-and-thin or short-and-fat matrices A, as was the case in
the over- or underdetermined least-squares problems. The structural property thus applies
most directly to problems where one is interested in low-rank approximation with respect
to a low-rank space of dimension $k \ll \min\{m, n\}$.
In RandNLA, the sketching matrix is typically either a matrix representing the operation
of sampling columns or rows from the input matrix, or a matrix representing the random
projection operation. In that case, this structural property has an interpretation in terms
of how the sampling or projection operation interacts with the subspaces defined by the top
and bottom part of the spectrum of A. We emphasize, however, that this structural property
holds more generally: in particular, it holds for any (deterministic or randomized) sketching
matrix and thus it is a property of independent interest. For example, while it is outside
the scope of this chapter to discuss in detail, one can easily imagine using this property to
derandomize RandNLA algorithms or to develop other deterministic matrix algorithms for
these and related matrix problems or to develop improved heuristics in machine learning
and data analysis applications. In the remainder of this chapter, we highlight this structural
property, stating and presenting an analysis of a more general version of it than has been
previously available. We also describe how it is used in several of the recent improvements
to various RandNLA algorithms for low-rank matrix approximation problems.
9.3 Our Main Technical Result
In this section, we state and prove our main technical result. This technical result is a
structural condition that characterizes the interaction between the singular subspaces of
the input matrix A and any deterministic or randomized sketching matrix.
9.3.1 Statement of the Main Technical Result
Recall that, given a matrix $A \in \mathbb{R}^{m \times n}$, many RandNLA algorithms seek to construct a
sketch of $A$ by post-multiplying $A$ by some sketching matrix $Z \in \mathbb{R}^{n \times r}$, where $r$ is much
smaller than $n$. (For example, $Z$ could represent the action of random sampling or random
projection.) Thus, the resulting matrix $AZ \in \mathbb{R}^{m \times r}$ is a matrix that is much smaller than the
original matrix $A$, and the interesting question is what kind of approximation guarantees
it offers for $A$.
A common approach is to explore how well $AZ$ spans the principal subspace of $A$, and
one metric of accuracy is the error matrix $A - P_{AZ}A$, where $P_{AZ}A$ is the projection of $A$
onto the subspace spanned by the columns of $AZ$. Formally,
$$P_{AZ} = (AZ)(AZ)^{+} = U_{AZ}U_{AZ}^{T}.$$
Recall that $X^{+} \in \mathbb{R}^{n \times m}$ is the Moore–Penrose pseudoinverse of any matrix $X \in \mathbb{R}^{m \times n}$,
and that it can be computed via the SVD of $X$; see [38] for details. Similarly, $U_{AZ} \in \mathbb{R}^{m \times \rho}$
is the matrix of the left singular vectors of $AZ$, where $\rho$ is the rank of $AZ$. The following
structural result offers a means to bound any unitarily invariant norm of the error matrix
$A - P_{AZ}A$.
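To fix ideas, here is a minimal Python/NumPy sketch (ours; the Gaussian choice of Z is purely illustrative and stands in for any sampling or projection construction) that forms the sketch AZ, the projection $P_{AZ}A$ through an orthonormal basis for the range of $AZ$, and the error matrix $A - P_{AZ}A$ whose norm the following lemma bounds.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 500, 300, 20

A = rng.standard_normal((m, n))   # input matrix
Z = rng.standard_normal((n, r))   # sketching matrix (Gaussian, for illustration only)

AZ = A @ Z                                             # the sketch, m x r
U_AZ, _, _ = np.linalg.svd(AZ, full_matrices=False)    # left singular vectors of AZ
P_AZ_A = U_AZ @ (U_AZ.T @ A)                           # projection of A onto range(AZ)
E = A - P_AZ_A                                         # error matrix A - P_{AZ} A

print(np.linalg.norm(E, 'fro'), np.linalg.norm(E, 2))  # Frobenius and spectral errors
```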
Lemma 9.1 Given $A \in \mathbb{R}^{m \times n}$, let $Y \in \mathbb{R}^{n \times k}$ be any matrix such that $Y^{T}Y = I_{k}$. Let
$Z \in \mathbb{R}^{n \times r}$ ($r \geq k$) be any matrix such that $Y^{T}Z$ and $AY$ have full rank. Then, for any
unitarily invariant norm $\xi$, we have that
$$\|A - P_{AZ}A\|_{\xi} \leq \|A - AYY^{T}\|_{\xi} + \|(A - AYY^{T})Z(Y^{T}Z)^{+}\|_{\xi}. \qquad (9.1)$$
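The inequality in Equation 9.1 is easy to check numerically. The following sketch (ours; the orthonormal Y obtained from a QR factorization and the Gaussian Z are arbitrary illustrative choices) evaluates both sides in the Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k, r = 200, 150, 10, 25

A = rng.standard_normal((m, n))
Y, _ = np.linalg.qr(rng.standard_normal((n, k)))   # any n x k matrix with Y^T Y = I_k
Z = rng.standard_normal((n, r))                    # any n x r matrix, r >= k

# Left-hand side: || A - P_{AZ} A ||_F
U_AZ, _, _ = np.linalg.svd(A @ Z, full_matrices=False)
lhs = np.linalg.norm(A - U_AZ @ (U_AZ.T @ A), 'fro')

# Right-hand side: || A - A Y Y^T ||_F + || (A - A Y Y^T) Z (Y^T Z)^+ ||_F
R = A - A @ Y @ Y.T
rhs = np.linalg.norm(R, 'fro') + np.linalg.norm(R @ Z @ np.linalg.pinv(Y.T @ Z), 'fro')

print(lhs <= rhs)  # True: the bound of Lemma 9.1 holds
```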
Three comments about this lemma, one regarding $Z$, one regarding $Y$, and one regarding
the interaction between $Z$ and $Y$, are in order.
• Lemma 9.1 holds for any matrix $Z$, regardless of whether $Z$ is constructed deterministically
or randomly. In the context of RandNLA, typical constructions of $Z$ would
represent a random sampling or random projection operation.
• The orthogonal matrix $Y$ in the above lemma is also arbitrary. In the context of
RandNLA, one can think of $Y$ either as $Y = V_k$, where $V_k \in \mathbb{R}^{n \times k}$ is the matrix of the
top $k$ right singular vectors of $A$, or as some other orthogonal matrix that approximates
$V_k$; however, Lemma 9.1 holds more generally.
• As stated in Lemma 9.1, $Y$ must satisfy two conditions: the matrix $Y^{T}Z$ must have full
rank, equal to $k$, since $r \geq k$, and the matrix $AY$ must also have full rank, again equal
to $k$. If $Y = V_k$, then the constraint that $AY$ must have full rank is trivially satisfied,
assuming that $A$ has rank at least $k$. Additionally, the sampling and random projection
approaches that are used in high-quality RandNLA algorithms with sufficiently large
values of $r$ guarantee that the rank condition on $Y^{T}Z$ is satisfied [26,49]. More generally,
though, one could perform an a posteriori check that these two conditions hold.
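Such an a posteriori check is inexpensive relative to forming the sketch itself; a minimal version (ours, using NumPy's numerical rank routine) is the following.

```python
import numpy as np

def lemma_9_1_conditions_hold(A, Y, Z, k):
    """Check the two rank conditions of Lemma 9.1: Y^T Z and A Y have full rank k."""
    return (np.linalg.matrix_rank(Y.T @ Z) == k and
            np.linalg.matrix_rank(A @ Y) == k)
```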
9.3.2 Popular Special Case
Before providing a proof of this structural result, we will now consider a popular special
case of Lemma 9.1. To do so, we will let $Y = V_k \in \mathbb{R}^{n \times k}$, namely the orthogonal matrix of
the top $k$ right singular vectors of $A$. (Actually, any orthogonal matrix spanning that same
subspace would do in this discussion.) For notational convenience, we will let $V_{k,\perp} \in \mathbb{R}^{n \times (\rho - k)}$
(respectively, $\Sigma_{k,\perp} \in \mathbb{R}^{(\rho - k) \times (\rho - k)}$) be the matrix of the bottom $\rho - k$ right singular vectors
(respectively, singular values) of $A$, where $\rho$ now denotes the rank of $A$. Let $A_k \in \mathbb{R}^{m \times n}$ be
the best rank-$k$ approximation to $A$ as computed by the SVD. It is well known that
$$A_k = AV_kV_k^{T}.$$
Assuming that $V_k^{T}Z$ has full rank, then Lemma 9.1 implies that
$$\|A - P_{AZ}A\|_{\xi} \leq \|A - A_k\|_{\xi} + \|(A - A_k)Z(V_k^{T}Z)^{+}\|_{\xi}.$$
Note here that
$$A - A_k = U_{k,\perp}\Sigma_{k,\perp}V_{k,\perp}^{T},$$
and if we drop, using unitary invariance, the matrix $U_{k,\perp}$ from the second norm on the
right-hand side of the above inequality, then we get
$$\|A - P_{AZ}A\|_{\xi} \leq \|A - A_k\|_{\xi} + \|\Sigma_{k,\perp}V_{k,\perp}^{T}Z(V_k^{T}Z)^{+}\|_{\xi}.$$
For the special case of $\xi \in \{2, F\}$, this is exactly the structural condition underlying the
randomized low-rank projection algorithms of [44] that was first introduced in the context
of the CSSP [10]. We summarize the above discussion in the following lemma.
Lemma 9.2 Given $A \in \mathbb{R}^{m \times n}$, let $V_k \in \mathbb{R}^{n \times k}$ be the matrix of the top $k$ right singular
vectors of $A$. Let $Z \in \mathbb{R}^{n \times r}$ ($r \geq k$) be any matrix such that $V_k^{T}Z$ has full rank. Then, for
any unitarily invariant norm $\xi$,
$$\|A - P_{AZ}A\|_{\xi} \leq \|A - A_k\|_{\xi} + \|\Sigma_{k,\perp}V_{k,\perp}^{T}Z(V_k^{T}Z)^{+}\|_{\xi}. \qquad (9.2)$$
Equation 9.2 immediately suggests a proof strategy for bounding the error of RandNLA
algorithms for low-rank matrix approximation: identify a sketching matrix $Z$ such that
$V_k^{T}Z$ has full rank; at the same time, bound the relevant norms of $(V_k^{T}Z)^{+}$ and $V_{k,\perp}^{T}Z$.
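As an illustration of this strategy (ours; a Gaussian Z is used only as a stand-in for the sampling and projection constructions of Section 9.4, and the low-rank-plus-noise test matrix is hypothetical), the following sketch checks the full-rank condition, computes the two norms to be bounded, and compares both sides of Equation 9.2 in the Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k, r = 300, 200, 10, 40

# Low-rank-plus-noise test matrix, so that || A - A_k || is small but nonzero.
A = (rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
     + 0.01 * rng.standard_normal((m, n)))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Vk, Vk_perp = Vt[:k].T, Vt[k:].T            # top-k and bottom right singular vectors
Sk_perp = np.diag(s[k:])                    # bottom singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]    # best rank-k approximation

Z = rng.standard_normal((n, r))             # sketching matrix (illustrative)
print(np.linalg.matrix_rank(Vk.T @ Z) == k) # full-rank condition of Lemma 9.2

# The two quantities the proof strategy asks us to control.
print(np.linalg.norm(np.linalg.pinv(Vk.T @ Z), 2),
      np.linalg.norm(Sk_perp @ Vk_perp.T @ Z, 'fro'))

# Both sides of Equation 9.2 in the Frobenius norm.
U_AZ, _, _ = np.linalg.svd(A @ Z, full_matrices=False)
lhs = np.linalg.norm(A - U_AZ @ (U_AZ.T @ A), 'fro')
rhs = (np.linalg.norm(A - A_k, 'fro')
       + np.linalg.norm(Sk_perp @ Vk_perp.T @ Z @ np.linalg.pinv(Vk.T @ Z), 'fro'))
print(lhs <= rhs)  # True: the bound of Equation 9.2 holds
```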
Lemma 9.1 generalizes the prior use of Lemma 9.2 in several important ways:
• First, it is not necessary to focus on random sampling matrices or random projection
matrices, but instead, we consider arbitrary sketching matrices $Z$. This was actually
implicit in the analysis of the original version of Lemma 9.2 [10], but it seems worth
making that explicit here. It does, however, require the extra condition that $AY$ also
has full rank.