8
IRLBA: Fast Partial Singular Value Decomposition Method
James Baglama
CONTENTS
8.1 Introduction
8.2 GKLB Procedure
8.3 Augmented Lanczos Bidiagonalization Method
8.4 Numerical Performance
8.5 Conclusion
Appendix
References
8.1 Introduction
Approximation to some of the largest singular values and associated vectors (largest singular
triplets) of very large rectangular matrices is essential in many areas of data analysis, for
example, dimension reduction, data mining, data visualization, and detection of patterns.
Furthermore, the statistical procedure principal component analysis has a direct relationship
to the singular value decomposition (SVD) [16,26,31]. The importance of computing the
singular triplets of very large matrices has spurred numerous computational methods in the
literature; see, for example, [1,3,4,6,9,12–15,18,20,22,24,27,29,30] and references therein.
In 2005, Baglama and Reichel [1] developed a method, the implicitly restarted Lanczos
bidiagonalization algorithm (IRLBA), that is well suited for computing some of the largest
(and smallest) singular triplets of extremely large rectangular matrices. The development
of this routine is based on a long history of methods in eigenvalue and singular value
computation. The foundation is based on the commonly used Golub–Kahan–Lanczos
bidiagonalization (GKLB) procedure [9], which is described in detail in Section 8.2. However,
the pivotal structure of the IRLBA comes from exploiting the mathematical equivalence
of two eigenvalue routines and extending them to a restarted GKLB procedure. In 1992,
Sorensen [28] developed a powerful eigenvalue computation method for very large eigenvalue
problems, which is often referred to as the implicitly restarted Arnoldi method (IRAM).
The IRAM was later implemented in the popular eigenvalue software package ARPACK
[22]. Morgan [25] showed that the IRAM can be implemented by augmenting the Krylov
subspace basis vectors by certain vectors. Such an implementation can be less sensitive to
propagated round-off errors than the IRAM. Wu and Simon [33] described an eigenvalue
method called the thick-restarted Lanczos tridiagonalization, which is simple to implement
and mathematically equivalent to the symmetric IRAM. Harnessing these connections led
to the development of the IRLBA, a simple, powerful, and computationally fast partial
SVD method.
Shortly after developing the IRLBA, Baglama and Reichel [3] extended the method to
work with the block form of the GKLB procedure [10] and published MATLAB computer
codes irlba and irlbablk [2]. Block methods are advantageous when computing multiple
or clustered singular values and can utilize Level 3 BLAS matrix–matrix products. Also,
block methods are favorable when matrices are so large that they have to be retrieved from
disk when matrix–vector products with them are to be evaluated. However, there are added
complexities to a block method, for example, linear dependence of blocks and block size.
We will not discuss block methods in this chapter.
Recently, additional software packages for the IRLBA were developed in the programming
languages R and Python. Lewis [23] developed the irlba package in the open-source
statistical programming language R, and Kane and Lewis [17] created irlbpy, an
implementation of the IRLBA in Python for NumPy. See Section 8.4 for examples
demonstrating the performance of the software implementations.
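To make the calling pattern of such packages concrete, the following Python sketch computes a handful of the largest singular triplets of a large sparse matrix. SciPy's svds routine is used as a stand-in, since the exact irlba/irlbpy signatures are not reproduced in this chapter; the matrix size, density, and number of triplets are illustrative.

```python
# A minimal partial-SVD sketch; scipy.sparse.linalg.svds stands in for
# the irlba/irlbpy interfaces discussed above (it is not the IRLBA code).
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

A = sparse_random(10000, 2000, density=1e-3, format="csr", random_state=0)
u, s, vt = svds(A, k=6)          # six largest singular triplets
order = np.argsort(s)[::-1]      # svds returns singular values in ascending order
u, s, vt = u[:, order], s[order], vt[order]
print(s)                         # approximations to sigma_1 >= ... >= sigma_6
```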
Baglama and Reichel [1] developed two strategies: augmentation with Ritz vectors
irlba(R) for finding the largest (and smallest) singular triplets and augmentation with
harmonic Ritz vectors irlba(H) for finding the smallest singular triplets. When seeking
to compute the smallest singular triplets of a matrix, augmentation by harmonic Ritz
vectors, irlba(H), often yielded faster convergence. That method will not be described in
this chapter, since our focus is on large data and on computing the largest singular triplets;
we refer the reader to [1] for details on irlba(H). In this chapter, we will refer to irlba(R)
as IRLBA.
8.2 GKLB Procedure
Let $A \in \mathbb{R}^{\ell \times n}$ be a large sparse matrix. We may assume that $\ell \geq n$, because otherwise
we replace the matrix by its transpose. Let $u_i \in \mathbb{R}^{\ell}$ and $v_i \in \mathbb{R}^{n}$ denote the left and right
singular vectors of $A$ associated with the singular value $\sigma_i$. Define $U_n = [u_1, u_2, \ldots, u_n] \in \mathbb{R}^{\ell \times n}$ and $V_n = [v_1, v_2, \ldots, v_n] \in \mathbb{R}^{n \times n}$ with orthonormal columns, as well as $\Sigma_n = \operatorname{diag}[\sigma_1, \sigma_2, \ldots, \sigma_n] \in \mathbb{R}^{n \times n}$. Then

$$A V_n = U_n \Sigma_n \quad \text{and} \quad A^T U_n = V_n \Sigma_n \tag{8.1}$$

are the SVD of $A$ and $A^T$, respectively. We assume the singular values are ordered,

$$\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0, \tag{8.2}$$

and refer to $\{\sigma_i, u_i, v_i\}$ as a singular triplet of $A$.
We are interested in approximating the $k$ largest singular triplets $\{\sigma_i, u_i, v_i\}_{i=1}^{k}$. Let the
matrices $U_k \in \mathbb{R}^{\ell \times k}$ and $V_k \in \mathbb{R}^{n \times k}$ consist of the first $k$ columns of the matrices $U_n$ and
$V_n$ in the SVD (Equation 8.1) of $A$, and introduce $\Sigma_k = \operatorname{diag}[\sigma_1, \ldots, \sigma_k] \in \mathbb{R}^{k \times k}$. Then,
analogously to Equation 8.1, we have the partial SVD

$$A V_k = U_k \Sigma_k \quad \text{and} \quad A^T U_k = V_k \Sigma_k. \tag{8.3}$$
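As a quick numerical check of Equations 8.1 through 8.3, the following numpy sketch extracts a partial SVD by truncating a full SVD of a small dense test matrix. (For the very large matrices of interest here, a full SVD is of course not affordable; this only illustrates the identities.)

```python
import numpy as np

rng = np.random.default_rng(0)
l, n, k = 60, 40, 5
A = rng.standard_normal((l, n))

# Full SVD: A V_n = U_n Sigma_n with decreasing singular values (8.1, 8.2).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Partial SVD (8.3): keep the k largest singular triplets.
Uk, Sk, Vk = U[:, :k], np.diag(s[:k]), Vt[:k, :].T
print(np.allclose(A @ Vk, Uk @ Sk))      # A V_k = U_k Sigma_k
print(np.allclose(A.T @ Uk, Vk @ Sk))    # A^T U_k = V_k Sigma_k
```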
The approximations to Equation 8.3 can be obtained from projections onto Krylov subspaces
$$\begin{aligned}
K_m(A^T A, p_1) &= \operatorname{span}\{p_1,\ A^T A\, p_1,\ (A^T A)^2 p_1,\ \ldots,\ (A^T A)^{m-1} p_1\},\\
K_m(A A^T, q_1) &= \operatorname{span}\{q_1,\ A A^T q_1,\ (A A^T)^2 q_1,\ \ldots,\ (A A^T)^{m-1} q_1\},
\end{aligned} \tag{8.4}$$

with initial vectors $p_1$ and $q_1 = A p_1 / \|A p_1\|$, respectively. Throughout this discussion, $\|\cdot\|$
denotes the Euclidean vector norm as well as the associated induced matrix norm.
Generically, the GKLB procedure for $m \ll \min\{\ell, n\}$ determines orthonormal bases
$\{p_1, p_2, \ldots, p_m\}$ and $\{q_1, q_2, \ldots, q_m\}$ for the Krylov subspaces (Equation 8.4). The GKLB
procedure is well suited for large matrices, since the routine only requires the evaluation of
matrix–vector products with $A$ and $A^T$. The steps of the GKLB procedure are presented in
Algorithm 8.1. A matrix interpretation at step $m$ of the computations of the GKLB
algorithm yields the partial GKLB decomposition,
$$A P_m = Q_m B_m, \qquad A^T Q_m = P_m B_m^T + r_m e_m^T, \tag{8.5}$$

where the matrices $P_m = [p_1, \ldots, p_m] \in \mathbb{R}^{n \times m}$ and $Q_m = [q_1, \ldots, q_m] \in \mathbb{R}^{\ell \times m}$ have
orthonormal columns, the residual vector $r_m \in \mathbb{R}^{n}$ satisfies $P_m^T r_m = 0$, and $e_m$ is the
$m$th axis vector of appropriate dimension. Further,
$$B_m = \begin{bmatrix}
\alpha_1 & \beta_1 & & & 0\\
& \alpha_2 & \beta_2 & & \\
& & \alpha_3 & \ddots & \\
& & & \ddots & \beta_{m-1}\\
0 & & & & \alpha_m
\end{bmatrix} \in \mathbb{R}^{m \times m} \tag{8.6}$$
is an upper bidiagonal matrix.
Algorithm 8.1 GKLB Procedure
Input:  $A \in \mathbb{R}^{\ell \times n}$: large rectangular matrix,
        $p_1 \in \mathbb{R}^{n}$: initial vector of unit length,
        $m$: number of bidiagonalization steps.
Output: $P_m := [p_1, p_2, \ldots, p_m] \in \mathbb{R}^{n \times m}$: matrix with orthonormal columns,
        $Q_m := [q_1, q_2, \ldots, q_m] \in \mathbb{R}^{\ell \times m}$: matrix with orthonormal columns,
        $B_m \in \mathbb{R}^{m \times m}$: upper bidiagonal matrix (8.6) with entries $\alpha_j$ and $\beta_j$,
        $r_m \in \mathbb{R}^{n}$: residual vector.

1. $P_1 := p_1$; $q_1 := A p_1$;
2. $\alpha_1 := \|q_1\|$; $q_1 := q_1/\alpha_1$; $Q_1 := q_1$;
3. for $j = 1 : m$
4.   $r_j := A^T q_j - \alpha_j p_j$;
5.   Reorthogonalization: $r_j := r_j - P_j (P_j^T r_j)$;
6.   if $j < m$ then
7.     $\beta_j := \|r_j\|$; $p_{j+1} := r_j/\beta_j$; $P_{j+1} := [P_j, p_{j+1}]$;
8.     $q_{j+1} := A p_{j+1} - \beta_j q_j$;
9.     Reorthogonalization: $q_{j+1} := q_{j+1} - Q_j (Q_j^T q_{j+1})$;
10.    $\alpha_{j+1} := \|q_{j+1}\|$; $q_{j+1} := q_{j+1}/\alpha_{j+1}$; $Q_{j+1} := [Q_j, q_{j+1}]$;
11.  endif
12. endfor
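For illustration, the following Python/numpy function is a direct transcription of Algorithm 8.1 (a sketch only: it assumes the algorithm does not terminate early, that is, all $\alpha_j > 0$ and $\beta_j > 0$, and omits the safeguards of production codes); the comments refer to the algorithm's line numbers.

```python
import numpy as np

def gklb(A, p1, m):
    """Partial GKLB decomposition (Algorithm 8.1 / Equation 8.5).

    Returns P (n x m), Q (l x m), an upper bidiagonal B (m x m), and the
    residual vector r satisfying P.T @ r = 0.
    """
    l, n = A.shape
    P = np.zeros((n, m)); Q = np.zeros((l, m)); B = np.zeros((m, m))
    P[:, 0] = p1
    q = A @ p1                                     # line 1
    B[0, 0] = np.linalg.norm(q)                    # line 2: alpha_1
    Q[:, 0] = q / B[0, 0]
    for j in range(m):                             # line 3
        r = A.T @ Q[:, j] - B[j, j] * P[:, j]      # line 4
        r -= P[:, :j + 1] @ (P[:, :j + 1].T @ r)   # line 5: reorthogonalization
        if j < m - 1:                              # line 6
            B[j, j + 1] = np.linalg.norm(r)        # line 7: beta_j
            P[:, j + 1] = r / B[j, j + 1]
            q = A @ P[:, j + 1] - B[j, j + 1] * Q[:, j]    # line 8
            q -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ q)       # line 9
            B[j + 1, j + 1] = np.linalg.norm(q)    # line 10: alpha_{j+1}
            Q[:, j + 1] = q / B[j + 1, j + 1]
    return P, Q, B, r                              # r = r_m of Equation 8.5
```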
The GKLB algorithm only requires matrix–vector products with the matrices $A$ and $A^T$, and
the input matrix $A$ can be replaced with functions that evaluate these products. To avoid
loss of orthogonality due to finite precision arithmetic, reorthogonalization is performed in
lines 5 and 9 of the GKLB algorithm; the reader is referred to the reorthogonalization
strategies discussed in the literature, for example, [19,27,33]. We do remark that Simon and
Zha [27] observed that when the matrix $A$ is not very ill-conditioned, only the columns of one
of the matrices $P_m$ or $Q_m$ need to be reorthogonalized, which reduces the computational
cost considerably when $\ell \gg n$.
The number of bidiagonalization steps $m \ll \min\{\ell, n\}$ is assumed to be small enough
that the partial GKLB decomposition (Equation 8.5) with the stated properties exists,
and $m > k$, so that an approximation to Equation 8.3 can be determined. We assume that the
GKLB algorithm does not terminate early, that is, all $\alpha_j > 0$ and $\beta_j > 0$ for $1 \leq j \leq m$;
see [1] for a detailed discussion on early termination.
Analogously to Equation 8.1, let $u_i^{(B)} \in \mathbb{R}^{m}$ and $v_i^{(B)} \in \mathbb{R}^{m}$ denote the left and right
singular vectors of $B_m$ associated with the singular value $\sigma_i^{(B)}$. Define $U_m^{(B)} = [u_1^{(B)}, u_2^{(B)}, \ldots, u_m^{(B)}] \in \mathbb{R}^{m \times m}$ and $V_m^{(B)} = [v_1^{(B)}, v_2^{(B)}, \ldots, v_m^{(B)}] \in \mathbb{R}^{m \times m}$ with orthonormal columns, as well
as $\Sigma_m^{(B)} = \operatorname{diag}[\sigma_1^{(B)}, \sigma_2^{(B)}, \ldots, \sigma_m^{(B)}] \in \mathbb{R}^{m \times m}$. Then

$$B_m V_m^{(B)} = U_m^{(B)} \Sigma_m^{(B)} \quad \text{and} \quad B_m^T U_m^{(B)} = V_m^{(B)} \Sigma_m^{(B)} \tag{8.7}$$

are the SVD of $B_m$ and $B_m^T$, respectively. We assume the singular values of $B_m$ are ordered
in the same manner as Equation 8.2.
Notice from Equations 8.5 and 8.7 that
$$\begin{aligned}
A P_m v_i^{(B)} - \sigma_i^{(B)} Q_m u_i^{(B)} &= 0,\\
A^T Q_m u_i^{(B)} - \sigma_i^{(B)} P_m v_i^{(B)} &= \left( e_m^T u_i^{(B)} \right) r_m.
\end{aligned} \tag{8.8}$$
Using Equation 8.8, we see that approximations to the singular triplets $\{\sigma_i, u_i, v_i\}$ of $A$
can be obtained from the triplets $\{\sigma_i^{(B)}, Q_m u_i^{(B)}, P_m v_i^{(B)}\}$. The criterion for accepting the
triplet $\{\sigma_i^{(B)}, Q_m u_i^{(B)}, P_m v_i^{(B)}\}$ as an approximation is obtained from the second equation
of Equation 8.8:

$$\left\| A^T Q_m u_i^{(B)} - \sigma_i^{(B)} P_m v_i^{(B)} \right\| = \left| e_m^T u_i^{(B)} \right| \cdot \|r_m\|. \tag{8.9}$$
The singular triplet is accepted as an approximate singular triplet of $A$ when the residual
error (Equation 8.9) is smaller than a prescribed tolerance, that is, when

$$\left| e_m^T u_i^{(B)} \right| \cdot \|r_m\| \leq \delta \|A\|, \tag{8.10}$$

for a user-specified value of $\delta$. The value of $\|A\|$ in the bound (Equation 8.10) is easily
approximated by the largest singular value of the bidiagonal matrix $B_m$. This computation
is of low cost, and using successive approximations during the iterations leads to a
fairly good estimate of $\|A\|$.
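In code, the acceptance test (Equation 8.10) applied to the output of the gklb sketch above takes the following form; the matrix, starting vector, and tolerance are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 120))
p1 = rng.standard_normal(120); p1 /= np.linalg.norm(p1)
delta = 1e-8

P, Q, B, r = gklb(A, p1, m=20)
Ub, sb, Vbt = np.linalg.svd(B)               # SVD of B_m (Equation 8.7)
res = np.abs(Ub[-1, :]) * np.linalg.norm(r)  # |e_m^T u_i^(B)| ||r_m||  (8.9)
accepted = res <= delta * sb[0]              # sb[0] approximates ||A||  (8.10)

sigma = sb[accepted]                         # approximate singular values
U_approx = Q @ Ub[:, accepted]               # approximate left vectors  (8.8)
V_approx = P @ Vbt[accepted, :].T            # approximate right vectors (8.8)
```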
Typically, $\left| e_m^T u_i^{(B)} \right| \cdot \|r_m\|$ does not become smaller than a prescribed tolerance for
modest values of $m$. Prohibitive storage requirements for the basis vectors, along with the
computational cost, prevent increasing $m$ to obtain better approximations. Therefore, $m > k$
is kept of modest size, and a restarted algorithm is used.
8.3 Augmented Lanczos Bidiagonalization Method
An efficient method for restarting in the context of eigenvalue problems was proposed by
Sorensen [28]. This method can be thought of as a curtailed QR algorithm and requires the
user to select a strategy for choosing a sequence of Krylov subspaces via a shift selection
used to determine the invariant subspace; see [28] for details. The recursion formulas in [28]
were adapted to the GKLB method first by Björck et al. [7] in 1994 to solve ill-posed
problems and then by Kokiopoulou et al. [18]. The shifts are taken from the spectrum of
$A^T A$ and are applied through a bulge-and-chase algorithm. However, this implementation is
prone to round-off errors (see [5,32]); therefore, a mathematically equivalent implementation
that is less prone to round-off errors was developed.
The relationship to symmetric eigenvalue problems is obtained from the connection
to the Lanczos tridiagonal decomposition of the symmetric matrix $A^T A$, found by multiplying the
first equation in Equation 8.5 from the left by $A^T$:

$$A^T A P_m = P_m B_m^T B_m + \alpha_m r_m e_m^T. \tag{8.11}$$
Since $B_m$ is upper bidiagonal, this yields a valid Lanczos tridiagonal decomposition, where
$B_m^T B_m$ is an $m \times m$ symmetric tridiagonal matrix. The eigenvalues and eigenvectors of
$B_m^T B_m$, referred to as Ritz values and Ritz vectors, can be used to approximate the eigenvalues
and eigenvectors of $A^T A$; see, for example, [11, Section 9.1.2] for a discussion of Lanczos
tridiagonalization. Equation 8.11 provides a matrix representation of the Lanczos tridiagonal
decomposition method for creating an orthogonal basis for $K_m(A^T A, p_1)$ in Equation 8.4.
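The identity (Equation 8.11) is easy to verify numerically; continuing with A and the output of the gklb sketch above:

```python
import numpy as np

# A^T A P_m = P_m B_m^T B_m + alpha_m r_m e_m^T (Equation 8.11);
# the rank-one term only affects the last column.
lhs = A.T @ (A @ P)
rhs = P @ (B.T @ B)
rhs[:, -1] += B[-1, -1] * r     # alpha_m = B[m-1, m-1]
print(np.allclose(lhs, rhs))    # True up to round-off
```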
Wu and Simon's [33] mathematically equivalent approach to the symmetric IRAM of Sorensen
[28] is to augment $K_{m-k}(A^T A, p_{k+1})$ with Ritz vectors of $B_m^T B_m$:
$$\tilde{P}_k := \left[ P_m v_1^{(B)}, \ldots, P_m v_k^{(B)} \right] \in \mathbb{R}^{n \times k}, \tag{8.12}$$
where $p_{k+1} = r_m / \|r_m\|$, and start the Lanczos tridiagonal method with the orthogonal
matrix

$$\tilde{P}_{k+1} := \left[ P_m v_1^{(B)}, \ldots, P_m v_k^{(B)}, p_{k+1} \right] \in \mathbb{R}^{n \times (k+1)}. \tag{8.13}$$
Because of the relationship between the GKLB decomposition (Equation 8.5) and the
Lanczos decomposition (Equation 8.11), we can extend the idea of Wu and Simon [33]
to the GKLB method.
Multiplying the matrix (Equation 8.13) from the left by A and using Equation 8.8 yields
$$A \tilde{P}_{k+1} = \left[ \tilde{Q}_k, \ A p_{k+1} \right]
\begin{bmatrix}
\sigma_1^{(B)} & & & \\
& \ddots & & \\
& & \sigma_k^{(B)} & \\
& & & 1
\end{bmatrix}, \tag{8.14}$$
where

$$\tilde{Q}_k := \left[ Q_m u_1^{(B)}, \ldots, Q_m u_k^{(B)} \right] \in \mathbb{R}^{\ell \times k}. \tag{8.15}$$
The matrix $[\tilde{Q}_k, A p_{k+1}]$ in Equation 8.14 does not have orthogonal columns, since

$$(A p_{k+1})^T Q_m u_i^{(B)} = \rho_i, \tag{8.16}$$

where $\rho_i = \left( e_m^T u_i^{(B)} \right) \cdot \|r_m\|$. Using Equation 8.16, an orthonormal vector $q_{k+1}$ can be
created,
$$q_{k+1} = \frac{A p_{k+1} - \sum_{i=1}^{k} \rho_i\, Q_m u_i^{(B)}}{\alpha_{k+1}}, \tag{8.17}$$

with $\alpha_{k+1} = \left\| A p_{k+1} - \sum_{i=1}^{k} \rho_i\, Q_m u_i^{(B)} \right\|$, yielding the orthogonal matrix
$$\tilde{Q}_{k+1} := \left[ Q_m u_1^{(B)}, \ldots, Q_m u_k^{(B)}, q_{k+1} \right] \in \mathbb{R}^{\ell \times (k+1)}. \tag{8.18}$$
Equations 8.8, 8.13, 8.16, and 8.18 then yield

$$A \tilde{P}_{k+1} = \tilde{Q}_{k+1} \tilde{B}_{k+1}, \tag{8.19}$$

where $\tilde{B}_{k+1} \in \mathbb{R}^{(k+1) \times (k+1)}$ is the upper triangular matrix with diagonal entries
$\sigma_1^{(B)}, \ldots, \sigma_k^{(B)}, \alpha_{k+1}$ and last column $[\rho_1, \ldots, \rho_k, \alpha_{k+1}]^T$.
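Assembled in code, Equations 8.12 through 8.19 give the following restart step (a sketch continuing the gklb example above; the restarted method then resumes the GKLB recursion from these augmented matrices, see [1] for the complete algorithm).

```python
import numpy as np

def augment(A, P, Q, B, r, k):
    """Build P_{k+1}, Q_{k+1}, and B_{k+1} of Equations 8.13, 8.18, and 8.19
    from a partial GKLB decomposition (8.5).  Illustrative sketch only."""
    Ub, sb, Vbt = np.linalg.svd(B)               # SVD of B_m (8.7)
    beta = np.linalg.norm(r)
    p_new = r / beta                             # p_{k+1} = r_m / ||r_m||
    P_aug = np.column_stack([P @ Vbt[:k, :].T, p_new])         # (8.13)
    rho = Ub[-1, :k] * beta                      # rho_i = (e_m^T u_i^(B)) ||r_m||  (8.16)
    Qk = Q @ Ub[:, :k]                           # (8.15)
    q = A @ p_new - Qk @ rho                     # numerator of (8.17)
    alpha = np.linalg.norm(q)
    Q_aug = np.column_stack([Qk, q / alpha])     # (8.18)
    B_aug = np.zeros((k + 1, k + 1))             # upper triangular matrix of (8.19)
    B_aug[np.arange(k), np.arange(k)] = sb[:k]
    B_aug[:k, k] = rho
    B_aug[k, k] = alpha
    return P_aug, Q_aug, B_aug

# One restart: A @ P_aug == Q_aug @ B_aug up to round-off (Equation 8.19).
P1, Q1, B1 = augment(A, P, Q, B, r, k=5)
print(np.allclose(A @ P1, Q1 @ B1))
```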