Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter Three

The Second Eigenvalue of the Google Matrix

3.1 INTRODUCTION

Before attempting to accelerate the computation of PageRank, it is useful first to prove some results regarding the convergence rate of the standard PageRank algorithm.

In this chapter, we determine analytically the modulus of the second eigenvalue for the Web hyperlink matrix used by Google for computing PageRank.

This has implications for the convergence rate of the standard PageRank algorithm as the Web scales, for the stability of PageRank to perturbations to the link structure of the Web, for the detection of Google spammers, and for the design of algorithms to speed up PageRank.

For the purposes of accelerating PageRank, perhaps the most useful result that comes out of this chapter is that the convergence rate of the standard PageRank algorithm is already very fast, and will remain so as the Web scales. Since the Web matrix is highly sparse, and since we show here that the number of iterations kthat the Power Method takes to converge is much less than the dimensionality n of the Web matrix (k n), the Power Method has an effective computational complexity O(n). This is important to keep in mind as an upper bound in the design of acceleration algorithms for PageRank. The computational efficiency of the Power Method for this matrix, in contrast to the high cost of matrix inversion, suggests the Power Method as the appropriate foundation for the development of fast eigensolvers for PageRank. This key intuition is the basis for the algorithms in chapters 5–7.

3.2 THEOREMS

Theorem 1. Let P be an n × n row-stochastic matrix. Let c be a real number such that 0 ≤ c ≤ 1. Let E be the n × n rank-one row-stochastic matrix E = ^T, where is the n-vector whose elements are all e_i = 1, and is an n-vector that represents a probability distribution.¹ Define the matrix A = [cP + (1 - c)E]^T. Its second eigenvalue |λ₂| ≤ c.

Theorem 2. Further, if P has at least two irreducible closed subsets (which is the case for the Web hyperlink matrix), then the second eigenvalue of A is given by λ₂ = c.

3.3 PROOF OF THEOREM 1

We first show that Theorem 1 is true for c = 0 and c = 1.

CASE 1: c = 0

If c = 0, then, from equation 2.2, A = E^T. Since E is a rank-one matrix, λ₂ = 0. Thus, Theorem 1 is proved for c = 0.

CASE 2: c = 1

If c = 1, then, from (2.2), A = P^T. Since P^T. is a column-stochastic matrix, |λ₂| ≤ 1. Thus, Theorem 1 is proved for c = 1.

CASE 3: 0 < c < 1

We prove this case via a series of lemmas.

LEMMA 1. The second eigenvalue of A has modulus |λ₂| < 1.

Proof. Consider the Markov chain corresponding to A^T. If the Markov chain corresponding to A^T has only one irreducible closed subchain S, and if S is aperiodic, then the chain corresponding to A^T must have a unique eigenvector with eigenvalue 1, by the Ergodic Theorem [31]. So we simply must show that the Markov chain corresponding to A^T has a single irreducible closed subchain S, and that this subchain is aperiodic. Lemma 1.1 shows that A^T has a single irreducible closed subchain S, and Lemma 1.2 shows that this subchain is aperiodic.

LEMMA 1.1. There exists a unique irreducible closed subset S of the Markov chain corresponding to A^T.

Proof. We split this proof into a proof of existence and a proof of uniqueness.

Existence. Let the set U be the states with nonzero components in . Let S consist of the set of all states reachable from U along nonzero transitions in the chain. S trivially forms a closed subset. Further, since every state has a transition to U, no subset of S can be closed. Therefore, S forms an irreducible closed subset.

Uniqueness. Every closed subset must contain U, and every closed subset containing U must contain S. Therefore, S must be the unique irreducible closed subset of the chain.

LEMMA 1.2. The unique irreducible closed subset S is an aperiodic subchain.

Proof. From Theorem 5 in Section 3.6, all members in an irreducible closed subset have the same period. Therefore, if at least one state in S has a self-transition, then the subset S is aperiodic. Let u be any state in U. By construction, there exists a self-transition from u to itself. Therefore, S must be aperiodic.

From Lemmas 1.1 and 1.2 and the ergodic theorem, |λ₂| < 1 and Lemma 1 is proved.

LEMMA 2. The second eigenvector ₂ of A is orthogonal to e: ^T₂ = 0.

Proof. Since |λ₂| < |λ₁| (by Lemma 1), the second eigenvector ₂ of A is orthogonal to the first eigenvector of A^T by Theorem 3 in Section 3.6. From Section 2.2, the first eigenvector of A^T is . Therefore, x₂ is orthogonal to .

LEMMA 3. E^T₂ = 0.

Proof. By definition, E = ^T and E^T = ^T. Thus, E^T₂ = ^T₂. From Lemma 2, ^T₂ = 0. Therefore, E^T₂ = 0.

LEMMA 4. The second eigenvector ₂ of A must be an eigenvector _i of P^T, and the corresponding eigenvalue is γ_i = λ₂/c.

Proof. From (2.2) and (2.3)

From Lemma 3 and (3.1), we have

We can divide through by c to get

If we let _i = ₂ and γ_i = λ₂/c, we can rewrite (3.2) as

Therefore, ₂ is also an eigenvector of P^T, and the relationship between the eigenvalues of A and P^T that correspond to ₂ is given by

λ₂ = cγ_i.

(3.5)

LEMMA 5. |λ₂| ≤ c.

Proof. We know from Lemma 4 that λ₂ = cγ_i. Because P is stochastic, we have that |γ_i| ≤ 1. Therefore, |λ₂| ≤ c, and Theorem 1 is proved.

3.4 PROOF OF THEOREM 2

Recall that Theorem 2 states: if P has at least two irreducible closed subsets, λ₂ = c.

Proof.

CASE 1: c = 0.

This is proven in Case 1 of Section 3.3.

CASE 2: c = 1.

This is proven trivially from Theorem 3 in Section 3.6.

CASE 3: 0 < c < 1.

We prove this as follows. We assume P has at least two irreducible closed subsets. We then construct a vector x_i that is an eigenvector of A and whose corresponding eigenvalue is λ_i = c. Therefore, |λ₂| ≥ c, and there exists a λ_i = c. From Theorem 1, λ₂ ≤ c. Therefore, if P has at least two irreducible closed subsets, λ₂ = c.

LEMMA 1. Any eigenvector y_i of P^T that is orthogonal to e is an eigenvector x_i of A. The relationship between eigenvalues is λ_i = cγ_i.

Proof. It is given that ^T _i = 0. Therefore,

By definition,

Therefore, from (2.2), (3.6), and (3.7),

Therefore, A_i = cγ_i_i and Lemma 1 is proved.

LEMMA 2. There exists a λ_i = c.

Proof. We construct a vector _i that is an eigenvector of P and is orthogonal to . From Theorem 3 in Section 3.6, the multiplicity of the eigenvalue 1 for P is equal to the number of irreducible closed subsets of P. Thus we can find two linearly independent eigenvectors ₁ and ₂ of P^T corresponding to the dominant eigenvalue 1. Let

If k₁ = 0, let _i = ₁, else if k₂ = 0, let _i = ₂. If k₁, k₂ > 0, then let _i = ₁/k₁ — ₂/k₂. Note that _i is an eigenvector of P^T with eigenvalue exactly 1 and that x_i is orthogonal to e. From Lemma 1, ₂ is an eigenvector of A corresponding to eigenvalue c. Therefore, the eigenvalue λ_i of A corresponding to eigenvector _i is λ_i = c.

Therefore, |λ₂| ≥ c, and there exists a λ_i = c. However, from Theorem 1, λ₂ ≤ c. Therefore, λ₂ = c and Theorem 2 is proved.²

3.5 IMPLICATIONS

Convergence of PageRank. The PageRank algorithm uses the Power Method to compute the principal eigenvector of A. The rate of convergence of the Power Method is given by |λ₂/λ₁| [30, 72]. Since the Web graph has been empirically shown to contain many irreducible closed subsets [13], Theorem 2 holds for the matrix A used in PageRank. For PageRank, the typical value of c has been given as 0.85; for this value of c, Theorem 2 thus implies that the convergence rate of the Power Method |λ₂/λ₁| for any Web link matrix A is 0.85. Therefore, the convergence rate of PageRank will be reasonably fast, even as the Web scales.

Stability of PageRank to perturbations in the link-structure. The modulus of the nonprincipal eigenvalues also determines whether the corresponding Markov chain is well-conditioned. As shown by Meyer in [54], the greater the eigengap |λ₁|—|λ₂|, the more stable the stationary distribution is to perturbations in the Markov chain. Our analysis provides an alternate explanation for the stability of PageRank shown by Ng et al. [55].

Accelerating PageRank computations. Knowing the second eigenvalue of the Web matrix λ₂ = c allows for the design of improved algorithms to compute PageRank. This will be discussed further in Chapter 5.

Spam detection. The eigenvectors corresponding to the second eigenvalue λ₂= c are an artifact of certain structures in the Web graph. In particular, each pair of leaf nodes in the SCC graph for the chain P corresponds to an eigenvector of A with eigenvalue c. These leaf nodes in the SCC are those subgraphs in the Web link graph which may have incoming edges, but have no edges to other components. Link spammers often generate such structures in attempts to hoard rank. Analysis of the nonprincipal eigenvectors of A may lead to strategies for combating link spam.

Broader implications. This proof has implication for spectral methods beyond Web search. For example, in the field of peer-to-peer networks, the EigenTrust reputation algorithm discussed in Chapter 9 computes the principal eigenvector of a matrix of the form defined in equation 2.2. This result shows that EigenTrust will converge quickly, minimizing network overhead. In the field of image segmentation, Perona and Freeman [57] present an algorithm that segments an image by thresholding the first eigenvector of the affinity matrix of the image. One may normalize the affinity matrix to be stochastic as in [53] and introduce a regularization parameter as in [56] to define a matrix of the form given in (2.2). The benefit of this is that one can choose the regularization parameter c to be large enough so that the computation of the dominant eigenvector is very fast, allowing the Perona-Freeman algorithm to work for very large scale images.

3.6 THEOREMS USED

This section contains theorems that are proven elsewhere and are used in proving Theorems 1 and 2 of this book.

Theorem 3 (from page 126 of [38]). If P is the transition matrix for a finite Markov chain, then the multiplicity of the eigenvalue 1 is equal to the number of irreducible closed subsets of the chain.

Theorem 4 (from page 4 of [72]). If _i is an eigenvector of A corresponding to the eigenvalue λ_i, and _j is an eigenvector of A^T corresponding to λ_j, then ^T_i _j = 0 (if λ_i ≠ λ_j).

Theorem 5 (from page 82 of [37]). Two distinct states belonging to the same class (irreducible closed subset) have the same period. In other words, the property of having period d is a class property.

¹ i.e., a vector whose elements are nonnegative and whose L₁ norm is 1.

² Note that there may be additional eigenvalues with modulus c, such as—c.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter Three - The Second Eigenvalue of the Google Matrix

Create new playlist

Sign In

Sign Up