An interesting application of eigenvectors is the clustering of data. Using the eigenvectors of a matrix derived from a distance matrix, unlabelled data can be separated into groups. Spectral clustering methods get their name from the use of the spectrum of this matrix. A distance matrix for n elements (for example, the pairwise distances between data points) is an n × n symmetric matrix. Given such an n × n distance matrix M with distance values mij, we can create the Laplacian matrix of the data points as follows:

    L = I − D^(−1/2) M D^(−1/2)
Here, I is the identity matrix and D is the diagonal matrix containing the row sums of M:

    D = diag(d1, …, dn),   di = Σj mij
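Before moving to a full example, this construction can be sanity-checked on a small hand-made distance matrix (the values below are made up for illustration only):

```python
import numpy as np

# a hand-made symmetric 3 x 3 distance matrix for three points;
# the values are chosen arbitrarily for illustration
M = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 4.0],
              [4.0, 4.0, 0.0]])

# D^(-1/2): diagonal matrix built from the row sums of M
D = np.diag(1 / np.sqrt(M.sum(axis=0)))

# Laplacian L = I - D^(-1/2) M D^(-1/2)
L = np.identity(3) - D @ M @ D

# L inherits the symmetry of M, and its diagonal is all ones
# because the diagonal of a distance matrix is zero
print(L)
```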
The data clusters are obtained from the eigenvectors of L. In the simplest case of data points with only two classes, the first eigenvector (that is, the one corresponding to the largest eigenvalue) is often enough to separate the data.
Here is an example for simple two-class clustering. The following code creates some 2D data points and clusters them based on the first eigenvector of the Laplacian matrix:
import scipy.linalg as sl
from numpy import *

# create some data points
n = 100
x1 = 1.2 * random.randn(n, 2)
x2 = 0.8 * random.randn(n, 2) + tile([7, 0], (n, 1))
x = vstack((x1, x2))

# pairwise distance matrix
M = array([[sqrt(sum((x[i] - x[j])**2)) for i in range(2*n)]
                                        for j in range(2*n)])

# create the Laplacian matrix
D = diag(1 / sqrt(M.sum(axis=0)))
L = identity(2*n) - dot(D, dot(M, D))

# compute eigenvectors of L
S, V = sl.eig(L)
# as L is symmetric, the imaginary parts of the eigenvalues
# are due only to negligible numerical errors
S = S.real
V = V.real
The eigenvector corresponding to the eigenvalue that is largest in absolute value gives the grouping (for example, by thresholding at 0) and can be shown with:
largest = abs(S).argmax()
plot(V[:, largest])
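For a self-contained illustration of this thresholding step, here is a sketch on made-up one-dimensional data (the points and the threshold at zero are assumptions for illustration, not the dataset above):

```python
import numpy as np
import scipy.linalg as sl

# two well-separated one-dimensional groups (made-up toy data)
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])

# pairwise distance matrix and Laplacian, built as above
M = np.abs(x[:, None] - x[None, :])
D = np.diag(1 / np.sqrt(M.sum(axis=0)))
L = np.identity(len(x)) - D @ M @ D

S, V = sl.eig(L)
S, V = S.real, V.real

# the eigenvector of the eigenvalue largest in absolute value,
# thresholded at zero, yields the grouping
largest = abs(S).argmax()
labels = V[:, largest] > 0
print(labels)
```

The two groups come out with opposite signs in the eigenvector; which group gets the sign True is arbitrary.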
The following figure (Figure 14.2) shows the result of spectral clustering of a simple two-class dataset:
Figure 14.2: The result of spectral clustering of a simple two-class dataset
For more difficult datasets and more classes, one usually takes the k eigenvectors corresponding to the k largest eigenvalues and then clusters the data with some other method, but using the eigenvectors instead of the original data points. A common choice is the k-means clustering algorithm, which is the topic of the next example:
The eigenvectors are used as input to k-means clustering, as follows:
import scipy.linalg as sl
import scipy.cluster.vq as sc
from numpy import *

# simple 4-class data
x = random.rand(1000, 2)
ndx = ((x[:, 0] < 0.4) | (x[:, 0] > 0.6)) & \
      ((x[:, 1] < 0.4) | (x[:, 1] > 0.6))
x = x[ndx]
n = x.shape[0]

# pairwise distance matrix
M = array([[sqrt(sum((x[i] - x[j])**2)) for i in range(n)]
                                        for j in range(n)])

# create the Laplacian matrix
D = diag(1 / sqrt(M.sum(axis=0)))
L = identity(n) - dot(D, dot(M, D))

# compute eigenvectors of L
_, _, V = sl.svd(L)

k = 4
# take the k first eigenvectors
eigv = V[:k, :].T

# k-means
centroids, dist = sc.kmeans(eigv, k)
clust_id = sc.vq(eigv, centroids)[0]
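The two scipy.cluster.vq calls at the end work in tandem: kmeans computes the centroids (and the mean distortion), and vq assigns each observation the index of its nearest centroid. A minimal sketch on made-up one-dimensional data, with an explicit initial guess for the centroids to keep the result deterministic:

```python
import numpy as np
import scipy.cluster.vq as sc

# two obvious groups of one-dimensional observations (made-up data)
obs = np.array([[0.0], [0.1], [0.2], [9.0], [9.1], [9.2]])

# kmeans returns the centroids and the mean distortion; passing an
# array instead of k uses it as the initial guess for the centroids
centroids, dist = sc.kmeans(obs, np.array([[0.0], [9.0]]))

# vq assigns each observation the index of its nearest centroid
clust_id = sc.vq(obs, centroids)[0]
print(clust_id)
```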
Note that we computed the eigenvectors here using the singular value decomposition, sl.svd. As L is symmetric, the result is the same as if we had used sl.eig, but the eigenvectors come already ordered corresponding to the ordering of the singular values. We also used throw-away variables. svd returns a tuple with three arrays: the left singular vectors U, the singular values S, and the right singular vectors V, as follows:
U, S, V = sl.svd(L)
As we do not need U and S here, we can throw them away when unpacking the return value of svd:
_, _, V = sl.svd(L)
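The ordering claim can be checked directly on a small symmetric matrix (the matrix below is made up for illustration): for a symmetric matrix, the singular values are the absolute values of the eigenvalues, and sl.svd returns them in decreasing order:

```python
import numpy as np
import scipy.linalg as sl

# a small symmetric test matrix, chosen for illustration;
# its eigenvalues are 4, 2, and 1
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

U, S, V = sl.svd(A)
eigenvalues = sl.eig(A)[0].real

# the singular values come sorted in decreasing order and, for a
# symmetric matrix, equal the absolute values of the eigenvalues
print(S)
print(np.sort(abs(eigenvalues))[::-1])
```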
The result can be plotted using:
for i in range(k):
    ndx = where(clust_id == i)[0]
    plot(x[ndx, 0], x[ndx, 1], 'o')
axis('equal')
The following figure shows the result of spectral clustering of a simple multiclass dataset:
Figure 14.3: An example of spectral clustering of a simple four-class dataset