In this example, we are going to use the sinusoidal dataset previously shown. The first step is creating it (with 1,000 samples):
import numpy as np
from sklearn.preprocessing import StandardScaler

nb_samples = 1000
X = np.zeros(shape=(nb_samples, 2))

for i in range(nb_samples):
    X[i, 0] = float(i)
    if i % 2 == 0:
        X[i, 1] = 1.0 + (np.random.uniform(0.65, 1.0) * np.sin(float(i) / 100.0))
    else:
        X[i, 1] = 0.1 + (np.random.uniform(0.5, 0.85) * np.sin(float(i) / 100.0))

ss = StandardScaler()
Xs = ss.fit_transform(X)
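As a quick sanity check (a minimal sketch, not part of the original listing), we can verify that StandardScaler has brought each column to zero mean and unit variance, which is the preprocessing assumption behind the clustering steps that follow:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Rebuild the same dataset so the snippet runs standalone
nb_samples = 1000
X = np.zeros(shape=(nb_samples, 2))

for i in range(nb_samples):
    X[i, 0] = float(i)
    if i % 2 == 0:
        X[i, 1] = 1.0 + (np.random.uniform(0.65, 1.0) * np.sin(float(i) / 100.0))
    else:
        X[i, 1] = 0.1 + (np.random.uniform(0.5, 0.85) * np.sin(float(i) / 100.0))

ss = StandardScaler()
Xs = ss.fit_transform(X)

# After standardization, each column has (approximately) zero mean and unit std
print(np.round(Xs.mean(axis=0), 6))
print(np.round(Xs.std(axis=0), 6))
```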
At this point, we can try to cluster it using K-means (with n_clusters=2):
from sklearn.cluster import KMeans
km = KMeans(n_clusters=2, random_state=1000)
Y_km = km.fit_predict(Xs)
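Before looking at the plot, it can be instructive to attach a number to the K-means partition. The following is a hedged sketch (not in the original text) that computes the Silhouette score of the K-means result; the score lies in [-1, 1] and measures only geometric cohesion and separation, so even a moderate value cannot tell us that the "true" clusters are the two sinusoids:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Rebuild the dataset (same recipe as above, vectorized) so the snippet runs standalone
nb_samples = 1000
t = np.arange(nb_samples, dtype=float)
amp = np.where(t % 2 == 0,
               1.0 + np.random.uniform(0.65, 1.0, nb_samples) * np.sin(t / 100.0),
               0.1 + np.random.uniform(0.5, 0.85, nb_samples) * np.sin(t / 100.0))
Xs = StandardScaler().fit_transform(np.column_stack((t, amp)))

km = KMeans(n_clusters=2, random_state=1000)
Y_km = km.fit_predict(Xs)

# Silhouette is a purely geometric score in [-1, 1]; it is blind to the
# sinusoidal structure we actually want to recover
print(silhouette_score(Xs, Y_km))
```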
The result is shown in the following graph:
As expected, K-means isn't able to separate the two sinusoids. The reader is free to try different parameters, but the result will always be unacceptable: in two dimensions, K-means produces convex (roughly circular) clusters, and no such configuration can capture the two interleaved sinusoids. We can instead employ spectral clustering with an affinity matrix based on the KNN algorithm (in this case, Scikit-Learn may emit a warning because the graph is not fully connected, but this normally doesn't affect the results). Scikit-Learn implements the SpectralClustering class, whose most important parameters are n_clusters, the number of expected clusters; affinity, which can be either 'rbf' or 'nearest_neighbors'; gamma (only for RBF); and n_neighbors (only for KNN). For our test, we have chosen 20 neighbors:
from sklearn.cluster import SpectralClustering
sc = SpectralClustering(n_clusters=2, affinity='nearest_neighbors', n_neighbors=20, random_state=1000)
Y_sc = sc.fit_predict(Xs)
The result of the spectral clustering is shown in the following graph:
As expected, the algorithm was able to separate the two sinusoids perfectly. As an exercise, I invite the reader to apply this method to the MNIST dataset, using both an RBF (with different gamma values) and KNN (with different numbers of neighbors). I also suggest replotting the t-SNE diagram and comparing all the assignment errors. As the clusters are strictly non-convex, we don't expect a high Silhouette score. Other useful exercises include drawing the Silhouette plot and checking the result, and assigning the ground-truth labels in order to measure the homogeneity and the completeness.
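The last exercise can be sketched as follows. This is a hedged example, not part of the original text: the ground-truth label i % 2 follows directly from how the dataset was built (even rows belong to one sinusoid, odd rows to the other), and the exact scores depend on the random draws. Both homogeneity and completeness lie in [0, 1] and are invariant to label permutations, so no remapping of cluster indices is needed:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import SpectralClustering
from sklearn.metrics import homogeneity_score, completeness_score

# Rebuild the dataset; the ground truth is the parity of the row index
nb_samples = 1000
X = np.zeros(shape=(nb_samples, 2))
Y_true = np.zeros(nb_samples, dtype=int)

for i in range(nb_samples):
    X[i, 0] = float(i)
    Y_true[i] = i % 2
    if i % 2 == 0:
        X[i, 1] = 1.0 + (np.random.uniform(0.65, 1.0) * np.sin(float(i) / 100.0))
    else:
        X[i, 1] = 0.1 + (np.random.uniform(0.5, 0.85) * np.sin(float(i) / 100.0))

Xs = StandardScaler().fit_transform(X)

sc = SpectralClustering(n_clusters=2, affinity='nearest_neighbors',
                        n_neighbors=20, random_state=1000)
Y_sc = sc.fit_predict(Xs)

# Both metrics are in [0, 1] and don't depend on which cluster got which index
print(homogeneity_score(Y_true, Y_sc))
print(completeness_score(Y_true, Y_sc))
```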