Isomap

Isomap is one of the simplest algorithms, and it's based on the idea of reducing the dimensionality while trying to preserve the geodesic distances measured on the original manifold where the input data lies. The algorithm works in three steps. The first one is a k-nearest neighbors search and the construction of the corresponding graph: the vertices are the samples, while the edges represent the connections between each sample and its nearest neighbors, with weights proportional to the corresponding distances.
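As a reference, the following snippet is a minimal sketch of this first step, assuming scikit-learn is available; the Swiss roll dataset and the value k=10 are arbitrary choices used only for illustration:

from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

# Synthetic 3-dimensional dataset lying on a 2-dimensional manifold
X, _ = make_swiss_roll(n_samples=1000, random_state=1000)

# Build the k-nearest neighbors graph; mode='distance' stores the
# Euclidean distance to each neighbor as the edge weight
knn_graph = kneighbors_graph(X, n_neighbors=10, mode='distance')

print(knn_graph.shape)  # (1000, 1000) sparse matrix with 10 entries per row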

The second step applies the Dijkstra algorithm to compute the shortest pairwise distances on the graph for all pairs of samples. The following figure shows a portion of a graph where some shortest distances are marked:

 
Example of a graph with marked shortest distances

For example, as x_3 is a neighbor of x_5 and x_7, applying the Dijkstra algorithm we can obtain the shortest paths d(x_3, x_5) = w_53 and d(x_3, x_7) = w_73. The computational complexity of this step is about O(n² log n + n²k), which is lower than O(n³) when k << n (a condition that is normally met); however, for large graphs (n >> 1), this is often the most expensive part of the whole algorithm.
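Continuing the sketch above, the pairwise geodesic distances can be approximated with SciPy's shortest-path routine (this is only one possible implementation choice; method='D' selects Dijkstra's algorithm):

from scipy.sparse.csgraph import shortest_path
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

# Weighted k-nearest neighbors graph (as in the previous snippet)
X, _ = make_swiss_roll(n_samples=1000, random_state=1000)
knn_graph = kneighbors_graph(X, n_neighbors=10, mode='distance')

# Approximate geodesic distances as shortest paths on the graph;
# directed=False treats the neighborhood relation as symmetric
geodesic_distances = shortest_path(knn_graph, method='D', directed=False)

print(geodesic_distances.shape)  # (1000, 1000)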

The third step is called metric multidimensional scaling, which is a technique for finding a low-dimensional representation while trying to preserve the inner products among samples. If we have a P-dimensional dataset X, the algorithm must find a Q-dimensional set Φ with Q < P, minimizing the function:

L(Φ) = Σ_i Σ_j (x_i · x_j − φ_i · φ_j)²
As proven in Semi-Supervised Learning, Chapelle O., Schölkopf B., Zien A. (editors), The MIT Press, the optimization is achieved by taking the top Q eigenvectors of the Gram matrix G_ij = x_i · x_j (or, in matrix form, G = XXᵀ if X ∈ ℜ^(n × P), with n samples of dimensionality P); however, as the Isomap algorithm works with pairwise distances, we need to compute the matrix D of squared distances:

D_ij = d²(x_i, x_j)

Here, d(x_i, x_j) is the shortest-path (geodesic) distance computed in the previous step.
If the X dataset is zero-centered, it's possible to derive a simplified Gram matrix from D through double centering, as described by M. A. A. Cox and T. F. Cox:

G_D = -(1/2) (I - (1/n) 1 1ᵀ) D (I - (1/n) 1 1ᵀ)

In the previous formula, 1 is the all-ones column vector and n is the number of samples.
Isomap computes the top Q eigenvalues λ_1, λ_2, ..., λ_Q of G_D and the corresponding eigenvectors ν_1, ν_2, ..., ν_Q, and determines the Q-dimensional vectors as:

φ_i = (√λ_1 ν_1(i), √λ_2 ν_2(i), ..., √λ_Q ν_Q(i))

In the previous formula, ν_k(i) denotes the i-th component of the eigenvector ν_k.
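To make this step concrete, here's a minimal NumPy sketch of the double centering and eigendecomposition just described; the distance matrix is a toy stand-in built from squared Euclidean distances of random points, whereas in Isomap it would contain the squared geodesic distances computed in the second step:

import numpy as np

# Toy stand-in for the matrix of squared (geodesic) distances
rng = np.random.RandomState(1000)
X = rng.uniform(size=(100, 5))
D = np.square(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1))

n = D.shape[0]
Q = 2

# Double centering: G_D = -(1/2) C D C with C = I - (1/n) 1 1^T
C = np.eye(n) - np.ones((n, n)) / n
G_D = -0.5 * C @ D @ C

# Top Q eigenvalues/eigenvectors (eigh returns them in ascending order)
eigenvalues, eigenvectors = np.linalg.eigh(G_D)
top = np.argsort(eigenvalues)[::-1][:Q]
lambdas, nus = eigenvalues[top], eigenvectors[:, top]

# Q-dimensional embedding: phi_i = (sqrt(lambda_1) nu_1(i), ..., sqrt(lambda_Q) nu_Q(i))
Phi = nus * np.sqrt(lambdas)

print(Phi.shape)  # (100, 2)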
As we're going to discuss in Chapter 5, EM Algorithm and Applications (and also as pointed out by Saul L. K., Weinberger K. Q., Sha F., Ham J., and Lee D. D. in Spectral Methods for Dimensionality Reduction), this kind of projection is also exploited by Principal Component Analysis (PCA), which finds the directions with the highest variance, corresponding to the top k eigenvectors of the covariance matrix. In fact, when applying the SVD to the dataset X, we get:

X = U Λ^(1/2) Vᵀ, so that XXᵀ = U Λ Uᵀ and XᵀX = V Λ Vᵀ
The diagonal matrix Λ contains the eigenvalues of both XXᵀ and XᵀX; therefore, the eigenvalues λ_Gi of G are equal to n λ_Σi, where λ_Σi are the eigenvalues of the covariance matrix Σ = n⁻¹ XᵀX. Hence, Isomap achieves the dimensionality reduction while trying to preserve the pairwise distances, projecting the dataset onto the subspace determined by a group of eigenvectors where the maximum explained variance is achieved. In terms of information theory, this condition guarantees the minimum loss with an effective reduction of dimensionality.
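As a quick numerical check of this relationship (a sketch, assuming a zero-centered random dataset; the sizes are arbitrary), the nonzero eigenvalues of G = XXᵀ coincide with n times the eigenvalues of the covariance matrix:

import numpy as np

rng = np.random.RandomState(1000)
n, P = 50, 4

# Zero-centered dataset with n samples of dimensionality P
X = rng.normal(size=(n, P))
X -= X.mean(axis=0)

G = X @ X.T              # Gram matrix (n x n)
Sigma = (X.T @ X) / n    # covariance matrix (P x P)

lambdas_G = np.sort(np.linalg.eigvalsh(G))[::-1][:P]
lambdas_Sigma = np.sort(np.linalg.eigvalsh(Sigma))[::-1]

# The top P eigenvalues of G equal n times the eigenvalues of Sigma
print(np.allclose(lambdas_G, n * lambdas_Sigma))  # True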

For the shortest-path step, scikit-learn also implements the Floyd-Warshall algorithm, which is slightly slower. For further information, please refer to Introduction to Algorithms, Cormen T. H., Leiserson C. E., Rivest R. L., The MIT Press.
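The whole pipeline is available in scikit-learn through the Isomap class; the following is a minimal usage sketch (the hyperparameter values are arbitrary), where the path_method parameter selects Dijkstra ('D') or Floyd-Warshall ('FW'):

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# 3-dimensional dataset lying on a 2-dimensional manifold
X, _ = make_swiss_roll(n_samples=1000, random_state=1000)

# path_method='D' selects Dijkstra; 'FW' selects Floyd-Warshall ('auto' by default)
isomap = Isomap(n_neighbors=10, n_components=2, path_method='D')
X_isomap = isomap.fit_transform(X)

print(X_isomap.shape)  # (1000, 2)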