An example of FastICA with Scikit-Learn

Using the same dataset, we can now test the performance of ICA. However, in this case, as explained, we need to zero-center and whiten the dataset; fortunately, these preprocessing steps are performed by the Scikit-Learn implementation by default (whitening is applied as long as the parameter whiten is not set to False).
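To make the preprocessing concrete, the following is a minimal illustrative sketch of zero-centering followed by whitening (decorrelation with unit variance), assuming X is the (n_samples × 784) MNIST sample matrix; this is not the exact internal procedure used by Scikit-Learn (which relies on an SVD), but it shows the effect of the transformation:

import numpy as np

# Zero-center each feature (pixel)
Xc = X - np.mean(X, axis=0)

# Whiten through the eigendecomposition of the covariance matrix
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
eps = 1e-10  # avoid division by zero for (almost) constant pixels
W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T

# Whitened dataset: its covariance matrix is approximately the identity
Xw = Xc @ W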

To perform ICA on the MNIST dataset, we're going to instantiate the FastICA class, passing the arguments n_components=64 and the maximum number of iterations max_iter=5000. It's also possible to specify which function will be used to approximate the negentropy (through the fun parameter); however, the default, 'logcosh' (corresponding to G(x) = log cosh(x)), is normally a good choice:

from sklearn.decomposition import FastICA

# X is the MNIST sample matrix (n_samples × 784) introduced in the previous examples
fastica = FastICA(n_components=64, max_iter=5000, random_state=1000)
fastica.fit(X)
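As a quick variant, the negentropy approximation can be changed through the fun parameter; Scikit-Learn accepts 'logcosh' (the default), 'exp', and 'cube'. For example:

# Same model, but using the exponential negentropy approximation
fastica_exp = FastICA(n_components=64, fun='exp', max_iter=5000, random_state=1000)
fastica_exp.fit(X)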

At this point, we can visualize the components (which are always available through the components_ instance variable):

Independent components of the MNIST dataset extracted by the FastICA algorithm (64 components)
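A grid like the one in the previous figure can be produced with a short plotting sketch (assuming Matplotlib is available); the layout is illustrative:

import matplotlib.pyplot as plt

# Show the 64 independent components as 28x28 images in an 8x8 grid
fig, axes = plt.subplots(8, 8, figsize=(10, 10))
for i, ax in enumerate(axes.ravel()):
    ax.imshow(fastica.components_[i].reshape(28, 28), cmap='gray')
    ax.axis('off')
plt.show()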

There are still some redundancies (the reader can try to increase the number of components) and background noise; however, it's now possible to distinguish some low-level features (such as oriented stripes) that are common to many digits. This representation isn't very sparse yet. In fact, we're always using 64 components (like for FA and PCA); therefore, the dictionary is under-complete (the input dimensionality is 28 × 28 = 784). To see the difference, we can repeat the experiment with a dictionary ten times larger, setting n_components=640:

fastica = FastICA(n_components=640, max_iter=5000, random_state=1000)
fastica.fit(X)

A subset of the new components (100) is shown in the following screenshot:

Independent components of the MNIST dataset extracted by the FastICA algorithm (640 components)

The structure of these components is almost elementary. They represent oriented stripes and positional dots. To check how an input is rebuilt, we can consider the mixing matrix A (which is available as the mixing_ instance variable). Considering the first input sample, we can check how many factors have a weight whose absolute value is less than half of the average:

import numpy as np

M = fastica.mixing_
# Normalize the weights and count how many have an absolute value
# smaller than half of the average absolute weight
M0 = M[0] / np.max(M[0])

print(len(M0[np.abs(M0) < (np.mean(np.abs(M0)) / 2.0)]))
233

The sample is rebuilt using approximately 410 components. The level of sparsity is higher, but considering the granularity of the factors, it's easy to understand that many of them are needed to rebuild even a single structure (like the image of a 1) where long lines are present. However, this is not a drawback because, as already mentioned, the main goal of the ICA is to extract independent components. Considering an analogy with the cocktail party example, we could deduce that each component represents a phoneme, not the complete sound of a word or a sentence.
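As a hedged illustration of the reconstruction itself, a sample can be rebuilt from its sources and the mixing matrix (the mean_ attribute, available when whitening is enabled, must be added back); the equivalent built-in call is inverse_transform:

# Project the dataset onto the source space and rebuild the first sample
S = fastica.transform(X)                           # sources, shape (n_samples, 640)
X0_hat = S[0] @ fastica.mixing_.T + fastica.mean_  # approximate reconstruction of X[0]

# Equivalent reconstruction of the whole dataset
X_hat = fastica.inverse_transform(S)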

The reader can test a different number of components and compare the results with the ones achieved by other sparse coding algorithms (such as Dictionary Learning or Sparse PCA).
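As a starting point for such a comparison, the following is a minimal sketch using Scikit-Learn's MiniBatchDictionaryLearning and SparsePCA classes; the hyperparameters shown here (alpha values in particular) are illustrative and not tuned for MNIST:

from sklearn.decomposition import MiniBatchDictionaryLearning, SparsePCA

# Dictionary Learning with the same number of atoms used for FastICA
dl = MiniBatchDictionaryLearning(n_components=64, alpha=1.0, random_state=1000)
dl.fit(X)

# Sparse PCA with an illustrative sparsity penalty
spca = SparsePCA(n_components=64, alpha=0.1, random_state=1000)
spca.fit(X)

# In both cases, the atoms/components can be visualized through the
# components_ attribute, exactly as done for FastICA above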
