Let's first reorder the data points by digit label, so that samples of the same handwritten number are grouped together:
import numpy as np
X = np.vstack([digits.data[digits.target==i] for i in range(10)])
y = np.hstack([digits.target[digits.target==i] for i in range(10)])
After this step, y is array([0, 0, 0, ..., 9, 9, 9]).
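To make the reordering concrete, here is a minimal sketch on a toy array. The names `data` and `target` and their values are hypothetical stand-ins for `digits.data` and `digits.target`, with 3 classes instead of 10. It also shows that a stable `np.argsort` on the labels gives the same ordering in one step, assuming the labels are the integers 0 to n-1:

```python
import numpy as np

# Toy stand-ins for digits.data and digits.target (hypothetical values for illustration).
data = np.array([[10.0], [11.0], [12.0], [13.0], [14.0], [15.0]])
target = np.array([2, 0, 1, 0, 2, 1])

# Per-class stacking, as in the text (3 classes here instead of 10).
X_stacked = np.vstack([data[target == i] for i in range(3)])
y_stacked = np.hstack([target[target == i] for i in range(3)])

# Equivalent one-step reordering: a stable sort keeps the original order within each class.
order = np.argsort(target, kind="stable")
X_sorted, y_sorted = data[order], target[order]

print(y_stacked)                            # [0 0 1 1 2 2]
print(np.array_equal(X_stacked, X_sorted))  # True
```

Either form works; the list-comprehension version used in the text makes the grouping by class explicit.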
Note that the t-SNE transformation can take minutes to compute on a regular laptop. We will first try running t-SNE with 250 iterations:
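%%time
# Run t-SNE with 250 iterations and time it. %%time must be the first line of the cell;
# unlike %%timeit, it runs the cell once and keeps tsne_iter_250 for the plotting step.
from sklearn.manifold import TSNE
# Note: newer scikit-learn releases rename n_iter to max_iter.
tsne_iter_250 = TSNE(init='pca', method='exact', n_components=2, n_iter=250).fit_transform(X)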
Let's draw a scatter plot to see how the data cluster:
# Import the pandas and matplotlib libraries
import pandas as pd
import matplotlib.pyplot as plt
# On Matplotlib 3.6 and later, this style is named 'seaborn-v0_8'
plt.style.use('seaborn')
# Plot the t-SNE results in the reduced two-dimensional space
df = pd.DataFrame(tsne_iter_250)
plt.scatter(df[0], df[1], c=y, cmap='tab10')
plt.show()
We can see that the clusters are not well separated at 250 iterations:
Let's now try running with 2000 iterations:
# Run t-SNE for 2000 iterations
tsne_iter_2000 = TSNE(init='pca', method='exact', n_components=2, n_iter=2000).fit_transform(X)
# Plot the result
df2 = pd.DataFrame(tsne_iter_2000)
plt.scatter(df2[0], df2[1], c=y, cmap='tab10')
plt.show()
As the following screenshot shows, the samples now form 10 distinct clusters. Running 2000 iterations gives a far more satisfying result than 250:
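Besides inspecting the plots, the effect of the iteration count can be checked numerically. scikit-learn stores the final value of the t-SNE objective in the fitted model's `kl_divergence_` attribute, and a lower value means the embedding matches the high-dimensional neighborhoods more closely. The following is only a rough sketch under several assumptions that are not in the original text: it uses a 300-sample subset, the default Barnes-Hut method, and 1000 rather than 2000 iterations so that it runs in seconds, and the helper `final_kl` is a hypothetical name introduced for illustration:

```python
import inspect
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Small subset so the sketch runs quickly (assumption: 300 samples is enough to illustrate).
X_small = load_digits().data[:300]

# The iteration argument is `n_iter` in older scikit-learn and `max_iter` in newer releases.
iter_kw = "max_iter" if "max_iter" in inspect.signature(TSNE).parameters else "n_iter"

def final_kl(n_iterations):
    """Fit t-SNE and return the KL divergence reported after optimization."""
    model = TSNE(n_components=2, init="pca", random_state=0, **{iter_kw: n_iterations})
    model.fit(X_small)
    return model.kl_divergence_

kl_short = final_kl(250)
kl_long = final_kl(1000)
print(f"KL divergence after 250 iterations:  {kl_short:.3f}")
print(f"KL divergence after 1000 iterations: {kl_long:.3f}")
```

The exact values depend on the scikit-learn version and the random seed, so treat them as a relative comparison between the two runs rather than as absolute quality scores.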