We will apply the t-SNE method by importing the TSNE class from scikit-learn, as follows:
from sklearn.manifold import TSNE
There are a few hyperparameters that the user can set when running t-SNE, including:
- 'init': Initialization of the embedding
- 'method': 'barnes_hut' or 'exact'
- 'perplexity': Default 30
- 'n_iter': Default 1000
- 'n_components': Default 2
Going into the mathematical details of each hyperparameter would be a chapter on its own, but we do have general suggestions for these settings. For init, 'pca' is recommended for the reason given earlier. For method, 'barnes_hut' is an approximation that runs much faster than 'exact' and gives very similar results on most datasets. Perplexity controls the balance between teasing out local and global substructures of the data; it can be loosely thought of as the number of nearest neighbors each point takes into account. n_iter sets the number of iterations the algorithm runs, and n_components = 2 means the final embedding is a two-dimensional space.
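Putting these settings together, a minimal sketch of a t-SNE run might look like the following. The digits dataset and the random_state argument are not part of the discussion above; they are included here only to make the example self-contained and reproducible, and the snippet assumes a scikit-learn version that still exposes the iteration count as n_iter (newer releases rename it to max_iter).
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Example data: 1,797 handwritten-digit images flattened to 64 features each.
X, y = load_digits(return_X_y=True)

tsne = TSNE(
    n_components=2,        # embed into a two-dimensional space
    perplexity=30,         # balance between local and global structure
    init='pca',            # PCA initialization, as recommended above
    method='barnes_hut',   # fast approximation; 'exact' for small datasets
    n_iter=1000,           # number of optimization iterations
    random_state=42,       # assumption: fixed seed for reproducibility
)

# fit_transform returns the low-dimensional embedding, here of shape (1797, 2).
X_embedded = tsne.fit_transform(X)
print(X_embedded.shape)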
To track the time used across rounds of experiments, we can use the %%timeit cell magic in a Jupyter notebook to measure how long a cell takes to run.
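For example, placed at the top of a notebook cell, the magic reruns the cell several times and reports the mean and standard deviation of the runtime. This sketch reuses the X array and TSNE import from the snippet above; since a full t-SNE run can be slow, %%time (which measures a single run) is an alternative worth knowing about.
%%timeit
# Time a complete t-SNE run with the settings discussed above.
TSNE(n_components=2, perplexity=30, init='pca',
     method='barnes_hut', n_iter=1000).fit_transform(X)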