Dimensionality reduction

Dimensionality reduction produces a new representation of the data that captures its most important information. Rather than grouping observations into clusters, these algorithms transform the existing data into a new dataset that represents the original information using significantly fewer features or observations.

These algorithms differ in the nature of the new dataset they produce, as summarized in the following list (a short code sketch follows it):

  • Principal component analysis (PCA): Finds the linear transformation that captures most of the variance in the existing dataset
  • Manifold learning: Identifies a nonlinear transformation that produces a lower-dimensional representation of the data
  • Autoencoders: Use neural networks to compress data nonlinearly with minimal loss of information

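The following minimal sketch illustrates the first two approaches on synthetic data using scikit-learn; the swiss-roll dataset, the parameter values, and the choice of Isomap as the manifold-learning example are illustrative assumptions, not the text's own implementation:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

# 3D points lying on a curved 2D surface (a "swiss roll") - illustrative data
X, _ = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)

# PCA: the best linear projection onto 2 components, ranked by explained variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print("PCA explained variance ratio:", pca.explained_variance_ratio_)

# Isomap: a manifold-learning method that preserves geodesic (along-the-surface)
# distances, so it can recover nonlinear structure that PCA cannot
iso = Isomap(n_neighbors=10, n_components=2)
X_iso = iso.fit_transform(X)

print(X.shape, "->", X_pca.shape, "and", X_iso.shape)  # (1000, 3) -> (1000, 2)
```

PCA keeps the two orthogonal directions of largest variance, whereas Isomap approximates distances measured along the curved surface, which lets it "unroll" structure that no linear projection can capture.
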
We will dive deeper into linear, non-linear, and neural-network-based unsupervised learning models in several of the following chapters, including important applications of natural language processing (NLP) in the form of topic modeling and Word2vec feature extraction.
