Dimensionality reduction

Dimensionality reduction produces a new representation of the data that captures its most important information. Rather than grouping observations into clusters, these algorithms transform the existing data into a new dataset that represents the original information using significantly fewer features or observations.

These algorithms differ in the nature of the new dataset they produce, as summarized in the following list (a short code sketch follows it):

  • Principal component analysis (PCA): Finds the linear transformation that captures most of the variance in the existing dataset
  • Manifold learning: Identifies a nonlinear transformation that produces a lower-dimensional representation of the data
  • Autoencoders: Use neural networks to compress data nonlinearly with minimal loss of information

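The following minimal sketch illustrates the first two approaches on synthetic data using scikit-learn; the swiss-roll dataset, the parameter values, and the choice of Isomap as the manifold-learning example are illustrative assumptions, not the text's own implementation:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

# 3D points lying on a curved 2D surface (a "swiss roll") - illustrative data
X, _ = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)

# PCA: the best linear projection onto 2 components, ranked by explained variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print("PCA explained variance ratio:", pca.explained_variance_ratio_)

# Isomap: a manifold-learning method that preserves geodesic (along-the-surface)
# distances, so it can recover nonlinear structure that PCA cannot
iso = Isomap(n_neighbors=10, n_components=2)
X_iso = iso.fit_transform(X)

print(X.shape, "->", X_pca.shape, "and", X_iso.shape)  # (1000, 3) -> (1000, 2)
```

PCA keeps the two orthogonal directions of largest variance, whereas Isomap approximates distances measured along the curved surface, which lets it "unroll" structure that no linear projection can capture.
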
We will dive deeper into linear, non-linear, and neural-network-based unsupervised learning models in several of the following chapters, including important applications of natural language processing (NLP) in the form of topic modeling and Word2vec feature extraction.
