Dimensionality reduction

Dimensionality reduction (also called feature reduction or feature selection) is the process of reducing the input set of independent variables to the smaller set of variables that the model actually needs to predict the target.

In certain cases, it is possible to represent multiple independent variables by combining them without losing much information. For example, instead of keeping two independent variables such as the length and the breadth of a rectangle, the dimensions can be represented by a single variable, the area, which is derived from both the length and the breadth.
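The rectangle example can be sketched in a few lines of Python. The records and the `to_area` helper below are hypothetical and exist only for illustration. Note that the mapping is lossy: a 4 x 2 rectangle and an 8 x 1 rectangle yield the same area, so this reduction is reasonable only when the target depends on the area rather than on the individual sides.

```python
# Hypothetical records: each rectangle is described by two features.
rectangles = [
    {"length": 4.0, "breadth": 2.0},
    {"length": 3.0, "breadth": 5.0},
    {"length": 6.0, "breadth": 1.5},
]


def to_area(records):
    """Replace the (length, breadth) pair with one derived feature, the area."""
    return [{"area": r["length"] * r["breadth"]} for r in records]


reduced = to_area(rectangles)
print(reduced)  # each record now has one feature instead of two
```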

There are several reasons to perform dimensionality reduction on a given input dataset:

  • It aids data compression, so the data fits in a smaller amount of disk space.
  • It reduces the time needed to process the data, as fewer dimensions are used to represent it.
  • It removes redundant features from the dataset. Redundancy among features is typically known as multicollinearity.
  • It makes the data easier to visualize through graphs and charts, which work best in two or three dimensions.
  • It removes noisy features from the dataset, which, in turn, can improve model performance.
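As a rough illustration of removing redundant features, the sketch below drops any feature whose Pearson correlation with an already kept feature exceeds a threshold. The toy data, the `drop_redundant` helper, and the 0.95 threshold are assumptions made for this example, not a prescribed method.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_redundant(features, threshold=0.95):
    """Keep a feature only if it is not highly correlated with one already kept."""
    kept = {}
    for name, values in features.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


# Hypothetical dataset: height is recorded twice, in different units.
features = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "weight_kg": [72.0, 55.0, 90.0, 61.0, 80.0],
}

selected = drop_redundant(features)
print(list(selected))  # names of the features that survive the filter
```

Here `height_in` carries almost exactly the same information as `height_cm`, so only one of the two is retained. Which member of a correlated pair is kept depends on iteration order, a simplification of this sketch.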

Dimensionality reduction can be achieved in several ways. One way is to use filters, such as information gain filters and symmetric attribute evaluation filters. Genetic-algorithm-based selection and principal component analysis (PCA) are other popular techniques. Hybrid methods that combine these approaches also exist.
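To make the filter idea concrete, here is a minimal sketch of an information gain filter for categorical features, written with the Python standard library only. The toy dataset, the feature names, and the choice to keep the top two features are all assumptions made for illustration. Each feature is scored by how much knowing its value reduces the entropy of the target, and the lowest-scoring features are discarded.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy obtained by splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: three categorical features and a binary target.
labels = ["yes", "yes", "no", "no", "yes", "no"]
features = {
    "outlook": ["sunny", "sunny", "rainy", "rainy", "sunny", "rainy"],
    "humidity": ["high", "high", "low", "low", "high", "high"],
    "windy": ["t", "f", "t", "f", "t", "f"],
}

# Score every feature, then keep the two with the highest gain.
gains = {name: information_gain(vals, labels) for name, vals in features.items()}
selected = sorted(gains, key=gains.get, reverse=True)[:2]
print(selected)
```

In this toy data, `outlook` separates the classes perfectly and `windy` tells us almost nothing about the target, so the filter discards `windy`. A filter like this scores each feature independently, so it does not detect redundancy between features.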
