PCA with sklearn

The sklearn.decomposition.PCA implementation follows the standard API based on the fit() and transform() methods, which compute the desired number of principal components and project the data into the component space, respectively. The convenience method fit_transform() accomplishes this in a single step.
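As a minimal sketch of the two calling styles, the following compares fit() followed by transform() with a single fit_transform() call. The toy data, the seed, and the variable names are assumptions made for illustration and do not come from the original example:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical toy data: 100 observations of 3 correlated features
rng = np.random.default_rng(seed=42)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.2, 0.1]])
X = rng.normal(size=(100, 3)) @ mixing

# Two steps: learn the components, then project the data onto them
pca_two_step = PCA(n_components=2, svd_solver="full")
pca_two_step.fit(X)
projected_two_step = pca_two_step.transform(X)

# One step: fit_transform() learns the components and projects in a single call
projected_one_step = PCA(n_components=2, svd_solver="full").fit_transform(X)

# Both routes should yield the same projection up to floating-point error
same_projection = np.allclose(projected_two_step, projected_one_step)
print(projected_one_step.shape, same_projection)
```

The two-step form is the one to use when the components are learned on one dataset (for example, a training set) and then applied to another with transform().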

PCA offers three different algorithms, plus an automatic selection, that can be specified using the svd_solver parameter:

  • full computes the exact SVD using the LAPACK solver provided by SciPy
  • arpack runs a truncated version suitable for computing fewer than the full number of components
  • randomized uses a sampling-based algorithm that is more efficient when the dataset has more than 500 observations and features, and the goal is to compute fewer than 80% of the components
  • auto uses randomized where it is most efficient; otherwise, it uses the full SVD
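The solver options above can be compared with a short sketch like the following. The synthetic data, the seed, and the choice of three components are assumptions for illustration; on a small dataset like this one, all solvers should agree closely on the explained variance ratios:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 200 observations, 10 features with decreasing scale
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(200, 10)) * np.linspace(5.0, 0.5, 10)

ratios = {}
for solver in ("full", "arpack", "randomized", "auto"):
    # random_state only matters for the arpack and randomized solvers
    pca = PCA(n_components=3, svd_solver=solver, random_state=0)
    pca.fit(X)
    ratios[solver] = pca.explained_variance_ratio_

for solver, ratio in ratios.items():
    print(f"{solver:>10}: {np.round(ratio, 4)}")
```

Note that arpack requires n_components to be strictly smaller than both the number of observations and the number of features, which is why the sketch requests only three of the ten possible components.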

See references on GitHub for algorithmic implementation details.

Other key configuration parameters of the PCA object are as follows:

  • n_components: Pass None (the default) to compute all principal components, or pass an int to limit their number. For svd_solver='full', there are two additional options: a float strictly between 0 and 1 selects the number of components required to retain the corresponding share of the variance in the data, and the string 'mle' estimates the number of dimensions using maximum likelihood.
  • whiten: If True, it rescales the component vectors so that the projected data has unit variance along each component, which can be useful in some cases when the result feeds into a predictive model (the default is False).
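The following sketch illustrates these parameters under stated assumptions: the data, the seed, and the 95% variance threshold are hypothetical choices and are not taken from the original example:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 300 observations, 5 features with very different scales
rng = np.random.default_rng(seed=1)
X = rng.normal(size=(300, 5)) * np.array([10.0, 5.0, 1.0, 0.5, 0.1])

# A float in (0, 1): keep as many components as needed for that variance share
pca_share = PCA(n_components=0.95, svd_solver="full").fit(X)
n_kept = pca_share.n_components_
retained = pca_share.explained_variance_ratio_.sum()

# 'mle': let maximum likelihood estimate the number of dimensions
pca_mle = PCA(n_components="mle", svd_solver="full").fit(X)
n_mle = pca_mle.n_components_

# whiten=True: each projected component ends up with unit sample variance
whitened = PCA(n_components=2, whiten=True, svd_solver="full").fit_transform(X)
component_variances = whitened.var(axis=0, ddof=1)

print(n_kept, round(retained, 4), n_mle, component_variances)
```

After fitting, the n_components_ attribute reports how many components were actually kept, which is the value to inspect when the number was chosen by a variance share or by 'mle'.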

To compute the first two principal components of the three-dimensional ellipse and project the data into the new space, use fit_transform() as follows:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
projected_data = pca.fit_transform(data)
projected_data.shape
(100, 2)

The explained variance of the first two components is very close to 100%:

pca.explained_variance_ratio_
array([0.77381099, 0.22385721])
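To make the meaning of these ratios concrete, the sketch below recomputes them by hand from the singular values of the centered data. The data array here is a hypothetical stand-in for the three-dimensional data used above, so the numbers it prints will differ from those shown:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for the 3D data: 100 points, one nearly flat direction
rng = np.random.default_rng(seed=7)
data = rng.normal(size=(100, 3)) * np.array([3.0, 1.5, 0.2])

pca = PCA(n_components=2, svd_solver="full").fit(data)

# Manual computation: variance along each component from the singular values
centered = data - data.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
component_variance = singular_values ** 2 / (len(data) - 1)
manual_ratio = component_variance[:2] / component_variance.sum()

# Cumulative share of variance explained by the first one and two components
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(manual_ratio, cumulative)
```

The cumulative sum is often the more convenient view, since its last entry states directly how much of the total variance the retained components capture.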

The screenshot at the beginning of this section shows the projection of the data into the new two-dimensional space.
