Unsupervised Learning

In Chapter 6, Machine Learning Process, we discussed how unsupervised learning adds value by uncovering structures in the data without an outcome variable that, like a teacher, would guide the search process. This task contrasts with the supervised learning setting that we focused on in the last several chapters.

Unsupervised learning algorithms can be useful when a dataset contains only features and no measurement of the outcome, or when we want to extract information independent of the outcome. Instead of predicting future outcomes, the goal is to learn an informative representation of the data that is useful for solving another task, including the exploration of a dataset.

Examples include identifying topics to summarize documents (see Chapter 14, Topic Modeling), reducing the number of features to reduce the risk of overfitting and the computational cost for supervised learning, or grouping similar observations, as illustrated by the use of clustering for asset allocation at the end of this chapter.

Dimensionality reduction and clustering are the main tasks for unsupervised learning:

  • Dimensionality reduction transforms the existing features into a new, smaller set, while minimizing the loss of information. A broad range of algorithms exists that differ only in how they measure the loss of information, whether they apply linear or non-linear transformations, or the constraints they impose on the new feature set.
  • Clustering algorithms identify and group similar observations or features instead of identifying new features. Algorithms differ in how they define the similarity of observations and their assumptions about the resulting groups.

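To make the distinction concrete, the following sketch illustrates the dimensionality-reduction task using PCA from scikit-learn. The simulated data with three latent factors and the 95% variance threshold are illustrative assumptions, not taken from the text:

```python
# Minimal sketch: linear dimensionality reduction with PCA.
# The simulated factor structure and the 95% variance target are
# illustrative assumptions for this example.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Simulate 100 observations of 10 features driven by 3 latent factors
latent = rng.standard_normal((100, 3))
loadings = rng.standard_normal((3, 10))
X = latent @ loadings + 0.1 * rng.standard_normal((100, 10))

# Keep the smallest number of components explaining >= 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # far fewer than 10 columns
print(pca.explained_variance_ratio_.sum())  # at least 0.95 by construction
```

The new features are linear combinations of the originals, chosen to minimize the information lost when discarding the remaining directions of variation.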
More specifically, this chapter covers the following:

  • How Principal Component Analysis (PCA) and Independent Component Analysis (ICA) perform linear dimensionality reduction
  • How to apply PCA to identify risk factors and eigen portfolios from asset returns
  • How to use non-linear manifold learning to summarize high-dimensional data for effective visualization
  • How to use t-SNE and UMAP to explore high-dimensional alternative image data
  • How k-Means, hierarchical, and density-based clustering algorithms work
  • How to use agglomerative clustering to build robust portfolios according to hierarchical risk parity
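As a preview of the clustering material, the following sketch shows the second task, grouping similar observations, using k-Means from scikit-learn; the synthetic blobs and the choice of k=3 are illustrative assumptions:

```python
# Minimal sketch: grouping similar observations with k-Means.
# The synthetic blob data and k=3 are illustrative assumptions.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate 300 observations in 3 well-separated groups
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

# Fit k-Means: similarity is defined by Euclidean distance to centroids
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(kmeans.labels_[:10])      # cluster assignment per observation
print(kmeans.cluster_centers_)  # one centroid per cluster
```

Other algorithms covered in this chapter differ in exactly these two respects: how similarity is measured and what shape the resulting groups are assumed to take.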

The code samples for each section are in the directory of the online GitHub repository for this chapter at https://github.com/PacktPublishing/Hands-On-Machine-Learning-for-Algorithmic-Trading.