Learning groups in data without prior information

It is common in bioinformatics to want to classify things into groups without knowing in advance what the groups are or how many there may be. This process is usually known as clustering and is a type of unsupervised machine learning. The approach is often used in genomics experiments, particularly RNAseq and related expression technologies. In this recipe, we'll start with a large gene expression dataset of around 150 samples, learn how to estimate how many groups of samples there are, and apply a method that clusters them by first reducing dimensionality with Principal Component Analysis (PCA) and then running k-means clustering.
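
The overall workflow described above can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the recipe's own code: the expression matrix is synthetic (150 samples, 50 genes, three simulated groups), the function names `pca_scores` and `kmeans` are hypothetical helpers, and the within-cluster sum of squares "elbow" used to judge the number of groups is one common heuristic assumed here, which may differ from the method the recipe goes on to use.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project samples (rows of X) onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)  # centre each gene (column)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's k-means with random restarts.

    Returns (labels, wss) for the restart with the lowest
    within-cluster sum of squares (wss).
    """
    rng = np.random.default_rng(seed)
    best_labels, best_wss = None, np.inf
    for _ in range(n_init):
        # Start from k distinct samples chosen at random.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # Move each centre to the mean of its members (keep it if empty).
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # Final assignment and score against the final centres.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        wss = dist.min(axis=1).sum()
        if wss < best_wss:
            best_labels, best_wss = labels, wss
    return best_labels, best_wss


# Synthetic stand-in for an expression matrix: 150 samples x 50 genes,
# simulated as three groups of 50 samples each (an assumption for illustration).
rng = np.random.default_rng(42)
group_means = rng.normal(0, 5, size=(3, 50))
true_groups = np.repeat(np.arange(3), 50)
expr = group_means[true_groups] + rng.normal(0, 1, size=(150, 50))

# Step 1: reduce dimensionality with PCA.
scores = pca_scores(expr, n_components=2)

# Step 2: compare within-cluster sum of squares across candidate k values.
wss_by_k = {k: kmeans(scores, k)[1] for k in range(1, 7)}
for k, wss in wss_by_k.items():
    print(f"k={k}: within-cluster sum of squares = {wss:.1f}")

# Step 3: cluster the samples with the chosen k (3 here, by construction).
labels, _ = kmeans(scores, 3)
```

The point at which adding another cluster stops giving a large drop in the within-cluster sum of squares is a rough guide to the number of groups. Clustering on the PCA scores rather than on all genes keeps the distance calculation focused on the main axes of variation in the data.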
