Useful Statistical and Machine Learning Methods

In bioinformatics, the statistical analysis of datasets of varied size and composition is a frequent task. R is a hugely powerful statistical language with abundant options for all sorts of tasks. In this chapter, we focus on some useful but less often discussed methods. None of them makes up an analysis in and of itself, but each can be a powerful addition to the analyses you likely perform quite often. We'll look at recipes for simulating datasets and at machine learning methods for class prediction and dimensionality reduction.

The following recipes will be covered in this chapter:

Correcting p-values to account for multiple hypotheses
Generating a simulated dataset to represent a background
Learning groupings within data and classifying with kNN
Predicting classes with random forests
Predicting classes with SVM
Learning groups in data without prior information
Identifying the most important variables in data with random forests
Identifying the most important variables in data with PCA
