Useful Statistical and Machine Learning Methods

In bioinformatics, the statistical analysis of datasets of varied size and composition is a frequent task. R is a hugely powerful statistical language with abundant options for all sorts of tasks. In this chapter, we will focus on some useful but less often discussed methods that, while none of them makes up an analysis on its own, can be powerful additions to the analyses you likely do quite often. We'll look at recipes for correcting p-values for multiple hypotheses, simulating datasets, and applying machine learning methods for class prediction and dimensionality reduction.

The following recipes will be covered in this chapter:

  • Correcting p-values to account for multiple hypotheses
  • Generating a simulated dataset to represent a background
  • Learning groupings within data and classifying with kNN
  • Predicting classes with random forests
  • Predicting classes with SVM
  • Learning groups in data without prior information
  • Identifying the most important variables in data with random forests
  • Identifying the most important variables in data with PCA