The sample data you'll need is available from this book's GitHub repository: https://github.com/danmaclean/R_Bioinformatics_Cookbook. If you want to use the code examples as they are written, then you will need to make sure that this data is in a sub-directory of whatever your working directory is.
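As a quick sanity check before running any recipe, the following minimal sketch tests whether a data sub-directory exists under the current working directory. The directory name `datasets` is an assumption made for illustration; replace it with whatever your copy of the repository's data folder is called.

```r
# Assumed name of the data sub-directory (hypothetical; adjust to your copy)
data_dir_name <- "datasets"

# Build the full path relative to the current working directory
data_dir <- file.path(getwd(), data_dir_name)

if (dir.exists(data_dir)) {
  message("Found sample data in: ", data_dir)
} else {
  message("No '", data_dir_name, "' directory under ", getwd(),
          "; copy the sample data there or change directory with setwd().")
}
```

If the check fails, either move the data or call setwd() so that the working directory is the folder that contains the data sub-directory.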
The following are the R packages that you'll need. Most of these will install with install.packages(); others are a little more complicated:
- ape
- Bioconductor:
  - Biostrings
  - biomaRt
  - DECIPHER
  - EnsDb.Rnorvegicus.v79
  - kebabs
  - msa
  - org.At.tair.db
  - org.Eck12.db
  - org.Hs.eg.db
  - PFAM.db
  - universalmotif
- bio3d
- dplyr
- e1071
- seqinr
Bioconductor is huge and has its own installation manager. You can install it with the following code:
if (!requireNamespace("BiocManager")) install.packages("BiocManager")
BiocManager::install()
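Once BiocManager is available, individual packages can be installed by name. The sketch below is one possible way to install only what is missing. It assumes BiocManager is already installed, uses a subset of the package names from the list above (extend the vectors as needed), and `missing_pkgs()` is a hypothetical helper introduced here for illustration.

```r
# A subset of the packages listed above, split by where they are hosted
cran_pkgs <- c("ape", "bio3d", "dplyr", "e1071", "seqinr")
bioc_pkgs <- c("Biostrings", "biomaRt", "DECIPHER", "msa")

# Hypothetical helper: return the members of `pkgs` that are not installed
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

cran_missing <- missing_pkgs(cran_pkgs)
bioc_missing <- missing_pkgs(bioc_pkgs)

# Report what is missing
message("Missing CRAN packages: ", paste(cran_missing, collapse = ", "))
message("Missing Bioconductor packages: ", paste(bioc_missing, collapse = ", "))

# Install only in an interactive session, so that sourcing this script
# non-interactively never triggers downloads
if (interactive()) {
  if (length(cran_missing) > 0) install.packages(cran_missing)
  if (length(bioc_missing) > 0) BiocManager::install(bioc_missing)
}
```

Checking first keeps repeated runs quick, because packages that are already present are skipped.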
Normally, in R, a user will load a library and use the functions directly by name. This is great in interactive sessions but it can cause confusion when many packages are loaded. To clarify which package and function I'm using at a given moment, I will occasionally use the packageName::functionName() convention.
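To illustrate why the qualified form helps, here is a small sketch that uses only packages shipped with R. It deliberately defines a function named `sd()` that masks stats::sd(); the variable names are invented for this example. The same kind of clash happens with real packages, for instance when dplyr is attached and its filter() masks stats::filter().

```r
values <- c(2, 4, 4, 4, 5, 5, 7, 9)

# A user-defined function that happens to reuse the name of stats::sd()
sd <- function(x) "my own sd()"

# The bare name resolves to the most recently defined sd()
plain_call <- sd(values)

# The qualified name always refers to the function in the stats package
qualified_call <- stats::sd(values)

print(plain_call)
print(qualified_call)
```

The qualified call is unaffected by whatever else shares the name, which is why the convention is useful when many packages are loaded.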
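When I show the output of a command, each output line begins with ##. For example, consider the following command:
letters[1:5]
This will give us the following output; note that the output lines are prefixed with ##: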
## a b c d e
Some of the packages that we want to use in this chapter rely on third-party software that must be installed separately. A great way of installing and managing bioinformatics software on any of Windows, Linux, or macOS is the conda package manager in conjunction with the bioconda package channel. You can install lots of software with some simple commands. To install both, start out by reading the current instructions at https://bioconda.github.io/.
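After installing external software, it can be handy to confirm from within R that an executable is visible on the system search path. The following is a minimal sketch using base R's Sys.which(). The helper name `tool_available()` is hypothetical, and `conda` is checked only as an example.

```r
# Hypothetical helper: TRUE if an executable called `tool` is on the PATH
tool_available <- function(tool) {
  # Sys.which() returns an empty string when the executable is not found
  unname(nzchar(Sys.which(tool)))
}

if (tool_available("conda")) {
  message("conda found at: ", Sys.which("conda"))
} else {
  message("conda was not found on the PATH; see https://bioconda.github.io/")
}
```

The same check works for any other command-line tool that a package expects to call.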