Working with Databases and Remote Data Sources

Large-scale sequencing projects, such as the Human Genome Project (HGP) and the 1001 Genomes Project for the plant Arabidopsis thaliana, have made a huge amount of genomic data publicly available. Likewise, open-access data sharing by individual laboratories has made raw genome and transcriptome sequencing data widely available. Working with this data programmatically can mean parsing or downloading some very large or complicated files. For this reason, much effort has gone into making these resources as accessible as possible through APIs and other queryable interfaces, such as BioMart. In this chapter, we'll look at some recipes that allow us to search for annotations without having to download whole genome files and to find relevant information across databases. We'll also look at how to pull raw reads from experiments from within our code and take the opportunity to apply quality control to the downloaded data.

The following recipes will be covered in this chapter:

Retrieving gene and genome annotations from BioMart
Retrieving and working with SNPs
Getting gene ontology information
Finding experiments and reads from SRA/ENA
Performing quality control and filtering on high-throughput sequence reads
Completing read-to-reference alignment with external programs
Visualizing quality control plots of read-to-reference alignments
