Working with Databases and Remote Data Sources

Large-scale sequencing projects, such as the Human Genome Project (HGP) or the 1001 Genomes Project in plants, have made a huge amount of genomics data publicly available. Likewise, open access data sharing by individual laboratories has made the raw sequencing data of genomes and transcriptomes widely available. Working with this data programmatically can mean having to download and parse some very large or complicated files. For that reason, much effort has gone into making these resources as accessible as possible through APIs and other queryable interfaces, such as BioMart. In this chapter, we'll look at some recipes that will allow us to search for annotations without having to download whole genome files and to find relevant information across databases. We'll also look at how to pull the raw reads from experiments from within our code, and take the opportunity to apply quality control to the downloaded data.

The following recipes will be covered in this chapter:

  • Retrieving gene and genome annotations from BioMart
  • Retrieving and working with SNPs
  • Getting gene ontology information
  • Finding experiments and reads from SRA/ENA
  • Performing quality control and filtering on high-throughput sequence reads
  • Completing read-to-reference alignment with external programs
  • Visualizing quality control plots of read-to-reference alignments
