Searching Genes and Proteins for Domains and Motifs

The sequences of genes, proteins, and entire genomes hold clues to their function. Repeated subsequences, or sequences with strong similarity to each other, can point to evolutionary conservation or functional relatedness. As such, sequence analysis for motifs and domains is a core technique in bioinformatics. Bioconductor contains many useful packages for analyzing genes, proteins, and genomes. In this chapter, you will learn how to use Bioconductor to analyze sequences for features of functional interest, such as de novo DNA motifs and known domains from widely used databases. You will learn about some packages for kernel-based machine learning to find protein sequence features, as well as some large-scale alignment techniques for very many or very long sequences. Alongside Bioconductor, you will use other statistical learning packages.

The following recipes will be covered in this chapter:

  • Finding DNA motifs with universalmotif
  • Finding protein domains using PFAM and bio3d
  • Finding InterPro domains
  • Performing multiple alignments of genes or proteins
  • Aligning genomic length sequences with DECIPHER
  • Machine learning for novel feature detection in proteins
  • 3D structure protein alignment with bio3d