There's more...

I mentioned that the key to the recipe, in particular the join in step 6, was to make sure the database contained the right keys, specifically PFAM, to proceed. Depending on the organism and database, the PFAM annotation may not exist. Here's how to check whether it exists in the database you're interested in, using two example databases, org.At.tair.db and org.EcK12.eg.db. First, an Arabidopsis database:

library(org.At.tair.db)
columns(org.At.tair.db)

and an E. coli database:

library(org.EcK12.eg.db)
columns(org.EcK12.eg.db)
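The same check can be made programmatically rather than by reading the printed list. The following is a minimal sketch: has_key() is a hypothetical helper, not part of any package, and example_columns is a made-up stand-in for the character vector that columns() returns.

```r
# Hypothetical helper: does a vector of column names include the key we need?
has_key <- function(available, key = "PFAM") {
  key %in% available
}

# Stand-in for the character vector returned by columns(); illustrative values only.
example_columns <- c("GENENAME", "GO", "PFAM", "SYMBOL")
has_key(example_columns)

# With an annotation package loaded, the call would be:
# has_key(columns(org.At.tair.db))
```

This makes it possible to branch in a script, for example falling back to a sequence search when the key is missing.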

Simply use the columns() function to list the data columns in the database. If PFAM shows up, you can follow the procedure. If it doesn't, you can run a PFAM search and make the annotations yourself. The following code takes your input protein sequence and runs a PFAM search on the server at EBI using the bio3d function hmmer(). The returned object contains the PFAM output as a data frame in its hit.tbl element:

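library(bio3d)
# read the protein sequence from a FASTA file
sequence <- read.fasta(file.path(getwd(), "datasets", "ch3", "ecoli_hsp.fa"))
# run an hmmscan search against the PFAM database on the EBI server
result <- hmmer(sequence, type="hmmscan", db="pfam")
# the table of hits is returned as a data frame
result$hit.tbl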

This will result in the following output:

##   name        acc bias dcl desc  evalue flags hindex ndom nincluded
## 1 GrpE PF01025.19  3.3 272 GrpE 1.4e-46     3   8846    1         1
##   nregions nreported    pvalue score taxid     pdb.id bitscore mlog.evalue
## 1        1         1 -115.4076 158.2     0 PF01025.19    158.2    105.5824
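To turn a hit table like this into annotations, you would typically keep only confident hits and pull out the PFAM accession. The following is a minimal sketch under stated assumptions: hits is a hand-built stand-in for result$hit.tbl using values from the output above, the evalue column is assumed to be numeric, and the cutoff of 1e-5 is an arbitrary illustrative threshold rather than a recommended one.

```r
# Stand-in for result$hit.tbl, built from the values printed above.
hits <- data.frame(
  name = "GrpE",
  acc = "PF01025.19",
  evalue = 1.4e-46,
  score = 158.2,
  stringsAsFactors = FALSE
)

# Assumed, illustrative significance cutoff.
evalue_cutoff <- 1e-5

# Keep only hits below the cutoff and the columns of interest.
significant <- hits[hits$evalue < evalue_cutoff, c("name", "acc", "evalue", "score")]

# Remove the version suffix from the accession (PF01025.19 becomes PF01025).
significant$pfam_id <- sub("\\.[0-9]+$", "", significant$acc)
significant
```

The unversioned accession in pfam_id is usually the more convenient form for joining against other tables.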
