I mentioned that the key to the recipe, in particular the join in step 6, was to make sure the database contained the right keys, specifically PFAM. Depending on the organism and database, the PFAM annotation may not exist. Here's how to check whether it exists in the database you're interested in, using two example databases, org.At.tair.db and org.EcK12.eg.db. First, an Arabidopsis database:
library(org.At.tair.db)
columns(org.At.tair.db)
and an E. coli database:
library(org.EcK12.eg.db)
columns(org.EcK12.eg.db)
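Rather than reading the printed column list by eye, the same check can be written as a logical test. The following is a minimal sketch: has_column() is a hypothetical helper introduced here for illustration, not part of AnnotationDbi, and the example_cols vector is a hand-written stand-in so the sketch runs without any annotation package installed.

```r
# Hypothetical helper: TRUE if the wanted column is among those available.
has_column <- function(available, wanted = "PFAM") {
  wanted %in% available
}

# Stand-in column names (an assumption for illustration, not the real
# contents of any particular annotation database).
example_cols <- c("ENTREZID", "GO", "SYMBOL")
has_column(example_cols)                # PFAM absent from the stand-in
has_column(c(example_cols, "PFAM"))     # PFAM present

# With a real database, pass the result of columns() instead. This branch
# only runs if the Arabidopsis annotation package is installed.
if (requireNamespace("org.At.tair.db", quietly = TRUE)) {
  db <- org.At.tair.db::org.At.tair.db
  print(has_column(AnnotationDbi::columns(db)))
}
```

A test like this is convenient at the top of a script, where it can decide whether to proceed with the join or fall back to running the PFAM search yourself.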
The columns() function reports the data columns in the database. If PFAM shows up, you can follow the procedure. If it doesn't, an alternative is to run a PFAM search and make the annotations yourself. The following code takes your input protein sequences and runs a PFAM search on the server at EBI using the bio3d function hmmer(). The returned object contains the PFAM output as a data frame in the hit.tbl element:
library(bio3d)
sequence <- read.fasta(file.path(getwd(), "datasets", "ch3", "ecoli_hsp.fa"))
# run pfamseq on protein
result <- hmmer(sequence, type="hmmscan", db="pfam")
result$hit.tbl
This will result in the following output:
## name acc bias dcl desc evalue flags hindex ndom nincluded
## 1 GrpE PF01025.19 3.3 272 GrpE 1.4e-46 3 8846 1 1
## nregions nreported pvalue score taxid pdb.id bitscore mlog.evalue
## 1 1 1 -115.4076 158.2 0 PF01025.19 158.2 105.5824
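Once the hit table is in hand, you will usually want only the confident hits and a few of the columns. The following is a minimal sketch of that step: filter_hits() is a hypothetical helper, the 1e-5 E-value cutoff is an assumed threshold chosen for illustration, and example_hits is a small stand-in data frame built from the values in the output above so that the sketch runs without contacting the EBI server.

```r
# Hypothetical helper: keep hits at or below an E-value cutoff and
# return only the columns needed for annotation.
filter_hits <- function(hit_tbl, max_evalue = 1e-5) {
  keep <- as.numeric(hit_tbl$evalue) <= max_evalue
  hit_tbl[keep, c("name", "acc", "evalue", "score"), drop = FALSE]
}

# Stand-in for result$hit.tbl, using values from the output shown above.
example_hits <- data.frame(
  name = "GrpE",
  acc = "PF01025.19",
  evalue = 1.4e-46,
  score = 158.2,
  stringsAsFactors = FALSE
)

confident <- filter_hits(example_hits)
print(confident)
```

With a real search, you would call filter_hits(result$hit.tbl) instead, adjusting max_evalue to suit how strict the annotation needs to be.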