Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
Chapter 13: NCBI-BLAST Reference
blastpgp Parameters (PSI-BLAST and PHI-BLAST)
blastpgp is the program used to run PSI-BLAST and PHI-BLAST. These programs are
specialized protein BLAST comparisons that are more sensitive than the standard
BLASTP search. PSI-BLAST considers position-specific information when searching
for significant hits. PHI-BLAST uses a pattern, or profile, to seed an alignment,
which is then extended by the normal BLASTP algorithm.
PSI-BLAST
PSI-BLAST (position-specific iterated BLAST) uses a specialized scoring matrix that
assigns scores to each position (hence, position-specific) in the query sequence based
on alignments defined by consecutive iterations of searches (hence, iterated). The
specialized matrix is a position-specific scoring matrix (PSSM) that assigns a score for
every amino acid at each position in the query sequence (see Figure 13-1).
Figure 13-1 shows a portion of a PSSM calculated for the coelacanth Hoxa11 protein
(AAG39070). The query amino acids are numbered in the left column, with the
position-specific scores for each of the 20 amino acids shown across each row. The
differing scores of the three tyrosines (Y) at positions 1, 7, and 8 highlight the
position-specific aspect of this scoring scheme compared to traditional BLAST matrices,
which would contain the same scores for Y in all three positions.
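To make the position-specific point concrete, the following minimal Python sketch looks up scores in the three tyrosine rows of Figure 13-1. The row values are copied from the figure; the names AMINO_ACIDS, TYROSINE_ROWS, score, and scores_by_position are illustrative only and are not part of any BLAST program.

```python
# Column order used in Figure 13-1.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# PSSM rows for the three tyrosines of the query (positions 1, 7, and 8),
# copied from Figure 13-1.
TYROSINE_ROWS = {
    1: [-2, -2, -2, -3, -2, -1, -2, -3, 2, -1, -1, -2, -1, 3, -3, -2, -2, 2, 7, -1],
    7: [-2, -3, -3, -4, -3, -2, -3, -4, 1, -1, -1, -3, -1, 5, -4, -2, -2, 1, 7, -2],
    8: [-1, -1, -1, -1, -2, 0, -1, -2, 6, -2, -1, -1, -1, 1, -1, -1, -1, 0, 5, -2],
}


def score(position, residue):
    """Return the PSSM score for aligning `residue` at a query position."""
    return TYROSINE_ROWS[position][AMINO_ACIDS.index(residue)]


def scores_by_position(residue):
    """Return {query position: score} for one aligned residue."""
    return {pos: score(pos, residue) for pos in sorted(TYROSINE_ROWS)}


if __name__ == "__main__":
    # A position-independent matrix would print three identical values
    # per residue; the PSSM does not.
    for residue in "YFH":
        print(residue, scores_by_position(residue))
```

Although the query residue is Y in all three rows, the score for aligning F or H differs from row to row, which a single position-independent matrix row for Y could not express.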
The PSSM, or checkpoint file, is created internally by PSI-BLAST, but it can also be
exported to a file using the -C option of blastpgp. This is useful because you can
reuse the checkpoint file in subsequent PSI-BLAST (blastpgp) searches or as a
database entry for the RPS-BLAST program. You can also use the PSSM in a specialized
tblastn search in blastall by using the -p psitblastn and -R <checkpoint file>
options with a nucleotide database.
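As a rough illustration of this workflow, the Python sketch below only assembles and prints the two command lines; it does not run them. The -C, -p psitblastn, and -R options come from the text above. The -i (query), -d (database), and -j (number of iterations) options are assumed here from common legacy NCBI usage, and every file and database name is a hypothetical placeholder.

```python
import shlex


def export_checkpoint_cmd(query, protein_db, checkpoint, iterations=3):
    """Build a blastpgp command that writes the PSSM to a checkpoint file.

    -C is described in the text; -i, -d, and -j are assumed here.
    """
    return ["blastpgp", "-i", query, "-d", protein_db,
            "-j", str(iterations), "-C", checkpoint]


def psitblastn_cmd(query, nucleotide_db, checkpoint):
    """Build a blastall command that searches a nucleotide database
    with a previously saved checkpoint file (-p psitblastn and -R)."""
    return ["blastall", "-p", "psitblastn", "-i", query,
            "-d", nucleotide_db, "-R", checkpoint]


def as_shell(args):
    """Render an argument list as a single shell-quoted line."""
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Hypothetical file and database names, for illustration only.
    print(as_shell(export_checkpoint_cmd("hoxa11.fasta", "nr", "hoxa11.chk")))
    print(as_shell(psitblastn_cmd("hoxa11.fasta", "est", "hoxa11.chk")))
```

Building the commands as argument lists keeps file names with spaces intact if the lists are later passed to a process launcher rather than pasted into a shell.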
Figure 13-1. PSSM for the first 10 amino acids of the coelacanth HoxA11 protein
      A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
 1 Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
 2 L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
 3 P -1 -2 -2 -2 -3 -2 -1 -2 -2 -3 -3 -1 -3 -4  8 -1 -1 -4 -3 -3
 4 S  1 -1  0 -1 -1  0  0 -1 -1 -3 -3  0 -2 -3 -1  5  1 -3 -2 -2
 5 C -1 -4 -3 -4  9 -3 -4 -3 -3 -2 -2 -3 -2 -3 -3 -1 -1 -3 -3 -1
 6 T  0 -1  0 -1 -1 -1 -1 -1 -2 -2 -3 -1 -2 -3 -1  4  3 -3 -2 -2
 7 Y -2 -3 -3 -4 -3 -2 -3 -4  1 -1 -1 -3 -1  5 -4 -2 -2  1  7 -2
 8 Y -1 -1 -1 -1 -2  0 -1 -2  6 -2 -1 -1 -1  1 -1 -1 -1  0  5 -2
 9 V -1 -2 -2 -2 -1 -2 -2 -2 -2  1  2 -2  0 -1 -2 -2 -1 -2 -1  4
10 S -1 -1 -1 -1 -3  3  3 -2 -1 -2  1  0 -1 -2 -2  2 -1 -3 -2 -2
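The Python sketch below shows one simplified way a PSSM like this could be used to score a candidate peptide: it sums the per-position scores for an ungapped alignment that starts at query position 1. This is an illustration under stated assumptions, not the PSI-BLAST algorithm itself, which also handles gaps, extension, and statistics. The names parse_pssm and pssm_score are hypothetical; the score values are those of Figure 13-1.

```python
# Column order used in Figure 13-1.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Rows of Figure 13-1: position, query residue, then 20 scores.
FIGURE_13_1 = """
1 Y -2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7 -1
2 L -1 -2 -3 -4 -1 -2 -3 -4 -3 2 4 -2 2 0 -3 -2 -1 -2 -1 1
3 P -1 -2 -2 -2 -3 -2 -1 -2 -2 -3 -3 -1 -3 -4 8 -1 -1 -4 -3 -3
4 S 1 -1 0 -1 -1 0 0 -1 -1 -3 -3 0 -2 -3 -1 5 1 -3 -2 -2
5 C -1 -4 -3 -4 9 -3 -4 -3 -3 -2 -2 -3 -2 -3 -3 -1 -1 -3 -3 -1
6 T 0 -1 0 -1 -1 -1 -1 -1 -2 -2 -3 -1 -2 -3 -1 4 3 -3 -2 -2
7 Y -2 -3 -3 -4 -3 -2 -3 -4 1 -1 -1 -3 -1 5 -4 -2 -2 1 7 -2
8 Y -1 -1 -1 -1 -2 0 -1 -2 6 -2 -1 -1 -1 1 -1 -1 -1 0 5 -2
9 V -1 -2 -2 -2 -1 -2 -2 -2 -2 1 2 -2 0 -1 -2 -2 -1 -2 -1 4
10 S -1 -1 -1 -1 -3 3 3 -2 -1 -2 1 0 -1 -2 -2 2 -1 -3 -2 -2
"""


def parse_pssm(text):
    """Parse the figure text into a list of (query residue, {aa: score})."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        scores = [int(value) for value in fields[2:]]
        if len(scores) != len(AMINO_ACIDS):
            raise ValueError("expected 20 scores per row: " + line)
        rows.append((fields[1], dict(zip(AMINO_ACIDS, scores))))
    return rows


PSSM = parse_pssm(FIGURE_13_1)


def pssm_score(peptide, pssm=PSSM):
    """Sum position-specific scores for an ungapped alignment of
    `peptide` against the query, starting at query position 1."""
    if len(peptide) > len(pssm):
        raise ValueError("peptide is longer than the PSSM")
    return sum(pssm[i][1][residue] for i, residue in enumerate(peptide))


if __name__ == "__main__":
    query = "".join(residue for residue, _ in PSSM)
    print(query, pssm_score(query))
    # Swap Y for H at position 8, then at position 1, and compare.
    print("YLPSCTYHVS", pssm_score("YLPSCTYHVS"))
    print("HLPSCTYYVS", pssm_score("HLPSCTYYVS"))
```

Substituting H for Y changes the total by a different amount depending on whether the substitution falls at position 1 or position 8, because each query position has its own row of scores.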