CS Mukhopadhyay and RK Choudhary
School of Animal Biotechnology, GADVASU, Ludhiana
BLASTp is a set of programs that search protein databases using an amino acid sequence as the query. It offers four algorithms, each with a well‐defined application: blastp, psi‐blast, phi‐blast and delta‐blast.
To search the protein database for sequences homologous to a given amino acid sequence, which is used as the query.
The necessary steps are the same for BLASTp and BLASTn (see Chapter 12: “Basic Local Alignment Search Tool for nucleotide (BLASTn)”), regarding:
Open the URL: http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome to get the homepage for BLASTp.
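The address above is a base URL followed by three query parameters. As a minimal sketch, it can be assembled with Python's standard library; the function name `blastp_homepage_url` is an illustrative assumption, not part of any NCBI tool.

```python
from urllib.parse import urlencode

# Base address and parameters are taken from the URL quoted in the text.
BLAST_CGI = "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi"


def blastp_homepage_url() -> str:
    """Return the BLASTp homepage URL (hypothetical helper for illustration)."""
    params = {
        "PROGRAM": "blastp",
        "PAGE_TYPE": "BlastSearch",
        "LINK_LOC": "blasthome",
    }
    # urlencode joins the key=value pairs with '&' in insertion order.
    return BLAST_CGI + "?" + urlencode(params)


if __name__ == "__main__":
    print(blastp_homepage_url())
```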
These are of the following subtypes. Details of each have been provided in Table 13.1.
TABLE 13.1 Algorithm parameters of BLASTp: Numbered arrows in Figure 13.2 refer to the serial number (SN) of discussion in this table.
SN | Terms | Explanation |
1 | Maximum target sequences | As in BLASTn, the user can choose (from the drop‐down list) the maximum number of aligned sequences to be displayed in the BLAST result. |
2 | Short queries | Check this box to let BLASTp adjust related parameters (such as the “word size”, i.e., the seed length) for short query sequences to improve the results. |
3 | Expect threshold | BLAST alignments may arise from chance hits to non‐homologous sequences. The E‐value is the number of matches with an equal or better score expected by chance in a database of the given size, so a lower threshold minimizes random matches. The default value of 10 means that about ten matches scoring at least as well as the weakest reported hit are expected by chance alone. Reducing the Expect threshold will shorten the search output. |
4 | Word size | You can opt for either 2 or 3. Since BLAST follows a heuristic algorithm, it first finds a match for a seed word of the chosen length and then extends the alignment in both directions. A larger word size may yield fewer results, while a shorter word size can lead to more random hits. |
5 | Max matches in a query range | This is a very useful parameter with practical implications. Sometimes one portion of a query sequence attracts a very large number of matches because of its high similarity, and crowds out the results for the other portions. This option restores the balance by limiting the number of strong matches reported for one region, which gives weaker matches elsewhere in the same query sequence a chance to be displayed. |
6 | Matrix | The user can select between Point Accepted Mutation matrices (PAM: used to score alignments between closely related sequences) and Blocks Substitution Matrices (BLOSUM: for evolutionarily divergent sequences). A higher matrix number indicates greater evolutionary distance in PAM, and the reverse in BLOSUM. The BLOSUM62 scoring matrix is a useful all‐round matrix. |
7 | Gap costs | A drop‐down menu displays a given range of gap costs. Increasing the gap cost will reduce the number of gaps. It is better to use the default value, unless the results contain many false positives. |
8 | Compositional adjustments | This option adjusts the scoring matrix to compensate for compositional differences between the sequences being compared. The adjustment makes the E‐values of the search more reliable. “Conditional compositional score matrix adjustment” is more sophisticated than “Composition‐based statistics”. One can use the default option. |
9 | Filter: low complexity region | Low complexity regions are stretches of biased or repetitive composition which could introduce spurious results in BLASTp matching. Checking this filter masks them. |
10 | Mask: masks for lookup table only | BLASTp selects the seed word from the look‐up table and then proceeds to extension. If this option is checked, no seed is taken from a low complexity region, although an extension may still pass through one. |
10 | Mask: mask lower case letter | Lower‐case characters in the query (i.e., residues marked, for example, as low complexity regions) are masked and are not considered for BLAST. |
11 | Upload PSSM | This is a very useful, advanced, but optional tool. A position‐specific scoring matrix (PSSM) can be downloaded from a PSI‐BLAST or DELTA‐BLAST search. That PSSM can then be uploaded to search a different database for homologous sequences. |
12 | PSI‐BLAST threshold | This sets the threshold of statistical significance (E‐value) that a hit must meet for its protein sequence to be included in the PSSM in the following iteration. The default is 0.005. |
13 | Pseudo count | The default is “0”, which enables BLASTp to determine the pseudo‐count value, based on the minimum description length principle. |
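Most of the web-form parameters in Table 13.1 have counterparts in the standalone NCBI BLAST+ `blastp` program. The sketch below only assembles a command line and does not run it. It assumes BLAST+ is installed; the file name `query.fasta`, the database name `nr` and the helper `build_blastp_command` are hypothetical, and the values shown are the ones discussed in the table rather than guaranteed program defaults.

```python
import shlex


def build_blastp_command(query_fasta, database, *, max_target_seqs=100,
                         evalue=10.0, word_size=3, matrix="BLOSUM62",
                         gap_open=11, gap_extend=1, comp_based_stats=2,
                         mask_low_complexity=False):
    """Return a BLAST+ blastp argument list (hypothetical helper)."""
    return [
        "blastp",
        "-query", query_fasta,                       # amino acid query in FASTA
        "-db", database,                             # protein database to search
        "-max_target_seqs", str(max_target_seqs),    # SN 1
        "-evalue", str(evalue),                      # SN 3: Expect threshold
        "-word_size", str(word_size),                # SN 4: seed length
        "-matrix", matrix,                           # SN 6: scoring matrix
        "-gapopen", str(gap_open),                   # SN 7: gap existence cost
        "-gapextend", str(gap_extend),               # SN 7: gap extension cost
        "-comp_based_stats", str(comp_based_stats),  # SN 8
        "-seg", "yes" if mask_low_complexity else "no",  # SN 9: SEG filter
    ]


if __name__ == "__main__":
    command = build_blastp_command("query.fasta", "nr", evalue=0.001)
    # Print the command for inspection; pass the list to subprocess.run to execute.
    print(shlex.join(command))
```

Building the arguments as a list, rather than as one string, avoids quoting problems when the command is later handed to `subprocess.run`.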
The results window is almost the same as the previous ones, except for a new column: “Select for PSI‐blast” (indicated by “E” in Figure 13.4). This column contains a checkbox against each of the PSI‐BLAST results. All the checkboxes are ticked by default; however, any of them can be unchecked individually. PSI‐BLAST can then be run for a second iteration by indicating the number of sequences (default is 500) and then clicking on the “Go” button.
The last column, “Used to build PSSM”, is checked to indicate the sequences which have been used in the second iteration. The rows harboring the sequences which have not been used in this iteration to build the PSSM are highlighted in yellow.
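The selection step between iterations can be sketched as a simple filter: hits whose E‐value meets the PSI‐BLAST threshold (default 0.005) are kept for building the PSSM. This is only an illustration of the rule described above, not the NCBI implementation; the `Hit` type, the function name and the accessions and E‐values in the usage example are all hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    accession: str  # hypothetical identifier of a database sequence
    evalue: float   # E-value reported for the alignment


def select_for_pssm(hits: List[Hit], threshold: float = 0.005) -> List[Hit]:
    """Keep hits whose E-value is at or below the inclusion threshold."""
    selected = [hit for hit in hits if hit.evalue <= threshold]
    # Report the most significant hits first.
    return sorted(selected, key=lambda hit: hit.evalue)


if __name__ == "__main__":
    # Made-up hits, for illustration only.
    hits = [Hit("SEQ_A", 2e-30), Hit("SEQ_B", 0.2), Hit("SEQ_C", 0.004)]
    for hit in select_for_pssm(hits):
        print(hit.accession, hit.evalue)
```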
In this example, the amino acid sequence “krpmnafivw srdqrrkmal” has been used as a PHI pattern to run PHI‐BLAST. The positions matching the specified pattern are indicated by asterisks above the alignment.
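PHI‐BLAST patterns are normally written in PROSITE‐style syntax. The sketch below converts a small subset of that syntax (single residues, bracketed alternatives, `x`, and repeat counts such as `x(2)` or `x(2,4)`) into a regular expression and marks the matched positions with asterisks. The pattern `W-S-R-x(2)-R` is a hypothetical example chosen to match the sequence quoted above; it is not the pattern used in the original search, and the function names are assumptions.

```python
import re

# One pattern element: a residue, a bracketed set or 'x', with an optional repeat.
_ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def phi_pattern_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern (subset only) into a regex string."""
    parts = []
    for element in pattern.strip().split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        token = "." if residue == "x" else residue  # 'x' means any residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


def mark_pattern(sequence: str, pattern: str):
    """Return a line of asterisks aligned with the first match, or None."""
    match = re.search(phi_pattern_to_regex(pattern), sequence)
    if match is None:
        return None
    return " " * match.start() + "*" * (match.end() - match.start())


if __name__ == "__main__":
    sequence = "KRPMNAFIVWSRDQRRKMAL"  # sequence from the text, upper-cased
    print(mark_pattern(sequence, "W-S-R-x(2)-R"))
    print(sequence)
```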
DELTA‐BLAST, whose output is shown in Figure 13.6, is more sensitive and accurate than PSI‐BLAST.