Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
Chapter 8: 20 Tips to Improve Your BLAST Searches
8.17 Perform Pilot Experiments
Before embarking on a large BLAST experiment, first try some pilot experiments. For
example, if you want to compare all human proteins to all nonhuman proteins, try
100 proteins first. Or, if you want to annotate a 5 Mb chromosomal region with
BLASTX similarities, search 100 Kb first. If you’re unsure of which parameters to
use, try several and see which ones give you the kinds of results you’re looking for. It
may seem like a waste of time, but performing pilot experiments will actually save
you time in the end.
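
As a minimal sketch of how a pilot set might be drawn, the following Python reads FASTA records and picks a reproducible random subset. The function names (read_fasta, pilot_subset) and the fixed seed are illustrative assumptions, not part of any BLAST distribution.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def pilot_subset(records, n, seed=0):
    """Return a reproducible random sample of at most n records (hypothetical helper)."""
    records = list(records)
    if len(records) <= n:
        return records
    # A fixed seed means the same pilot set is drawn on every run.
    return random.Random(seed).sample(records, n)
```

Passing an open file handle to read_fasta and asking pilot_subset for 100 records gives a pilot set that can be written back out and searched before committing to the full run.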
8.18 Examine Statistical Outliers
In a high-throughput setting, BLAST reports may be huge and number in the thousands.
There’s no way you can look at all of them, but for quality control, you
should examine some of them. Keep global statistics on BLAST reports, such as
number of hits per Kb. Statistical outliers may point to general problems that
become more apparent in certain sequences.
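
One way to keep such statistics is sketched below in Python. It assumes you have already extracted a hit count and a query length for each report; the helper names and the 3.5 cutoff on the modified z-score (based on the median absolute deviation) are illustrative choices, not a prescribed method.

```python
import statistics


def hits_per_kb(num_hits, query_length_bp):
    """Number of hits per kilobase of query sequence."""
    return num_hits / (query_length_bp / 1000.0)


def flag_outliers(rates, threshold=3.5):
    """Return the names whose rate is far from the median.

    rates maps a report name to its hits-per-Kb value. The modified
    z-score 0.6745 * |x - median| / MAD is used because it is not
    distorted by the very outliers it is trying to find.
    """
    values = list(rates.values())
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    return sorted(
        name
        for name, v in rates.items()
        if 0.6745 * abs(v - median) / mad > threshold
    )
```

Reports flagged this way are the ones worth opening by hand; the rest can usually be spot-checked.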
8.19 Use links and topcomboN to Make Sense of Alignment Groups
WU-BLAST has two very useful parameters for displaying alignment groupings.
topcomboN sorts alignments into groups and labels them. The links parameter shows
the order of alignments in a group, which is much like the order of a gene’s exons.
Figure 8-9 displays these features.
8.20 How to Lie with BLAST Statistics
Several techniques can help you massage BLAST statistics to either hide significant
alignments or make meaningless alignments appear highly significant. Why would
you want to do this? If you have to ask, you’re not the intended audience. Dishonest
evildoers, read on.
The easiest method to adjust the significance of all scores is to set the effective size of
the search space either higher or lower. Command-line parameters in both NCBI-BLAST
(-Y) and WU-BLAST (Y and Z) are available. You can also alter the scoring
scheme by editing the scoring matrices. A more involved approach is to hack
the source code to set your own values for λ, k, and H. WU-BLAST makes it all too
easy because you can alter scores or set Karlin-Altschul parameters on the command
line. Whatever approach you take, you will, of course, want to edit the footer to
cover your tracks. The easiest way to do this is to run the search twice and diff the
footers to determine what needs fixing.
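
To see why the search space is such an effective lever, recall that the Karlin-Altschul expectation is E = k × (search space) × e^(-λS) for a raw score S. The Python sketch below is only an illustration: the function name is made up, and the λ and k values in the usage note are placeholders, so read the real ones from the footer of your own report.

```python
import math


def expected_hits(raw_score, search_space, lam, k):
    """Karlin-Altschul expectation: E = k * search_space * exp(-lam * raw_score).

    search_space is the effective search space (effective query length
    times effective database length); lam and k are the Karlin-Altschul
    parameters for the scoring scheme in use.
    """
    return k * search_space * math.exp(-lam * raw_score)
```

For example, with placeholder values lam=0.318 and k=0.134, calling expected_hits(100, 1e9, 0.318, 0.134) and expected_hits(100, 1e6, 0.318, 0.134) gives results that differ by exactly the factor of 1,000 between the two search spaces. The alignment and its score are identical in both cases, which is also why an unexpected search-space value in a report footer deserves a second look.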