This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
Chapter 7: A BLAST Statistics Tutorial
melanogaster genome. On the other hand, it appears that looking for short—less
than 15 base-pair—cis-regulatory elements using either version of BLASTN with the
default parameters is unlikely to be successful.
So what was the unreported WU-BLASTN Expect? Let’s calculate it. With the data
in Table 7-3 and the previously calculated effective HSP length of 294, first calculate
the effective query and database lengths, m′ and n′, using the Perl functions
effectiveLengthSeq and effectiveLengthDB. Plugging m′ and n′, together with the
WU-BLASTN λ and k and a raw score of 125, into the rawScoreToExpect function
gives an Expect of 281. Recall that the NCBI-BLASTN Expect was 1e-6. That’s a
281-million-fold difference. BLAST is clearly parameter-sensitive! Using the
default parameters, you instructed NCBI-BLASTN to search for short, highly
conserved regions, and it found one. WU-BLASTN, on the other hand, is
parameterized to look for large regions of relatively low percent identity. This
would be fine for cross-species searches of poorly conserved exons but is
inappropriate for finding oligos.
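The calculation just described can be sketched in a few lines. What follows is a
Python sketch, not the Perl functions named in the text: the function names,
their signatures, and the simple clamped length correction are assumptions made
for illustration. It relies on the Karlin-Altschul relation
E = k * m′ * n′ * exp(-λS). The λ, k, and sequence lengths have to be taken from
Table 7-3, which is not reproduced here, so only the two Expects quoted in the
text are used as concrete numbers.

```python
import math

def effective_length_seq(m, hsp_len):
    """Effective query length m': actual length minus the expected HSP length.
    Assumption: clamp at 1 so the result is never zero or negative."""
    return max(1, m - hsp_len)

def effective_length_db(n, hsp_len, num_seqs):
    """Effective database length n': total letters minus one expected HSP
    length per database sequence. Same clamping assumption as above."""
    return max(1, n - num_seqs * hsp_len)

def raw_score_to_expect(raw_score, k, lam, m_eff, n_eff):
    """Karlin-Altschul Expect: E = k * m' * n' * exp(-lambda * S)."""
    return k * m_eff * n_eff * math.exp(-lam * raw_score)

def fold_difference(expect_a, expect_b):
    """Ratio between two Expects reported for the same alignment."""
    return expect_a / expect_b

# The two Expects quoted in the text: WU-BLASTN 281, NCBI-BLASTN 1e-6.
print(f"{fold_difference(281, 1e-6):.3g}")  # prints 2.81e+08
```

With the Table 7-3 values in hand, the call would look like
raw_score_to_expect(125, k, lam, m_eff, n_eff), where m_eff and n_eff come from
the two effective-length helpers and an HSP length of 294.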
Using BLAST intelligently requires using the correct parameters for the task at hand
and not placing too much faith in the reported Expect. See the section on BLAST
protocols in Chapter 9 for practical suggestions on BLAST parameter choice.
Remember, you get what you look for.
What It All Means
You now know how bit scores, sum scores, Expects, and P-values are calculated.
You’ve also seen first-hand that scoring matrices and target frequencies aren’t merely
theoretical abstractions but realities that determine the outcome of a BLAST search.
In some ways, choosing the right scoring scheme for a BLAST search is like choosing
the right pair of eyeglasses. If your scoring scheme is too stringent, BLAST becomes
nearsighted and will miss distant homologies. If your scheme is too lenient, BLAST
becomes farsighted and fails to detect the obvious. Unfortunately, there’s no optimal
scoring scheme. As in real life, sometimes the best you can do is put on bifocals.
You’ve also seen that searching the same sequence and database with varied
parameters can result in different alignments having very different Expects.
Scores and E-values aren’t implicit in a sequence or an alignment; they are
solely contingent upon parameter values and the methods used to assess
significance. There is nothing absolute about a BLAST significance value; it
merely denotes the significance of an alignment in the context of a given
search. Like everything else in bioinformatics, the biological implications of
a (significant) alignment are inferred by the user and should be tested
experimentally, if possible.
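To make the dependence on parameters concrete, here is a small Python sketch
that converts one raw score to a bit score, an Expect, and a P-value under two
different scoring schemes. The formulas are the standard Karlin-Altschul ones:
bit score S′ = (λS - ln k) / ln 2, E = m′ * n′ * 2^(-S′), and P = 1 - exp(-E).
The function names are hypothetical, and the λ, k, and search-space values are
invented purely for illustration; they are not the parameters of any real BLAST
scoring scheme.

```python
import math

def bit_score(raw_score, lam, k):
    """Normalized (bit) score: S' = (lambda * S - ln k) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def expect_from_bits(bits, m_eff, n_eff):
    """Expect from a bit score and effective lengths: E = m' * n' * 2**(-S')."""
    return m_eff * n_eff * 2.0 ** (-bits)

def expect_to_pvalue(expect):
    """P-value of seeing at least one such hit: P = 1 - exp(-E).
    expm1 keeps precision when E is very small."""
    return -math.expm1(-expect)

# Hypothetical parameter sets (NOT real BLAST values), same raw score for both.
schemes = {"scheme A": (1.0, 0.5), "scheme B": (0.2, 0.2)}
raw, m_eff, n_eff = 125, 1_000, 1_000_000  # made-up search space

for name, (lam, k) in schemes.items():
    bits = bit_score(raw, lam, k)
    e = expect_from_bits(bits, m_eff, n_eff)
    print(f"{name}: bits={bits:.1f} E={e:.3g} P={expect_to_pvalue(e):.3g}")
```

The same raw score of 125 yields very different bit scores and Expects under
the two schemes, which is the point of the paragraph above: the number only has
meaning relative to the parameters that produced it.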
Hopefully, you’ve also learned that there is more to Karlin-Altschul statistics than
simply calculating an Expect for an alignment. Karlin-Altschul statistics provide a
theoretical framework from which to interpret alignment scores in the context of
parameter choice. They also give you the means to tune BLAST for specific purposes.
Without them, you’d have no way of knowing what a given scoring scheme was
looking for, and you’d cast around in the dark for the right set of parameters. Karlin-
Altschul statistics remove the mystery from parameter choice. BLAST certainly has
its limitations, but thanks to its statistical foundation, at least you know what you’re
looking for.