This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
334 | Index
query sequence
BLASTN, 91
BLASTP, 91
blastpgp parameter, 260
BLASTX, 93
effective length, 105
example, 5, 9
filtering, 120, 263, 271, 275
megablast parameter, 247, 248
quotes ("), 231
R
R= parameter (WU-BLAST), 279
random coil, 25
raw score
converting, 97
converting to normalized score, 64
defined, 59, 100
Expect and, 104
expected score, 62
HSPs and, 104
Karlin-Altschul, 101
lambda and, 61, 111
ungapped alignments, 113
rawScoreToBitScore function, 102
rawScoreToExpect function, 102
reading frame
BLASTP, 91
BLASTX format, 93
overlapping, 108
translating DNA, 27
(see also ORF)
reclustering, 266
redundancy, 185, 196, 197
RefSeq
features, 204
GenBank and, 197
NCBI toolbox, 4
nucleotide sequences, 5
regulation, splicing and, 138
relative entropy, 62, 67
RepeatMasker, 143
repeats
alignments and, 133
dinucleotide, 37
genomic DNA and, 121, 137, 148, 156
genomic sequences and, 153
long-running queries and, 120
mapping DNA/EST to genomes, 135
masking, 143
overlapping, 138, 150
overview, 37
RepeatMasker, 143
transcript clustering, 140
repetitive elements
contamination and, 144
duplication and, 38
eukaryotic genomes, 143
exons and, 138
false associations, 141
FEATURES and, 203
genes and, 135
genomes and, 121
reports
BLAST structure, 88
footer, 11, 88, 99, 119
formats, 88–95
formatting considerations, 208
Request Identifier (RID), 8
restest parameter (WU-BLAST), 279
results
choosing format, 7, 8
viewing, 8–12
retro-pseudogenes, 38
retrotransposons, 37
retroviruses, 37
reverse transcriptase, 36
ribosomal RNA (rRNA), 22, 37
ribosomes, 25
RID (Request Identifier), 8
RNA
noncoding RNA, 22, 25, 37
ribosomal RNA, 22, 37
transcription, 22
transfer RNA, 22, 37
viruses and, 35
(see also mRNA)
RNA polymerase, 22
rounding errors, 61, 108
rpsblast (NCBI-BLAST), 162, 256
rRNA (ribosomal RNA), 22, 37
S
S= parameter (WU-BLAST), 279
Saccharomyces cerevisiae, 33
Saccharomyces Genome Database, 199
same-sense mutations, 28
saturated sequences, 31, 32
scaling factors, 61
scientific experiments
controls and, 117
designing, 119
pilots and, 128
searches as, 116
scoring matrix
assumptions, 65
bl2seq, 254
blasta and, 169
blastall, 234
BLASTP, 90, 91, 145
blastpgp, 261
BLASTX, 148
editing, 186, 187
expected score, 62
footer reports, 88
global alignment, 41, 42
insensitive search, 146
local alignment, 46
massaging statistics, 128
overview, 59, 60
PSI-BLAST, 256
relative entropy, 62
RepeatMasker, 143
report differences, 113
target frequencies, 60
troubleshooting, 120
WU-BLAST, 276
scoring schemes
BLAST parameters and, 111
mapping oligo/genomes, 132
massaging statistics, 128
NCBI-BLAST, 302
nucleotide, 299
troubleshooting and, 120
SCSI interface, 218
search space
alignment, 65, 119
bl2seq parameter, 255
blastall parameter, 239
BLASTP, 91
blastpgp parameter, 264
cross-species sequence exploration, 136
defined, 52
depicted, 76
HSP and, 67
Karlin-Altschul statistics, 66, 98, 119
mapping oligo/genomes, 132
massaging statistics, 128
searches
BLAST, 130
BLASTP, 144, 145
choosing parameters, 6
controls and, 117
restricting, 52, 195
as scientific experiments, 116
selecting database for, 5
submitting, 8
2° (secondary) structure, 25
seeding
BLAST searches, 222
BLASTN searches, 224
BLASTX searches, 151
cross-species exploration, 135
insensitive search, 146
mapping proteins/genomes, 153
sensitivity and, 136
short exons, 134
soft masking and, 233
troubleshooting, 120
word size and, 124
WU-BLAST and, 151, 277
seedp option (PHI-BLAST), 178, 262
seedtop (NCBI-BLAST), 163
selfish DNA, 37
sensitive searches, 146, 147, 222
sensitivity
seeding and, 136
specificity and, 131
speed and, 223
word size and, 139
seqtest parameter (WU-BLAST), 279
sequence alignment
BLAST report, 9
global alignment, 40–46
local alignment, 46–50
sequence databases
BLAST and, 198–206
management strategies, 206–212
sequence lines (FASTA format), 188
sequence management software, 211, 212
sequence similarity
amino acid similarity, 57–59
BLAST and, 3, 19
cross-species exploration, 136
determining, 64
information theory, 55–57
Karlin-Altschul statistics, 65–67
scoring matrices, 59, 60
sum statistics, 67–70
target frequencies, 60–64
sequences
classifying by, 33
exploring and, 130
FASTA format and, 188
masking, 139
molecular clocks and, 32
search space between, 76
transcripts for proteins, 36
xdget, 184
serial searches
accurate alignment, 152
BLAST and, 222–224
BLASTX and, 149
long sequences, 153
undocumented genes, 157
serine (S), 24
setdb (WU-BLAST), 168, 180
SGD database, 204
SGE distributed resource management, 219
Shannon, Claude, 56
Shannon’s Entropy, 56
shotgun sequence, 141, 149, 151
silent mutations, 28
SIM4 program, 135
similarity
biological sequences and, 38
patterns of, 27
vectors, 142
weak, 117
simple repeats, 37
Smith-Waterman algorithm
alignment endpoints, 81
BLAST statistics compared, 102
blastpgp parameter, 263
gold standard, 76
(see also local alignment)
soft masking
BLASTP searches, 145
case sensitivity and, 133
cross-species exploration, 135
functionality, 120
low-complexity and, 150
seeding and, 81, 233
sensitive searches, 146
software
optimization, 220–224
sequence management, 211, 212
source code (NCBI-BLAST), 161, 224
source, message and symbols from, 56
SOURCE (sequence record), 203
sp2fasta (WU-BLAST file), 168
span parameter (WU-BLAST), 279
span1 parameter (WU-BLAST), 279
span2 parameter (WU-BLAST), 279
species, 134
specificity, 131, 133, 147
SPIDEY program, 135
splicing
alternative, 157
coding sequences and, 36
regulation and, 138
transcript clustering, 140
SRS (Sequence Retrieval System), 211
stacking, 139
start codons, 26
statistics
massaging, 128, 129
redundant sequences and, 196
statistical outliers, 128
(see also specific methods)
Stein, Lincoln, 305
stop codons
alignment and, 124, 158, 184
BLASTX, 149
defined, 26
mutation and, 28
preventing, 155
prokaryotic genes, 36
pseudogenes, 38, 124, 153
TBLASTX search, 175
storage considerations, 207
strandedness
BLAST, 92
BLASTX, 93
TBLASTX, 93
sts database, 197
Stuffit Expander, 166
substitution matrix (see scoring matrix)
substitution, nonconservative, 28
sum score
alignments and, 102
calculating, 106
Expect and, 103, 104
overview, 67–70, 104
sumScore function, 106
suppressors, 36
SWISS-PROT database, 197, 204
symbolic links, 163
symbols, message and, 56
synonymous mutations, 28
synteny, 136
T
T= parameter (WU-BLAST), 280
tabular format
blast-imager.pl, 305
BLASTN and, 133
converting to, 118, 309
megablast and, 172
NCBI and, 295, 296
parsing, 132
TAIR (The Arabidopsis Information Resource), 204
tar command, 162, 167
tarball (tape archive), 162, 167
target frequency (TF)
cross-species, 119
lambda, 61, 62
match-mismatch scoring, 63, 64, 131
nucleotide scoring schemes, 299
overview, 60–64
relative entropy, 62
report differences, 113
taxonomic classification, 33, 202, 203
Taxonomy browser (NCBI toolbox), 4
TBLASTN
alignment, 93
BLAST program, 75
display formats, 291–298
features, 174, 183
searches, 152–155
strandedness, 92
ungapped alignment, 124
TBLASTX
alignment, 93
BLAST program, 75, 76
cross-species exploration, 136
display formats, 291–298
gapped alignment and, 234
genomic DNA/ESTs, 139
NCBI-BLAST features, 174, 175
searches, 155–158
strandedness, 92
ungapped alignment, 124
WU-BLAST features, 183, 184
tee program (Unix), 309
templates, discontiguous, 249, 251
Tera-BLAST, 225, 226
T/F (true/false) switches, 229
threonine (T), 24
thymine (T), 20
TimeLogic, 225, 226
top parameter (WU-BLAST), 280
topcomboN parameter (WU-BLAST), 94, 128, 280
trace-back, 44–46
transcripts
aligning to genome, 53
BLASTX searches, 147, 157, 158
clustering and extension, 139, 140
ESTs, 137, 197
low-complexity regions, 138
mapping between species, 134
transfer RNA (tRNA), 22, 37
translation
BLASTX and, 108
inferences from, 123
to protein sequence, 25
reading frames, 27
transposons, 37
TrEMBL database, 204
tRNA (transfer RNA), 22, 37
tryptophan (W), 24, 58
tutorials
NCBI-BLAST, 170–180
WU-BLAST, 180–186
twilight zone, 117
two-hit algorithm
blastall parameter, 230, 237
BLASTX search, 151
defined, 79
insensitive search, 146
megablast parameter, 246
WU-BLAST parameter, 274
tyrosine (Y), 24, 58
U
undercalling, 142, 143
ungapped alignments
depicted, 76
finding, 86
HSPs and, 84
reporting lengths, 113
usage, 124
UniGene database, 4, 9, 204
Unix environment
dashes, 194
DRMs, 219
NCBI-BLAST installation, 162–164
WU-BLAST installation, 166–170
unordered-sum score, 68
untranslated regions (UTRs), 36, 123, 155
uracil (U), 20, 22
UTRs (untranslated regions), 36, 123, 155
V
V= parameter (WU-BLAST), 120, 280
valine (V), 24, 58
variables (see environment variables)
variation, selection and, 29, 30, 32
vectors
clipping, 141, 142
cloning, 124, 141
sequences and, 150, 197
VelocityEngine, 224
VERSION field (flat file), 201
version number
sequence records and, 203
xdformat parameter, 282
xdget parameter, 286
vertical gap score, 42, 43
viral DNA, 124
virtual databases
blastall parameter, 231
features, 194
WU-BLAST and, 194
xdget support, 208
viruses
biology and, 35
noncoding sequence in, 35
W
W= parameter (WU-BLAST), 281
warnings parameter (WU-BLAST), 280
whitespace, 188, 189, 191
Windows environment, 164, 219
WINK parameter (WU-BLAST), 139, 151, 280
Woese, Carl, 33
word, 77, 79
word hits
defined, 77
example, 77
two-hit algorithm and, 79
word size and, 79
word size
bl2seq parameter, 255
blastall parameter, 239
blastclust parameter, 266
blastpgp parameter, 264
cross-species exploration, 131, 136
defined, 79
footer reports, 88
genomic DNA/ESTs, 139
mapping genomes, 132
megablast parameter, 251
NCBI-BLAST, 151, 267
raw sequencing, 124
RepeatMasker and, 143
transcript clustering, 139
vector clipping and, 142
WU-BLAST, 267, 281
wordmask= parameter (WU-BLAST), 281
WormBase database, 199, 204
WU-BLAST
alignment, 93, 124
alignment groups, 94, 128
alignment threshold, 120
annotating ESTs, 150
BLASTP searches, 145
BLASTX searches, 148, 152
command-line tutorial, 180–186
converting, 309
cross-species exploration, 136
database splitting, 222
frames, 108
genomic DNA/ESTs, 137
implementation, 81, 83, 86, 87
increasing processors, 216
insensitive search, 146
installation, 166–170
large databases, 194
licensing, 268
mapping, 132–134, 153
masking repeats, 143
massaging statistics, 128
mining ESTs, 154
NCBI identifier format, 191
NCBI-BLAST differences, 267
parameters, 116, 269–281
popularity of, 3
P-value, 102
query chopping, 221
query coordinates, 91
removing redundancy, 196
report format, 88, 91
scoring matrix names, 187
sensitive searches, 147
serial searching and, 223
statistical significance, 70
sum statistics, 102
TBLASTX searches, 156, 158
transcript clustering, 140
vector clipping, 142
word size and, 151
xdformat, 281–285
xdget, 208, 285–288
WU-BLASTN, 111
WU-BLASTX, 108
X
X= parameter (WU-BLAST), 281
XBLAST, 76