Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
hydrophobic
Literally, “fears water.” Nonpolar molecules
(like those in oils) don’t mix well
with water. The amino acids L, I, V, and F
are particularly hydrophobic.
Karlin-Altschul
The standard statistical theory of local
alignment scores is often called
Karlin-Altschul statistics after its
founding authors.
lambda, λ
The Karlin-Altschul statistical parameter
that converts a raw score to a normalized
score.
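As a rough illustration of that conversion, the sketch below scales a raw score by λ to get a score in nats and, using the companion Karlin-Altschul parameter K, computes a score in bits. The values of LAMBDA and K are placeholders chosen for illustration; they are assumptions, not values taken from any particular scoring matrix.

```python
import math

def nat_score(raw_score, lam):
    """Raw score scaled by lambda, expressed in nats."""
    return lam * raw_score

def bit_score(raw_score, lam, k):
    """Normalized score in bits: (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

# Illustrative parameter values only (assumptions, not from the text).
LAMBDA = 0.3
K = 0.1

print(nat_score(50, LAMBDA))
print(bit_score(50, LAMBDA, K))
```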
local alignment
An alignment algorithm that finds the
highest-scoring alignment between
subsequences of two sequences. The
alignment may include all letters of each
sequence, but it isn’t required to do so.
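One common way to compute a local alignment score is Smith-Waterman dynamic programming, sketched here in minimal form. The function name and the match, mismatch, and gap values are assumptions chosen for illustration; real searches use a scoring matrix and usually affine gap costs.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (Smith-Waterman, linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start fresh at any cell,
            # which is what makes the alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(local_alignment_score("AAACGTAAA", "TTTCGTTTT"))
```

Only the shared CGT contributes to the score in this call; the flanking letters are left out of the alignment.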
low-complexity sequence
Regions of sequences that are highly
predictable; for example, a region that is
90 percent A or T.
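One simple proxy for predictability is the Shannon entropy of a region's letter composition: the lower the entropy, the more predictable the region. The sketch below is an assumption-laden illustration, not the method used by any particular filter; production filters use more elaborate, windowed measures.

```python
import math
from collections import Counter

def shannon_entropy(seq):
    """Composition entropy of seq in bits per letter."""
    counts = Counter(seq)
    total = len(seq)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# A region that is 90 percent A scores far lower than a balanced one.
print(shannon_entropy("AAAAAAAAAT"))
print(shannon_entropy("ACGTACGTAC"))
```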
methionine
One of the 20 common amino acids.
Methionine is abbreviated as M or Met,
and is especially important because
translation of nearly all proteins begins
with a methionine. There is only one codon
for this amino acid: ATG (AUG in RNA).
mutation
Any change in the sequence of a DNA
molecule.
N-terminus
The start of a protein. In text form, a
protein’s N-terminus is always at the left.
nat
Contraction for natural log digits. The
base e logarithm of a number is in units of
nats.
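Because nats and bits differ only in the base of the logarithm, converting between them is a division or multiplication by ln 2. A minimal sketch, with function names that are assumptions for illustration:

```python
import math

def nats_to_bits(nats):
    """Convert a base-e quantity (nats) to base 2 (bits)."""
    return nats / math.log(2)

def bits_to_nats(bits):
    """Convert a base-2 quantity (bits) to base e (nats)."""
    return bits * math.log(2)

# ln(8) nats is the same quantity as log2(8) bits.
print(nats_to_bits(math.log(8)))
```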
natural selection
A theory founded by Charles Darwin that
explains how organisms change over time
to better fit their environment. It is based
on the principles of variation, heritability,
and differential reproduction.
ncRNA
The abbreviation for noncoding RNA.
Some RNAs, like tRNAs or rRNAs, don’t
contain information for protein
sequences.
Needleman-Wunsch
Global alignment is often called
Needleman-Wunsch after the authors who
first described the algorithm.
nucleotide
The basic building block of nucleic acid
sequences (DNA and RNA). DNA is made
from A, C, G, or T, while RNA contains
A, C, G, or U.
nt
The abbreviation for nucleotide.
O(n)
The computational complexity of an
algorithm is often described by its
asymptotic behavior. O(n) problems grow
linearly with the size of the input.
O(log₂ n) problems grow much more slowly,
and O(n²) problems grow much more quickly.
ORF
Abbreviation for open reading frame.
Each strand of DNA has three frames. Any
subsequence that doesn’t contain stop
codons in a particular frame is an open
reading frame.
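The definition above can be sketched as a scan of the three frames of one strand for maximal runs of codons that contain no stop codon. This sketch assumes the standard genetic code's stop codons (TAA, TAG, TGA); the function name and the min_codons parameter are hypothetical. Running it again on the reverse complement would cover the other strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)

def open_reading_frames(dna, min_codons=1):
    """Return (frame, start, end) for maximal stop-free runs on one strand.

    Coordinates are 0-based and end-exclusive; stop codons are excluded.
    """
    dna = dna.upper()
    results = []
    for frame in range(3):
        run_start = frame
        pos = frame
        while pos + 3 <= len(dna):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - run_start) // 3 >= min_codons:
                    results.append((frame, run_start, pos))
                run_start = pos + 3  # next run begins after the stop
            pos += 3
        # Keep the run that reaches the end of the sequence, if any.
        if (pos - run_start) // 3 >= min_codons:
            results.append((frame, run_start, pos))
    return results

print(open_reading_frames("ATGAAATAGCCC"))
```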
ortholog
Genes that are separated by speciation
(i.e., the same gene in different species).
This is often approximated as the best
reciprocal match between two complete
genomes or proteomes.
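The reciprocal-best-match approximation can be sketched as follows, assuming the best match of every gene in each direction has already been computed (for example, from two all-against-all searches). The dictionaries and gene identifiers are hypothetical.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return (a, b) pairs where a's best match is b and b's best match is a."""
    return sorted(
        (a, b)
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    )

# Hypothetical best-match tables for two proteomes, A and B.
a_to_b = {"geneA1": "geneB7", "geneA2": "geneB3", "geneA3": "geneB3"}
b_to_a = {"geneB7": "geneA1", "geneB3": "geneA3"}

print(reciprocal_best_hits(a_to_b, b_to_a))
```

Here geneA2 is dropped because its best match, geneB3, points back to a different gene.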
palindrome
A palindrome in DNA is a sequence that is
read the same on the plus and minus
strands. For example, the sequence
GAATTC is a palindrome. Palindromes
and near-palindromes are often sites for
DNA-protein interaction. Proteins scanning
along DNA “see” a palindrome as the
same sequence regardless of which
direction they are moving.
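In code, "reads the same on the plus and minus strands" amounts to a sequence being equal to its own reverse complement. A minimal sketch, with function names that are assumptions for illustration:

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(dna):
    """Return the minus-strand sequence, read 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]

def is_dna_palindrome(dna):
    """True if dna equals its own reverse complement."""
    return dna.upper() == reverse_complement(dna).upper()

print(is_dna_palindrome("GAATTC"))
print(is_dna_palindrome("GATTACA"))
```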
PAM
An acronym for Percent or Point Accepted
Mutation. PAM scoring matrix names are
usually followed by a number (e.g.,
PAM200), which indicates how many
iterations of multiplication were used starting