Glossary (1/2)

This is the Title of the Book, eMatter Edition


Glossary

1°

The abbreviation for primary. 1° sequence refers to the letters of DNA, RNA, or protein. 1° transcript refers to an unprocessed RNA that still contains its introns.

2°

The abbreviation for secondary. Most frequently used for generalizing protein and RNA structures; for example, the α-helix and hairpin are common 2° structures.

3´

The end of a nucleic acid sequence; often used with UTR.

5´

The start of a nucleic acid (DNA or RNA) sequence; often used in conjunction with UTR (e.g., 5´UTR). Nucleotide sequences are conventionally written with the 5´ end at the left. DNA molecules are usually double-stranded, but when written, usually only the 5´ to 3´ strand is displayed. The complementary strand has reversed polarity (3´ to 5´).

aa

The abbreviation for amino acid, often used when describing the length of a protein (e.g., the average protein is about 300 aa long).

allele

A form of a gene. Typically, the most common form is called wild-type, and each allele is given a specific (and often obscure) name.

amino acid

The basic building block for all proteins. There are 20 common amino acids.

Arabidopsis thaliana

Known by its common name, thale cress, this mustard weed is a favorite organism for plant genetics and molecular biology. It was the first plant with a complete genomic sequence. For more information, see http://www.arabidopsis.org.

bit

The contraction for binary digit. The base-2 logarithm of a number is in units of bits.

BLOSUM

The abbreviation for a blocks substitution matrix. Matrix names are followed by a number (e.g., BLOSUM62) that indicates the percent identity at or above which sequences were clustered, and so counted as one, when the matrix was built.

bp

The abbreviation for base pair. The length of DNA is usually given in bp or nt. Common measures include Kb, Mb, and Gb for thousands, millions, and billions of bp, respectively.

C-terminus

The end of a protein. In text form, the C-terminus of the protein is always at the right.

Caenorhabditis elegans

A nematode (also called a roundworm) that is about 1 mm long and has about 1,000 cells as an adult. C. elegans was the first animal to have its complete genome sequenced. See http://www.wormbase.org.

CDS

The abbreviation for a coding sequence. CDS isn’t synonymous with exon, since exons may contain noncoding sequence.

codon

Three contiguous letters of DNA or RNA. Each of the 64 codons specifies either an amino acid or a translation stop.

complement

The complement of a DNA sequence is the sequence on the other strand. For example, the complement of ACCCGT is TGGGCA. To complement a sequence in Perl, use either of the following:

    # 4-letter alphabet
    $dna =~ tr/ACGT/TGCA/;

    # 15-letter alphabet (W, S, and N are their own complements,
    # so they are left unchanged)
    $dna =~ tr[ACGTRYKMBDHV][TGCAYRMKVHDB];
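
Complementing alone yields the other strand read 3´ to 5´. Since sequences are conventionally written 5´ to 3´, it is often the reverse complement that is wanted. The following is a minimal sketch, assuming the 4-letter alphabet; the helper name revcomp is hypothetical and not from the text.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse-complement a DNA string (4-letter alphabet only).
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;           # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;    # then complement each letter
    return $rc;
}

print revcomp("ACCCGT"), "\n";       # prints ACGGGT
```

Reversing first and complementing second gives the same result as the other order; the two steps are independent.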

Drosophila melanogaster

The common fruit fly. This is one of the most famous organisms for genetic research and was one of the first animals whose complete genomic sequence was determined. See http://www.fruitfly.org.

dynamic programming

A common technique that reduces the computational complexity of a problem by finding and extending a partial optimization.

E. coli

Escherichia coli. A common bacterium normally found in your gut and a favorite organism for molecular biology research. Some variants cause food poisoning.

effective length

Karlin-Altschul statistics assume sequences of infinite length. To adjust for edge effects in real sequences, the search space is reduced by adjusting the true lengths of the sequences to effective lengths.

entropy

Randomness; disorder; unpredictability.

eukaryote

Organisms with intracellular membranous organelles such as the nucleus and mitochondria are called eukaryotes.

frame-shift mutation

An insertion or deletion of a number of nucleotides that isn’t a multiple of three, which therefore causes the reading frame to change.

gene

A functional unit of the genome. When not specifically stated, “gene” is usually considered a “protein-coding” gene, but many genes don’t contain the instructions for proteins (e.g., various RNA genes).

genetic code

The mapping of codons to amino acids. See Table 2-3.
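
As an illustration of the mapping, the sketch below builds a codon-to-amino-acid hash and translates a sequence in a single frame. It assumes the standard genetic code, written in the compact TCAG ordering; the names %genetic_code and translate are hypothetical, and unrecognized codons are reported as X by choice.

```perl
use strict;
use warnings;

# Standard genetic code, with codons enumerated in T, C, A, G order
# for each of the three positions (TTT, TTC, TTA, TTG, TCT, ...).
my @bases = qw(T C A G);
my $amino = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

my %genetic_code;
my $k = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $genetic_code{"$b1$b2$b3"} = substr($amino, $k++, 1);
        }
    }
}

# Hypothetical helper: translate DNA in frame 0; '*' marks a stop codon.
sub translate {
    my ($dna) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        my $codon = uc substr($dna, $i, 3);
        $protein .= exists $genetic_code{$codon} ? $genetic_code{$codon} : 'X';
    }
    return $protein;
}

print translate("ATGAAATAG"), "\n";   # prints MK*
```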

genetic drift

The tendency of sequences to change over time by accumulating random mutations.

genome

The complete genetic material for an organism. For eukaryotes, the genome refers to the nuclear genome and doesn’t include organelles.

global alignment

An alignment algorithm that requires every letter of each sequence to appear in the alignment. Globally aligning sequences of different lengths may lead to very strange alignments.

homologous

In sequence analysis, homologous means derived from a common ancestor. Sequences are either homologous or they aren’t. It is incorrect to say that sequences are 80 percent homologous unless you mean that there is an 80 percent chance of common ancestry. Use percent identity to describe the similarity of alignments.
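
Percent identity can be computed directly from two aligned strings. The sketch below is one possible convention, assuming gaps are written as - and that the denominator is the full alignment length including gapped columns; other tools use other denominators. The helper name percent_identity is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: percent identity over all columns of an alignment.
# Both strings must already be aligned (same length, '-' for gaps).
sub percent_identity {
    my ($top, $bottom) = @_;
    die "aligned strings must be the same length\n"
        unless length($top) == length($bottom);
    my $cols = length $top;
    return 0 unless $cols;
    my $same = 0;
    for my $i (0 .. $cols - 1) {
        my $p = substr($top,    $i, 1);
        my $q = substr($bottom, $i, 1);
        $same++ if $p eq $q and $p ne '-';   # gap columns never count as identical
    }
    return 100 * $same / $cols;
}

printf "%.1f\n", percent_identity("ACGT-ACGT", "ACGTTACCT");   # prints 77.8
```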

hydrophilic

Literally, “likes water.” Water is a polar molecule that mixes well with other polar molecules. The charged amino acids K, R, D, and E are examples of hydrophilic amino acids.


hydrophobic

Literally, “fears water.” Nonpolar molecules (like those in oils) don’t mix well with water. The amino acids L, I, V, and F are particularly hydrophobic.

Karlin-Altschul

The standard local alignment theory is often called Karlin-Altschul statistics after its founding authors.

lambda, λ

The Karlin-Altschul statistical parameter that converts a raw score to a normalized score.

local alignment

An alignment algorithm that finds the optimal subsequence alignment. The alignment may include all letters of each sequence, but it isn’t required to do so.

low-complexity sequence

Regions of sequences that are highly predictable; for example, a region that is 90 percent A or T.
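
One way to put a number on predictability is the Shannon entropy of a region's letter composition: low entropy means a few letters dominate. The sketch below is only an illustration under that assumption; it is not the filter that any particular search program uses, and the helper name entropy_bits is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: Shannon entropy of letter composition, in bits per letter.
sub entropy_bits {
    my ($seq) = @_;
    my $n = length $seq;
    return 0 unless $n;
    my %count;
    $count{$_}++ for split //, $seq;
    my $h = 0;
    for my $c (values %count) {
        my $p = $c / $n;
        $h -= $p * log($p) / log(2);   # log() is the natural log; divide to get base 2
    }
    return $h;
}

printf "%.2f\n", entropy_bits("ACGTACGTACGT");   # prints 2.00 (all four letters equally common)
printf "%.2f\n", entropy_bits("AAAAAAAAAAAA");   # prints 0.00 (fully predictable)
```

A region that is 90 percent A or T falls between these two extremes, well below the 2-bit maximum for DNA.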

methionine

One of the 20 common amino acids. Methionine is abbreviated as M or Met, and is especially important because all proteins begin with a methionine. There is only one codon for this amino acid: ATG.

mutation

Any change in the sequence of a DNA molecule.

N-terminus

The start of a protein. In text form, a protein’s N-terminus is always at the left.

nat

Contraction for natural log digits. The base-e logarithm of a number is in units of nats.
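
Bits and nats measure the same quantity with different logarithm bases, so converting between them is a matter of multiplying or dividing by ln 2. A minimal sketch, with hypothetical helper names:

```perl
use strict;
use warnings;

# Perl's built-in log() is the natural logarithm, so it returns nats directly.
sub nats_to_bits { my ($nats) = @_; return $nats / log(2); }
sub bits_to_nats { my ($bits) = @_; return $bits * log(2); }

my $nats = log(8);                         # ln 8, about 2.079 nats
printf "%.3f nats\n", $nats;               # prints 2.079 nats
printf "%.3f bits\n", nats_to_bits($nats); # prints 3.000 bits, since 8 = 2**3
```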

natural selection

A theory founded by Charles Darwin that explains how organisms change over time to better fit their environment. It is based on the principles of variation, heritability, and differential reproduction.

ncRNA

The abbreviation for noncoding RNA. Some RNAs, like tRNAs or rRNAs, don’t contain information for protein sequences.

Needleman-Wunsch

Global alignment is often called Needleman-Wunsch after the authors who first described the algorithm.

nucleotide

The basic building block of nucleic acid sequences (DNA and RNA). DNA is made from A, C, G, or T, while RNA contains A, C, G, or U.

nt

The abbreviation for nucleotide.

O(n)

The computational complexity of an algorithm is often described by its asymptotic behavior. O(n) problems grow linearly with the size of the input. O(log n) problems grow much more slowly, and O(n²) problems grow much more quickly.

ORF

Abbreviation for open reading frame. Each strand of DNA has three frames. Any subsequence that doesn’t contain stop codons in a particular frame is an open reading frame.
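
Following that definition, the sketch below lists the maximal stop-free runs of whole codons in each of the three frames of one strand. It assumes uppercase DNA and the three standard stop codons; the helper name open_frames and its return format are hypothetical. To cover the other strand, the same routine could be run on the reverse complement.

```perl
use strict;
use warnings;

my %is_stop = map { $_ => 1 } qw(TAA TAG TGA);

# Hypothetical helper: returns a list of [frame, start, length] triples, one per
# maximal run of whole codons with no stop. start is 0-based; length is in nt.
sub open_frames {
    my ($dna) = @_;
    my $len = length $dna;
    my @orfs;
    for my $frame (0 .. 2) {
        my $start = $frame;
        for (my $i = $frame; $i + 3 <= $len; $i += 3) {
            next unless $is_stop{ substr($dna, $i, 3) };
            push @orfs, [$frame, $start, $i - $start] if $i > $start;
            $start = $i + 3;              # next run begins after the stop codon
        }
        # Trailing run: whole codons between the last stop and the end.
        my $end = $frame + 3 * int(($len - $frame) / 3);
        push @orfs, [$frame, $start, $end - $start] if $end > $start;
    }
    return @orfs;
}

for my $orf (open_frames("ATGAAATAGCCC")) {
    my ($frame, $start, $length) = @$orf;
    print "frame $frame: ", substr("ATGAAATAGCCC", $start, $length), "\n";
}
```

Note that this follows the glossary's broad definition; many gene-finding tools additionally require an ORF to begin with a start codon or to exceed a minimum length.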

ortholog

Genes that are separated by speciation (i.e., the same gene in different species). This is often approximated as the best reciprocal match between two complete genomes or proteomes.

palindrome

A palindrome in DNA is a sequence that is read the same on the plus and minus strands. For example, the sequence GAATTC is a palindrome. Palindromes and near-palindromes are often sites for DNA-protein interaction. Proteins scanning along DNA “see” a palindrome as the same sequence regardless of which direction they are moving.
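
Put another way, a DNA palindrome equals its own reverse complement. A minimal sketch of that test, assuming the 4-letter alphabet; the helper names revcomp and is_palindrome are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse-complement a DNA string (4-letter alphabet only).
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Hypothetical helper: true if the sequence reads the same on both strands.
sub is_palindrome {
    my ($dna) = @_;
    return uc($dna) eq uc(revcomp($dna)) ? 1 : 0;
}

print is_palindrome("GAATTC"), "\n";   # prints 1
print is_palindrome("GAATTT"), "\n";   # prints 0
```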

PAM

An acronym for Percent or Point Accepted Mutation. PAM scoring matrix names are usually followed by a number (e.g., PAM200), which indicates how many iterations of multiplication were used starting with the PAM1 matrix. A higher number indicates a more distant similarity.

paralogs

Genes that are duplicated within a single genome. Duplication sometimes allows one of the genes to take on a specialized function.

phylogenetics

The study of evolutionary relationships among organisms.

prokaryotes

Organisms that don’t contain intracellular membranous organelles. All bacteria are prokaryotes.

proteome

The complete set of all proteins produced by a particular organism. Many proteins undergo post-translational modifications that add or subtract features from a protein. Therefore, a particular mRNA might have many different protein isoforms.

pseudogene

A sequence that looks like a gene but isn’t. Most pseudogenes are derived from mRNAs that have been reverse-transcribed back to DNA and inserted into the genome. They have the hallmarks of RNA processing, notably a poly-A tail and no introns.

relative entropy

The average number of bits (or nats) per aligned letter for a given scoring scheme.

repeat

Any class of sequence that appears multiple times in a genome. Usually, gene families aren’t called repeats and the term is used for junk DNA. Some of the most common repeats in the human genome include the ALU and LINE families.

reverse transcriptase

A protein that creates DNA from an RNA template.

RNA

Ribonucleic acid. RNA is chemically similar to DNA but not used strictly for storage. Many RNA molecules have important functions in the cell and may even have enzymatic properties. Some of the most common functional RNA molecules include rRNAs and tRNAs.

RNA polymerase

A protein or multiprotein complex that creates RNA from a DNA template.

ribosome

A complex macromolecule made up of proteins and rRNAs. Ribosomes are responsible for translating mRNAs into proteins.

rRNA

Ribosomal RNA. The ribosome is composed of many specific RNA molecules, and these components are called rRNAs. rRNAs are some of the most abundant RNAs in a cell.

Smith-Waterman

Local alignment is often referred to as Smith-Waterman, after the authors who first described the algorithm.

start codon

ATG. Codes for the amino acid methionine. Many proteins have N-terminal post-translational modifications, and the first amino acid of the mature protein may therefore not be methionine.

stop codon

TAA, TGA, and TAG are the three codons that terminate translation.

sum statistics

A method that determines the aggregate statistical significance of multiple local alignments.

target frequency

The expected frequencies of individual letter pairings. For nucleotide scoring matrices, the target frequency is often summarized by the expected percent identity in sequences with unbiased composition.

transcriptome

The complete set of transcripts for a particular genome. This term is often used to mean the mRNAs of protein coding genes and their alternatively spliced variants.


tRNA

The abbreviation for transfer RNA. tRNAs transfer individual amino acids to the ribosome. Each tRNA molecule has an anticodon that matches the reverse complement of the codon for the amino acid it carries.

UTR

The abbreviation for an untranslated region. The 5´ and 3´ ends of an mRNA have untranslated regions. These regions sometimes play regulatory roles that change the mRNA’s stability, translatability, or localization.
