Glossary

Term	Meaning
ab initio	A Latin term that means starting “from the beginning” or initiation.
Accession number	The unique identifier assigned by a database (NCBI, DDBJ, EMBL) to an accepted submission (e.g., molecular sequence, genome project data, WGS, STs, etc.) to distinguish it from other submissions of the same type. The accession number is alphanumeric, and its format differs between nucleotide and protein sequences as well as between databases (NCBI, Swiss‐Prot‐UniProt, etc.).
Adapter	A short oligonucleotide ligated to the DNA that is to be sequenced or amplified, creating a priming site.
Algorithm	A finite set of rules followed, generally by a computer, to complete a task or operation. The term is derived from the name of the 9th‐century Persian mathematician Mohammed ibn Musa al‐Khwarizmi, who worked in Baghdad.
Allosteric Protein	A protein with multiple ligand binding sites whose conformation, and hence activity, changes when a ligand binds at one of them. Many enzymes are allosteric proteins.
Amplicon	A specific nucleotide sequence amplified by PCR.
Annealing Temperature (T_a)	The temperature at which primers bind to the template during PCR amplification. It is usually set a few degrees below the melting temperature (T_m), the temperature at which 50% of the DNA helices are dissociated.
Annotation	Comments on or explanation of a text or data.
Barcode	Short sequences, typically of six or more nucleotides, used to label individual samples so that they can be identified after the samples are pooled.
Binary Tree	A tree‐like data structure in which each node has at most two (binary) branches. The point from which the branches separate is called a node.
Binding site	A region of protein or DNA where the ligands bind.
Bioinformatics	A branch of science that interprets biological data with the help of statistics, computer science, mathematics and engineering.
Biostatistics	The application of statistics in biological science.
Bit score (S’)	A normalized measure of the similarity between two aligned sequences, denoted by S’. It is derived from the raw alignment score (S), which reflects identical and conserved positions and the gaps in the alignment, by normalizing for the scoring system used, so bit scores from different searches can be compared. The higher the score, the better the alignment.
BLASTn	Standard nucleotide BLAST, in which a nucleotide query sequence is compared with nucleotide sequences. BLAST (Basic Local Alignment Search Tool) is software, available online, that compares query sequences against a sequence database.
BLASTp	Protein BLAST, in which an amino acid query sequence is aligned and compared with protein sequences.
BLASTx	BLASTx translates a nucleotide query in all six reading frames (three on each strand) and aligns the conceptual translations with a database of protein sequences.
Bridge amplification	Amplification of DNA fragments that are attached to the surface of a chip through the adapters at both of their ends.
Burrows–Wheeler transform	A reversible rearrangement of the characters of a text that makes it easier to compress and to index. Short‐read aligners such as BWA and Bowtie use an index built on it to map large volumes of short reads to a reference genome efficiently.
Clustering	In gene expression analysis, clustering is the grouping together of genes with similar expression patterns, which often share similar functions. In phylogenetic tree analysis, data points separated by small distances are joined into the same cluster, and those separated by large distances fall into different clusters. The distance matrix is calculated by a chosen algorithm, and more than 100 clustering algorithms have been published. Hierarchical clustering is a common example of a connectivity‐based clustering method.
CpG Islands	CpG stands for Cytosine‐phosphate diester‐Guanine, a cytosine followed by a guanine on the same strand. A CpG island is a region of DNA (100–1000 bp long) with an increased density of CpG dinucleotides. CpG islands are usually non‐methylated and lie near the 5’‐end of genes at transcription initiation sites. In humans, there are around 45 000 CpG islands in the DNA. CpG sites are important because they are involved in the regulation of gene transcription.
C_t value	The cycle number in real‐time PCR at which the fluorescent signal rises above the threshold and can be detected by the instrument. It is also called the Cp value.
de novo Assembly	Assembly of sequencing reads into longer sequences when no reference sequence is available.
Deep sequencing	Sequencing the same genetic material many times over; the depth is measured in terms of coverage.
Delta BLAST	Domain Enhanced Lookup Time Accelerated (Delta) BLAST is an algorithm for detecting remote protein homologs more sensitively. It searches a database of pre‐constructed position‐specific scoring matrices (PSSMs) before searching a protein sequence database. The paper that first described Delta‐BLAST is available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3438057/.
Delta C_t value	The difference between the threshold cycles (C_t) of two genes (say, target and control genes, or target and reference genes).
Docking	A method that predicts the orientation of one molecule when it binds to another molecule to form a complex. In bioinformatics, docking is a computational simulation of a ligand binding to its receptor.
Domain	A conserved part of a protein that can fold, function and evolve independently of the rest of the protein.
Dynamic Programming	A method of solving a complex problem by breaking it down into many overlapping sub‐problems, solving each one once and reusing the stored results. In bioinformatics, DP has been used in sequence alignment, DNA‐protein binding prediction, and protein structure prediction.
Edge Length	In a phylogenetic tree, an edge length is a number associated with an edge that represents either time or the expected genetic distance between the nodes it connects.
Energy Functions
E‐Value (Expectation value)	A way of representing the significance of an alignment. It is the number of alignments with a given bit score or better that are expected to occur by chance in the database search. The lower the value, the more significant the alignment.
Expressed sequence tag (EST)	A short sequence of cloned cDNA that is used to identify gene transcripts in gene discovery.
FASTA	FASTA is one of the earliest programs for searching databases for similar sequences. The name also refers to a text‐based format for representing nucleotide or amino acid sequences. A sequence in FASTA format begins with “>” (greater than sign), followed by the sequence description on the same line and the sequence itself on the following lines.
Fastq file	The result of primary analysis: a text file that stores individual reads together with a quality indicator for each base of the corresponding sequence.
Functional annotation	Use of the analyzed output data of genomic and transcriptomic projects to describe gene/protein functions and interactions.
Gap‐penalty	During sequence alignment, gaps are introduced into the sequences to account for insertions or deletions. The introduction or extension of a gap is penalized in the scoring of an alignment of nucleotide or protein sequences; this deduction is called the gap penalty.
GC‐clamp	The presence of Guanine (G) or Cytosine (C) nucleotides at the 3’‐end of the primer. More than three Gs or Cs in the last five bases should be avoided. GC‐clamps help the primer bind specifically to the DNA template.
Gene Identity Number (gi)	More properly the GenInfo Identifier and usually written as “gi”, this number is simply a series of digits assigned to each sequence at NCBI. NCBI has been phasing it out in favor of accession.version identifiers since September 2016.
Gene ontology	A bioinformatics initiative to annotate, assimilate and disseminate information on genes and gene products across all species through a common platform.
Genetic Code	The set of nucleotide triplets (codons) that code for amino acids. Triplets that do not code for any amino acid (UGA, UAG, and UAA) are called stop codons and halt translation.
Genomic Survey Sequences (GSS)	Genome survey sequences are the short genomic DNA sequences from coding, non‐coding and repetitive portions of genomic DNA that aid in rapid characterization of the unknown genome.
Genomics	Study of the whole DNA of an organism, e.g., genes, their structure, and organization, location in the chromosome, etc.
Gibbs free energy	The Gibbs free energy (G) of a system at any time is defined as the enthalpy (H) of the system minus the product of the temperature (T) and the entropy (S) of the system, i.e., G = H – TS.
Global Alignment	Alignment of two sequences along their entire length, typically computed by dynamic programming with the Needleman–Wunsch algorithm. It is best suited to sequences that are similar in nature.
Guide tree	A tree constructed during multiple sequence alignment (MSA) from the pair‐wise distance scores and used to decide the order in which sequences are aligned. It is different from the phylogenetic tree that is constructed at the end of the MSA.
Hairpin loop (turn)	A hairpin loop is formed by single‐stranded DNA or mRNA when a portion of the strand folds back and pairs with another section of the same strand. In designing primers (short oligonucleotides) for PCR, the formation of a hairpin loop at the 3’ end is avoided because it affects PCR efficiency.
Heuristic program	A method of problem‐solving that often involves experimentation on the basis of trial and error. Likewise, a heuristic program is an algorithm that produces an acceptable solution without formal proof of its correctness.
Hidden Markov Model (HMM)	An HMM is used to represent probability distributions over sequences of observations. It is a Markov model with hidden (unobserved) states: the states are not directly visible, but the outputs they emit are.
High‐throughput genomic sequences (HTGS)	A division of DDBJ, EMBL, and GenBank created to accommodate the rapidly growing volume of unfinished genomic sequences, which are available for BLAST homology searches. When sequences reach the finished level (phase 3: finished with no gaps, either with or without annotations), the data are moved from HTGS to the corresponding taxonomic division.
High‐scoring segment pair (HSP)	HSP is the basic unit of BLAST algorithm output. It consists of two sequence fragments whose alignment is locally maximal, and for which the alignment score meets or exceeds a threshold or cut‐off score.
Homology	Homology is shared ancestry between a pair of genes, whether in different species or within the same genome.
InDel	An abbreviation of “insertion and deletion”, a class of mutation in which nucleotides are inserted into or deleted from a sequence.
InDels	One or more insertion or deletion events detected in sequences of genetic material.
Internal Node	The intermediate node between root node and leaf node in a phylogenetic tree.
International Nucleotide Sequence Database Collaboration (INSDC)	A long‐standing foundational collaboration between DDBJ, EMBL, and NCBI covering raw reads, their alignments, assemblies and functional annotations, together with related information on the samples and experiments associated with the data.
Iteration	Repeating a process many times until the desired results are achieved.
Leaf	In a phylogenetic tree, a leaf usually represents a single present‐day taxon, typically represented by a DNA sequence, whose genetic distance from other taxa is measured.
Library	This refers to a collection or pool of DNA or cDNA of an entire organism. A collection of the entire genome (exon and introns) is called a genomic DNA library, and a collection of all complementary DNA is called the cDNA library.
MegaBLAST	Alignment of long DNA sequences that differ only slightly, for example as a result of sequencing errors. MegaBLAST is similar to BLASTn but handles longer, highly similar DNA sequences more efficiently.
Microarray	A set of DNA sequences representing the entire set of genes of an organism, arranged (arrayed) in a grid pattern for use in gene expression analysis (cDNA microarray) or genetic testing (DNA microarray). A typical microarray experiment involves hybridization of mRNA molecules (called targets) to the DNA templates (called probes) from which they originated.
Mispriming	When primers of PCR anneal to non‐specific sites, leading to the background or non‐specific amplification, this is called mispriming.
Monte Carlo Simulation	A computerized mathematical technique that uses repeated random sampling to assess risk in quantitative analysis. It estimates the range of possible outcomes of a decision and how likely each one is, allowing scientists to make a better decision.
Motif	Motifs are structural characteristics of a protein that are associated with a particular arrangement of amino acids. When such an arrangement of amino acids is associated with a function like DNA binding or catalytic activity, it is called a domain.
Multiple alignments	A computational method that lines up a set of three or more sequences in rows to identify matching positions with maximum accuracy and minimum mismatches and gaps.
Next‐generation sequencing	High‐throughput sequencing that sequences DNA and RNA much more quickly and cheaply than the previously used Sanger sequencing, by producing thousands or millions of sequences at once.
Node	A node in a phylogeny represents the common ancestor of a set of taxa, from which different taxa are descended.
omics	The word “omics” informally refers to fields of biology such as genomics, proteomics or metabolomics, where the suffix ‐omics denotes the study of the genome, proteome or metabolome, respectively.
Paired‐end sequencing	The sequence of the DNA is obtained from the 5’ ends of both strands of the insert.
Palindrome	A word or sequence of nucleotides that reads the same backwards and forwards. For example, the word RACECAR is spelled the same in both directions. In double‐stranded DNA, a palindromic sequence such as GAATTC reads the same in the 5’ to 3’ direction on both strands.
Phi angle	The torsion angle of right‐handed rotation around the bond between the nitrogen atom of the amino group and the C‐alpha atom of a residue in the protein backbone (N‐Ca bond). The angle ranges from –180 to +180 degrees.
Phred scale	A logarithmic scale of base calling accuracy. The Phred quality score (Q score) of a base is Q = –10 log10(P), where P is the probability that the base call is wrong, and it is used to assess the accuracy of a sequencing platform.
Pattern‐Hit Initiated BLAST (PHI‐BLAST)	A variant of PSI‐BLAST in which the search is seeded by a pattern (motif) supplied with the protein query, and a Position‐Specific Scoring Matrix (PSSM) is constructed around the matches to that motif.
Position‐Specific Iterative BLAST (PSI‐BLAST)	An iterative protein BLAST search in which a PSSM built from the significant hits of one round is used as the query for the next.
Position‐Specific Scoring Matrix (PSSM)	A profile that gives, for each position of an alignment, a log‐odds score for finding each amino acid at that position; it is used to score how well a target sequence matches the query.
Primary structure	The primary structure of a protein or polypeptide is a linear sequence of amino acids from the N‐terminal to the C‐terminal end.
Primer	A short nucleotide sequence, typically 18–25 bases long and usually used in pairs, that is used to amplify a specific gene in PCR.
Probe (microarray)	In a spotted microarray, probes are synthesized short oligonucleotides or DNA fragments that are complementary to the mRNA.
Prosthetic Group	“Prosthetic” means an external part that supports the functions of an organ. Similarly, a prosthetic group is a non‐protein part, like a vitamin derivative or a metal ion, that is tightly bound to an enzyme or protein and supports its function.
Protein families	Like gene families, protein families are evolutionarily related proteins that share common features or functions.
Protein Isoelectric Point (pI)	The pH of a solution at which a protein or amino acid carries no net charge and therefore does not migrate in an electric field. For example, the pI of aspartic acid is 2.77, and that of arginine is 10.76.
Proteomics	The study of the proteome, the entire set of proteins expressed by the genome of a cell/tissue/organism at a particular point in time.
Pseudo Count	In estimating the probabilities of a model, an amount is added to the number of observed cases. These a priori counts, which may be subjective values, are called pseudo counts.
Psi angle	The torsion angle of right‐handed rotation around the bond between the C‐alpha atom and the carbon atom of the carboxyl group of a residue in the protein backbone (Ca‐C bond). The angle ranges from –180 to +180 degrees.
Query Coverage	The percentage of the query sequence that overlaps the subject sequence.
Ramachandran Plot	A diagrammatic visualization of protein structure in which the dihedral angle psi (ψ) is plotted against phi (ϕ) for each amino acid residue. It was originally developed by a team led by Ramachandran.
Raw alignment score (S)	A number used to assess the biological relevance of alignments of two sequences, where a higher score corresponds to a higher similarity of two sequences.
RCSB	Research Collaboratory for Structural Bioinformatics, founded in 1998 and responsible for maintaining the Protein Data Bank (PDB). PDB is the single worldwide repository of 3D structures of proteins and nucleic acids.
Real‐time PCR	The real‐time quantitative polymerase chain reaction (qPCR), in which the formation of amplicons can be followed in real time on a monitor or screen. It is an advanced form of conventional PCR and commonly uses a double‐stranded DNA binding dye whose fluorescence, once bound to the accumulating amplicon, is detected by the camera.
Reference genome	A reference assembly is a digital nucleic acid sequence, assembled as a representative example of the set of genes of a species, that can be retrieved using three different genome browsers.
RefSeq	“Reference Sequence” of either protein or nucleotide in a database of NCBI, derived from curation and computation of archived sequences.
Re‐sequencing	Sequencing of genetic material for which a reference sequence is available.
Restriction Enzyme	Also called “molecular scissors”, an enzyme used to cut DNA/plasmid sequences at specific sites, leaving either blunt or sticky ends, to generate recombinant DNA.
Rn Value	An abbreviation of “normalized reporter value”. The Rn value is the fluorescent signal of SYBR Green dye (DNA intercalating dye) normalized to (divided by) the signal of the passive reference dye (e.g., Rox). The delta Rn value is the Rn value of the reaction minus the Rn value of the baseline signal of the instrument.
Root	The root of a phylogenetic tree is the node that represents the common ancestor of all the taxa in the tree.
RCSB PDB	The RCSB Protein Data Bank portal, an information resource for exploring the molecular structures of proteins, their genomic positions and sequence alignments. The web link to the RCSB portal is: www.rcsb.org/
SCOP	Standing for Structural Classification of Proteins, this is a manual classification of protein structural domains based on their amino acid sequences and structures. The SCOP database was discontinued in the year 2009, and a newer and better prototype is available, called SCOP2.
Secondary Structure	The second level of protein structure, formed by local folding of the peptide chain. The most common type of secondary structure in proteins is the alpha‐helix; beta‐sheets are another type.
Sequence format	The method of writing the nucleotide bases of a sequence is called the sequence format. There are various ways to write sequences, including: plain sequence format; EMBL format; FASTA format; GCG format; GenBank format; and IG format.
Sequence Similarity	Comparing DNA, RNA or protein sequences with each other for their degree of similarity is one of the most frequent tasks of computational biology. A high degree of similarity between two sequences often implies similar functions.
Sequence Tagged Sites (STS)	A 200–500 bp long DNA sequence that occurs once (one copy) in a genome and whose location and sequence are known. An STS may contain repetitive sequences, but it is usually flanked by unique regions (not present elsewhere in the genome). The microsatellite is a type of STS.
Short read	A read of the specified read length produced by Single‐End or Pair‐End sequencing of fragments of genetic material.
SNP mining	The extraction of valuable information from single nucleotide polymorphism (SNP) data. SNP analysis is a fast and cost‐effective means of studying genetic variation.
Subtree	A part of the original tree, representing a fraction of the taxa being studied.
Taxa	The singular form of taxa is the taxon. This is a generic name for a taxonomic group, such as species. Taxon also represents genera, families, orders, phyla, and so on.
Taxonomy	Taxonomy is the branch of science that deals with the systematic naming, description and classification of organisms.
tBLASTn	Alignment of protein vs. translated nucleotide sequences for the identification of database sequences that encode proteins.
tBLASTx	Alignment of translated nucleotide vs. translated nucleotide sequences for identification of nucleotide sequences, based on their coding potential.
Tertiary Structure	The third level of protein structure, describing complex and irregular folding of peptide chains in three dimensions.
Third party annotation (TPA)	An annotated database derived from GenBank primary data or DDBJ/EMBL sequence databases. A TPA database could be experimental (if annotated from wet‐lab experiment) or inferential (annotated by inference only).
Threading (protein sequence)	Protein threading refers to a method of protein modeling, where proteins may not be homologous but may have the same fold as a protein of known structure.
Topology	The physical layout of a gene or protein network is referred to as its topology. The three main topologies of a network are ring, bus, and star, which more likely exist as hybrid networks (combinations of ring and bus, or ring and star, or bus and star).
Torsion Angle	The angle between the two planes defined by four consecutively bonded atoms; it describes the rotation of one part of a molecule relative to another about the chemical bond that joins them.
Transcriptome Shotgun Assembly (TSA)	An archive of computationally assembled sequences derived from ESTs and next‐generation sequencing.
Transcriptomics	Study of whole RNA profile (transcripts) of cells/tissue at a particular point in time (development stage, normal or diseased stage).
Tree	A phylogenetic tree, or simply “tree”, is a branching diagram of the evolutionary relationships among a set of organisms or groups of organisms called taxa.
Ultrametric Tree	A rooted tree in which every leaf is the same distance from the root, representing an equal rate of mutation in all the lineages. It is also called a “dendrogram.”
Whole Genome Shotgun Contigs	Contiguous sequences assembled from the overlapping fragments of a whole genome.
X‐Ray crystallography	A tool to identify the atomic and molecular structure of a crystal by using X‐rays.
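
A few of the entries above (FASTA, Phred scale, Delta C_t value) reduce to short computations. The Python sketch below illustrates them under stated assumptions. The function names are hypothetical and not taken from any library. The Phred conversion assumes the standard relation Q = -10 log10(P). The FASTA parser assumes well-formed input in which each record starts with a ">" line.

```python
def parse_fasta(text):
    """Parse FASTA-format text into a {description: sequence} dict.

    Assumes each record starts with a '>' line; sequence lines that
    follow are concatenated until the next '>' line.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def phred_to_error_probability(q):
    """Convert a Phred quality score Q to the base-call error probability P,
    assuming the standard relation Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def delta_ct(ct_target, ct_reference):
    """Delta C_t: threshold cycle of the target gene minus that of the
    reference (control) gene."""
    return ct_target - ct_reference


if __name__ == "__main__":
    example = ">seq1 demo record\nACGT\nTTGA\n>seq2\nGGCC\n"
    print(parse_fasta(example))
    print(phred_to_error_probability(30))
    print(delta_ct(24.5, 18.0))
```

Real sequence files are usually read with an established library rather than a hand-written parser; the sketch only shows how the definitions translate into code.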
