Appendix E: Basics of Molecular Phylogeny

CS Mukhopadhyay and RK Choudhary

School of Animal Biotechnology, GADVASU, Ludhiana

Phylogenetic analysis traces the splits and divergences of species within ancestral lines that give rise to clades. The term “clade” means a cluster of two or more species related by a common ancestor. The principle of phylogeny is relatedness among organisms through descent from a near or remote common ancestor (CA). Thus, phylogeny is the relationship among different organisms due to sharing of a recent common ancestor (Zimmermann, 1931). It provides a way to understand the evolution and origin of an organism. The term “phylogeny” originates from two Greek words: Phylon (Stem) and Genesis (Origin).

GEOLOGICAL CLOCK

This is based on the regularity of the decay of radioactive elements. Suppose an ancient rock that has lain undisturbed is tested, using a mass spectrometer, for the amounts of radioactive uranium (²³⁵U) and stable lead (²⁰⁷Pb). The former decays into the latter with a half‐life of 710 million years (MY) (Guttman, 2007). The lower the ratio of uranium to lead, the older the rock is, because more of the uranium has had time to decay. Thus, the approximate time of fossilization of an individual can be estimated by geological study, and this forms the geological clock. It is thought‐provoking to note that the first fossil evidence for many of the animal phyla comes from rocks preserved since the Cambrian Period of the Paleozoic era (510–540 Mya) (Benton 1993; Graham 1993).
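The dating logic above can be sketched numerically. This is a minimal illustration, not a geochronology method: it assumes a closed system in which the rock contained no daughter isotope when it formed, and it uses the 710 MY half‐life quoted in the text. The function name and the example amounts are hypothetical.

```python
import math

HALF_LIFE_MY = 710.0  # half-life of the parent isotope quoted in the text, in million years


def rock_age_my(parent_amount, daughter_amount, half_life_my=HALF_LIFE_MY):
    """Estimate the age of a rock (in MY) from present-day isotope amounts.

    Assumption: every daughter atom came from decay of the parent, so the
    original parent amount was parent_amount + daughter_amount.
    """
    if parent_amount <= 0 or daughter_amount < 0:
        raise ValueError("parent must be positive and daughter non-negative")
    # parent_now = parent_0 * 2 ** (-t / half_life)
    # => t = half_life * log2(1 + daughter / parent)
    return half_life_my * math.log2(1.0 + daughter_amount / parent_amount)


# Equal amounts of parent and daughter mean exactly one half-life has elapsed.
print(rock_age_my(1.0, 1.0))
# A lower parent-to-daughter ratio gives an older age.
print(rock_age_my(1.0, 3.0))
```

The age grows as the parent‐to‐daughter ratio falls, which is why an undisturbed sample is essential: any loss or gain of either isotope breaks the closed‐system assumption.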

Geological studies have revealed some geological events that are closely related to the evolution of plants and animals. Birds and mammals first appeared during the Jurassic period of the Mesozoic era (208 million years ago (Mya)), which was the time of the dinosaurs. The supercontinent Pangea (whole land areas of the earth lying together) first disintegrated into Gondwanaland (which included India, Australia, Africa, etc.) and Laurasia (North America and Greenland) during the Mesozoic era (i.e., 160–170 Mya). The first primates had appeared on the earth by the Paleocene epoch of the Tertiary period of the Cenozoic era (≈66.4 Mya). The earliest hominids date back to the Pliocene Epoch (5.3 Mya) (Guttman, 2007). Thus, the geological clock provides an evolutionary perspective on the earth and the origins of different species on it.

MORPHOLOGICAL PHYLOGENY TO MOLECULAR PHYLOGENY

Early phylogenetic studies (prior to the 1960s) were based only on morphological (morphos (Gr.): form, logos (Gr.): study) similarities and dissimilarities. Fossil records and anatomical measurements were the prime sources of data for determining ancestral lineages. However, the morphology‐based approach has some inherent limitations: several morphological traits appear to be convergent and overlap with each other. For example, different species of chickadee (e.g., Poecile atricapillus), a small North American songbird, have several apparently indistinguishable characters that can bewilder even a skilled birder.

Morphological features are more qualitative than quantitative, and their underlying inheritance pattern is often not well established (http://www.life.umd.edu/classroom/bsci338m/Lectures/Systematics.html). The limited availability of morphological data and fossil records makes the task even more challenging. Morphological data do not always give results consistent with genealogy or family pedigree. Moreover, the phenotypes of microbes hold little promise for depicting the evolutionary relationships among them.

Another limitation of morphological systematics is the confounding resemblance between unrelated species, which can arise from convergent evolution (i.e., independent evolution in a similar environment, as in sharks and dolphins, or African euphorbias (Euphorbia spp.) and American cacti) (Ghosh and Mallick, 2008).

Conversely, adaptation to different ecological niches can bring about strikingly different morphology among closely related species. The Hawaiian islands, an archipelago of eight major islands in the North Pacific Ocean, were formed about 0.5 to 0.8 Mya and became detached from the mainland. Hawaiian honeycreepers, which have descended from a common ancestor, exhibit different beak shapes due to their adaptation to varying ecological niches.

Morphology‐based phylogeny has now largely been superseded by molecular phylogeny, which uses molecular data (DNA/RNA/amino acid sequences, enzymatic data, etc.) for constructing the phylogenetic tree. Frederick Sanger first sequenced bovine insulin in 1953. Later, the RNA sequencing technique (Min‐Jou et al., 1972) and then DNA sequencing, using mainly Sanger’s method (Sanger et al., 1977), became available, enabling scientists to use these sequences in reconstructing molecular phylogeny. FHC Crick suggested (in 1958) using molecular sequences for phylogenetic tree reconstruction. Emile Zuckerkandl and Linus Pauling then used aligned amino acid sequence data to build the first ever phylogenetic tree in 1962 (Morgan et al., 1998) and proposed the theory of the molecular clock (Morgan, 1998). The theory of molecular evolution then gained momentum. In 1967, Walter Fitch and Emanuel Margoliash designed the first algorithm (applying least squares) for phylogenetic tree reconstruction using protein sequences (Fitch and Margoliash, 1967; Fitch, 1970, 1971).

BASIS OF MOLECULAR PHYLOGENY

The phenotype (expression of a trait) of an individual is the result of its genotype (allelic combination(s) of a locus or multiple loci), modification of the genotypic effect by the environment in which it is raised, and the interaction between genotype and environment. The DNA sequence of the coding region of a gene is, to a great extent, the determinant of its phenotypic uniqueness. Genetic relationship among close relatives confers similarity among them and distinguishes them from unrelated individuals.

Traits can show homology as synapomorphies or as symplesiomorphies. Synapomorphies are shared derived homologies, i.e., character states that are first observed in the common ancestor of the clade and are inherited by its descendants. Thus, synapomorphies define a clade. On the other hand, symplesiomorphies are shared ancestral characters that had already arisen before the common ancestor of the clade. They are also passed on to the downstream taxa through the common ancestor, but they do not distinguish the clade from its relatives. The phylogenetic tree is constructed on the basis of the evidence obtained from the synapomorphies only (http://biology.unm.edu/ccouncil/Biology_203/Summaries/Phylogeny.htm).

Shared characteristics among related individuals (having a common ancestor(s)) are the cornerstone of the theory of evolutionary phylogeny. The evolutionary process is depicted by the tree of life (TOL), where each species occupies a distinct position on the branch. The phylogeny represents the evolutionary process through the paths descending from the common ancestor(s) (CA), through the intermediate nodes to the ultimate terminal node or leaf, where the species/gene occupies its position. This path is known as the lineage. Molecular phylogeny, thus, uses the sequence data of DNA, RNA or protein. The accuracy of results depends on the types of input sequences (DNA, RNA, amino acid) and the divergence among the taxa incorporated in the study (Ghosh and Mallick, 2008):

  • Amino acid sequences are applied efficiently for most remote homologies.
  • DNA sequences are very sensitive, but show non‐uniform rates of mutation.
  • Coding DNA sequences (CDS) are used to detect purifying selection in coding regions.
  • RNA sequences are useful for remote homologies.
  • 16S rRNA: considered the most suitable phylogenetic marker.

The primary mechanism of molecular evolution is nucleotide substitution during DNA replication. Mutations of different types (gross or point mutations) that occur in the germ line can alter the phenotype. Among the point mutations, InDels (Insertions, Deletions) are frequently encountered, alongside base substitutions (transitions and transversions). The types of point mutation vis‐à‐vis corresponding changes in the translated amino acid are shown in Figure E1. Other underlying mechanisms include transposition (i.e., movement of an entire gene or non‐coding region) and exon shuffling (i.e., duplication of exons, or exchange of structural or functional domains between multi‐exon protein‐coding genes).

Flow diagram of different types of point mutation leading to codon change, from point mutations to synonymous mutation and non-synonymous mutation.

FIGURE E1 Different types of point mutations leading to codon change.
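The two classifications mentioned above (transition versus transversion at the base level, synonymous versus non‐synonymous at the codon level) can be sketched in a few lines. This is an illustrative sketch only: it assumes the standard genetic code, a DNA alphabet (T rather than U), and codons that differ by a substitution rather than an InDel. The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
PURINES = {"A", "G"}  # the remaining bases, C and T, are pyrimidines


def substitution_type(old_base, new_base):
    """Classify a single-base substitution as a transition or a transversion."""
    if old_base == new_base:
        raise ValueError("the two bases are identical, so nothing was substituted")
    same_class = (old_base in PURINES) == (new_base in PURINES)
    return "transition" if same_class else "transversion"


def codon_change(old_codon, new_codon):
    """Classify a codon change by its effect on the translated residue."""
    if CODON_TABLE[old_codon] == CODON_TABLE[new_codon]:
        return "synonymous"
    return "non-synonymous"


print(substitution_type("A", "G"))  # purine to purine
print(substitution_type("A", "T"))  # purine to pyrimidine
print(codon_change("GAA", "GAG"))   # Glu to Glu
print(codon_change("GAG", "GTG"))   # Glu to Val
```

Building the codon table from the two strings avoids typing 64 entries by hand; a non‐standard code (e.g., a mitochondrial one) would need a different amino acid string.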

Apart from mutations, natural selection acting on individuals, genetic drift in small populations, bottleneck effects and so on play a significant role in the process of speciation.

MUTATION RATE

This measures the tempo or pace of mutations occurring per unit of time. The mutation rate varies with the type of gene and the type of organism whose genome is being studied. It can be measured in terms of mutations per base pair per cell division, or per gene (or per genome) per generation. The molecular clock uses a region with a predictable mutation rate to calculate the time of divergence of two species in geologic history. The estimated mutation rates of different types of organisms are as follows:

  • Unicellular eukaryotes and bacteria: ~0.003 mutations per genome per generation.
  • DNA viruses: 10⁻⁶ to 10⁻⁸ mutations per base per generation.
  • RNA viruses: 10⁻³ to 10⁻⁵ per base per generation.
  • Human mitochondrial DNA: ~3 × 10⁻⁵ to ~2.7 × 10⁻⁵ per base per 20‐year generation.
  • Human genomic mutation: ~2.5 × 10⁻⁸ per base per generation.
  • Human genome (WGS data): ~1.1 × 10⁻⁸ per site per generation.
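As a rough illustration of how a molecular clock turns such a rate into a divergence time: two lineages accumulate changes independently after they split, so a pairwise distance d corresponds to roughly d / (2 × rate) generations. The sketch below assumes a strict clock and that the substitution rate equals the quoted mutation rate, which is reasonable only for neutrally evolving sites. The pairwise distance and the 20‐year generation time in the usage lines are hypothetical values chosen for illustration.

```python
def divergence_time_generations(distance_per_site, rate_per_site_per_generation):
    """Generations since two lineages split, under a strict molecular clock.

    The distance is divided by twice the rate because both lineages
    have been accumulating substitutions since the split.
    """
    if rate_per_site_per_generation <= 0:
        raise ValueError("rate must be positive")
    return distance_per_site / (2.0 * rate_per_site_per_generation)


# Hypothetical example: 0.0022 substitutions per site between two sequences,
# with the whole-genome rate listed above (~1.1e-8 per site per generation).
generations = divergence_time_generations(0.0022, 1.1e-8)
years = generations * 20  # assumed generation time of 20 years
print(f"about {generations:.0f} generations, or {years:.0f} years")
```

Real analyses calibrate the clock against dated fossils or geological events and often relax the constant‐rate assumption; this sketch only shows the arithmetic.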

COMPONENTS OF A PHYLOGENETIC TREE

A phylogenetic tree is a tree‐like structure that can be either rooted or unrooted. Various terms used to specify the components of a tree are given below:

  • Terminals/leaves: the species or the genes that have been sampled.
  • Internal nodes: hypothetical ancestors, i.e., the reconstructed ancestral states for the characters being studied.
  • Branches: the relationships between the nodes. These can also represent the relative divergence between the terminals and the nodes.
  • Branch length: in a horizontally drawn tree, the horizontal branch length represents either the time between speciation events (estimated from the mutation rate) or the amount of change along the lineage, depending on the type of tree (Figure E2). Branch length is proportional to the evolutionary distance between the nodes (internal as well as external nodes), expressed as substitutions per site.
  • Distance scale: A scale that assesses the distances between different nodes, expressed in terms of a number of differences. It is expressed in a range between 0 and 1, which can be inferred as differences for 0 to 100% of the residues.
  • Operational Taxonomic Unit (OTU): refers to the hierarchical groups comprising external/terminal and internal nodes. The element of OTUs is a group of either genes or species that are sufficiently distinguishable from others (Figure E2).
  • Analogs refer to the traits which look similar as a result of convergent evolution, not due to inheritance from a common ancestor.
  • Taxon (plural: taxa) is a general term applied to a taxonomic group (i.e., a family, genus or species, etc.). The most closely related taxa are called “sister taxa” in a phylogenetic tree.
Schematic illustrating the components of a rooted phylogenetic tree: shaded circles connected by lines, with labels for polytomy, sister taxa, terminal nodes, root, branches, internal nodes, etc.

FIGURE E2 The components of a rooted phylogenetic tree.
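The distance scale described in the list above, a value between 0 and 1 read as the fraction of differing residues, corresponds to what is commonly called the p‐distance. A minimal sketch follows, assuming two pre‐aligned sequences of equal length and skipping any column that contains a gap character. The function name and the example sequences are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # columns with a gap are ignored in this sketch
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return different / compared


# Two of the eight aligned sites differ, i.e., 25% of the residues.
print(p_distance("ACGTACGT", "ACGTTCGA"))  # 0.25
```

The p‐distance underestimates the true number of substitutions once sequences are diverged enough for multiple changes to hit the same site, which is why model‐based corrections are normally applied before branch lengths are reported as substitutions per site.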

There are some terminologies which are frequently used in phylogeny:

  • A clade starts with a node (ancestor), and includes all taxa descending from the ancestor.
  • A monophyletic group is a clade: it includes a common ancestor along with all the descendants of that ancestor, and excludes all non‐descendants.
  • A paraphyletic group is a group of taxa which contains a common ancestor and some (but not all) of the descendants of that ancestor.
  • A polyphyletic group includes multiple taxa, but not their most recent common ancestor (Figure E3).

FIGURE E3 Diagrammatic representation of monophyletic, paraphyletic and polyphyletic groups of taxa.
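The definition of a monophyletic group can be checked mechanically: a set of taxa is monophyletic if some node of the tree has exactly that set as its descendants. The sketch below assumes a rooted tree written as nested Python tuples with taxon names as strings. This representation, the function names and the five‐taxon example are all hypothetical.

```python
def leaf_set(node):
    """Return the set of terminal taxa below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def clades(node):
    """Yield the leaf set of every clade (subtree) in the tree."""
    yield leaf_set(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if the group is exactly the set of descendants of one node."""
    return frozenset(group) in set(clades(tree))


# Hypothetical rooted tree: ((A, B), C) is sister to (D, E).
tree = ((("A", "B"), "C"), ("D", "E"))

print(is_monophyletic(tree, {"A", "B", "C"}))  # True: a complete clade
print(is_monophyletic(tree, {"A", "C"}))       # False: descendant B is left out
print(is_monophyletic(tree, {"B", "D"}))       # False: members of separate clades
```

Telling a paraphyletic group from a polyphyletic one additionally requires deciding whether the group's common ancestor is counted as a member, which depends on character data rather than on the topology alone, so the sketch stops at the monophyly test.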

TYPES OF PHYLOGENETIC TREES

Unrooted tree

This illustrates the relatedness of the OTUs without making any assumptions about ancestry. No ancestor is determined in this type of tree. Unrooted trees show the differences between the taxa (regarding distance or proportion of residue change). However, no time frame can be deduced from the orientation of taxa in an unrooted tree.

Rooted tree

This is a directed tree in which all lineages converge on a single node, the root, which signifies the most recent common ancestor of all the entities. In other words, the tree topology shows a common ancestor of all the involved taxa. The lineages/branches sprouting from the common ancestor determine the evolutionary path (and its direction). A rooted tree can be generated by introducing an outgroup as the root. The outgroup comprises one or more distantly related taxa, known to share a distant common ancestor with the taxa under study. A rooted tree can also be generated using a molecular clock, where the evolutionary process is assumed to happen at a constant rate along the branches of the tree. In that case, the tree is rooted at the point that splits the amount of character evolution in half.

Converting an unrooted tree into a rooted tree

  • The inclusion of an outgroup: “Outgroup” refers to the lineage (or taxon) in a phylogenetic analysis that is the least related to the rest of the taxa in the analysis. Thus, it branches off at the base of that phylogeny. An outgroup is remote with respect to the clade being studied, since the members of the clade exhibit closer relatedness to each other than to the outgroup (http://evolution.berkeley.edu/evolibrary/glossary/glossary_popup.php?word=outgroup). The outgroup taxon (or taxa) is known to be external to the group being analyzed. Thus, the root lies at the branch joining the outgroup to the original clade (i.e., the ingroup).
  • Choosing an outgroup: the underlying assumption is that the inclusion of an outgroup does not alter (or influence) the relationship of the taxa of the original clade.
    • The inclusion of an incorrect outgroup may result in long branch attraction (LBA), a phenomenon which occurs if the distantly related clades cluster together, due to erroneous inferences drawn from shared homoplasies. This has been discussed in http://self.gutenberg.org/articles/Long_branch_attraction as “It is a result of the way clustering algorithms work: terminals or taxa with many autapomorphies (character states unique to a single branch) may by chance (convergence) exhibit the same states as those on another branch”. As a result, rapidly evolving taxa may be interpreted as closely related. The alternative approach is to include a group of taxa as an outgroup, instead of a single OTU.
    • The chance of LBA can be minimized either by changing the phylogeny model or by splitting the long branches with more taxa. Fast‐evolving taxa can also be removed from the set of the ingroup taxa.
    • It is better not to root an unrooted tree, if an erroneous inference is drawn due to the inclusion of an outgroup.
  • Using a molecular clock: The assumption behind using a molecular clock is a similar rate of evolution for all the lineages since splitting from the common ancestor. The two most distant taxa are identified from the branch lengths, and the root is placed at the mid‐point of the path between them. If the clock assumption holds, the root is then equidistant from all the external nodes.
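The mid‐point procedure in the last bullet can be sketched as follows. This is a minimal illustration, assuming an unrooted tree stored as a dictionary of neighbours with branch lengths. The representation, the function names and the small example tree are hypothetical, and established phylogenetics packages provide tested implementations of the same idea.

```python
def paths_from(tree, start):
    """Return {node: (distance, path)} for every node reachable from start."""
    found = {start: (0.0, [start])}
    stack = [start]
    while stack:
        node = stack.pop()
        dist, path = found[node]
        for neighbour, length in tree[node].items():
            if neighbour not in found:  # a tree has a single path to each node
                found[neighbour] = (dist + length, path + [neighbour])
                stack.append(neighbour)
    return found


def midpoint_root(tree):
    """Locate the mid-point of the longest leaf-to-leaf path.

    Returns (a, b, offset): the root lies on branch a-b, offset units from a.
    """
    leaves = [node for node, nbrs in tree.items() if len(nbrs) == 1]
    best = None
    for leaf in leaves:
        for other, (dist, path) in paths_from(tree, leaf).items():
            if other in leaves and (best is None or dist > best[0]):
                best = (dist, path)
    total, path = best
    half = total / 2.0
    walked = 0.0
    for a, b in zip(path, path[1:]):
        length = tree[a][b]
        if walked + length >= half:
            return a, b, half - walked
        walked += length


# Hypothetical unrooted tree: leaves A-D, internal nodes X and Y.
tree = {
    "A": {"X": 1.0},
    "B": {"X": 2.0},
    "C": {"Y": 1.0},
    "D": {"Y": 5.0},
    "X": {"A": 1.0, "B": 2.0, "Y": 1.0},
    "Y": {"C": 1.0, "D": 5.0, "X": 1.0},
}
print(midpoint_root(tree))  # ('Y', 'D', 1.0)
```

In this example the farthest pair is B and D (path length 8), so the root falls on the branch leading to D, four units from each of them. A and C end up closer to the root than B and D, which shows that mid‐point rooting places the root equidistant from all terminals only when the lineages really have evolved at similar rates.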