CS Mukhopadhyay and RK Choudhary
School of Animal Biotechnology, GADVASU, Ludhiana
Phylogenetic trees are frequently encountered in research papers related to evolution, population diversity, microbial studies, and genetics. It is critical to infer the meaning of a given phylogenetic tree, as terms such as “cladogram”, “phylogram”, “phenogram”, etc. sound quite confusing to a novice. This chapter starts with such terminologies, and then the phylogenetic tree is explained to decipher the meaning depicted in general.
A rectangular, horizontal phylogenetic tree is shown in Figure 27.1, which has been constructed using 18S rRNA sequences from nine divergent taxa. In general, a phylogenetic tree is two‐dimensional, consisting of a horizontal axis (analogous to the X‐axis of a graph) and a vertical axis (analogous to the Y‐axis).
The leaves or the terminal taxa are connected by internal nodes (solid circles) (Figure 27.1). The tree gradually shrinks towards the left and ends at the hypothetical common ancestor (solid square).
The horizontal axis is the scale that signifies evolutionary distance (in a dendrogram) or time (in a chronogram). The branch length of the current dendrogram (Figure 27.1) denotes the evolutionary distance between two taxa: the longer the branch, the more genetic change that taxon (or cluster of taxa) has experienced over the course of evolution.
The scale bar at the bottom of the tree gives the unit of branch length, expressed as substitutions per site (bases or amino acids, depending on the type of tree). The following formula determines it:
A scale of 0.1 means the amount of genetic change is 0.1 per unit length of the branch (indicated by the scale length). Thus, the total amount of genetic change will be
When the scale is expressed as a percentage (here, 10%), it means that ten substitutions have occurred per 100 residues. Note that this does not necessarily mean that ten different positions have been substituted, because a single residue could have been substituted multiple times. That is why a value of 1.0 (or 100% on a percent scale) does not mean that all bases have been substituted but, rather, that 100 substitutions have taken place per 100 residues, some of them at the same position. Sometimes, the evolutionary scale is instead given as integer values, indicating the net number of base substitutions.
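The arithmetic behind the scale can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the JC69 model (one of the substitution models named later in this chapter) is assumed purely to show why a branch length of 1.0 does not imply that every site looks different.

```python
import math

def expected_substitutions(branch_length, n_sites):
    """Expected substitution events on a branch whose length is in substitutions per site."""
    return branch_length * n_sites

def jc69_fraction_differing(d):
    """Expected fraction of sites that appear different after d substitutions per site.

    Assumes the JC69 model; repeated hits at one site are counted in d
    but are only partly visible when the two sequences are compared.
    """
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

# A branch of length 0.1 over 100 aligned residues: about 10 substitution events.
events_short = expected_substitutions(0.1, 100)

# A branch of length 1.0 over 100 residues: about 100 events,
# yet only a little over half of the sites are expected to look different.
events_long = expected_substitutions(1.0, 100)
visible_long = jc69_fraction_differing(1.0)

print(events_short, events_long, round(visible_long, 3))
```

The gap between `events_long` and `visible_long` is the multiple-substitution effect described above.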
The vertical axis has no meaning so far as evolutionary distance (or genetic change) or the time‐scale is concerned. This dimension is used only to place the taxa while building the phylogenetic tree. The branches of a sub‐tree, a sub‐sub‐tree, or the whole tree can be swapped without altering the meaning of the tree (in terms of the evolutionary relationships it depicts) (Figure 27.2). One can also increase the spacing (width along the vertical axis) among the taxa without any impact on the meaning depicted by the tree.
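One way to check that swapping branches leaves a tree unchanged is to compare the sets of taxa grouped under each internal node. The sketch below assumes a rooted tree written as nested Python tuples; the taxon labels A to E and the helper name `clades` are hypothetical.

```python
def clades(tree):
    """Return the set of clades (frozensets of leaf names) in a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):          # a leaf is a taxon name
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)                 # record the group under this internal node
        return members

    leaves(tree)
    return found

original = (("A", "B"), ("C", ("D", "E")))
rotated = ((("E", "D"), "C"), ("B", "A"))      # same tree, branches swapped at every node
different = (("A", "C"), ("B", ("D", "E")))    # a genuinely different grouping

print(clades(original) == clades(rotated))     # branch order does not matter
print(clades(original) == clades(different))   # grouping does
```

Because the comparison uses sets, the order in which children are listed (their position along the vertical axis) plays no part in the result.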
A dendrogram can be drawn in several ways without distorting its meaning. The depictions are useful under different circumstances.
The rectangular representation is well suited for both rooted and unrooted trees, and such trees are the most easily understood. The branches connecting the taxa are joined by a vertical line (of arbitrary length). The midpoint of the vertical line indicates the internal node between the two taxa being connected; it represents their hypothetical common ancestor, which is not available at the present time.
A rectangular tree is converted to a straight tree by joining the taxa directly to their respective internal nodes (no vertical line is used), which makes the tree appear to converge towards the common ancestor. A straight tree depicts the same information as a rectangular tree.
In a radiation tree, the typical tree‐like appearance is replaced with a comparatively simple depiction. The divergence of the component taxa from a hypothetical ancestor (i.e., internal node) is not shown. Figure 27.3 depicts how a straight tree can be converted to a radiation tree. The evolutionary scale may not be drawn in this type of tree, even though the scale and the node statistics (bootstrap values) are still part of the tree.
Both rooted and unrooted trees can be depicted by a circular tree. The distance from the center denotes the branch length. Distance along the periphery carries no meaning (like the vertical axis of rectangular or straight trees).
There are two broad methods of phylogenetic tree construction: distance‐based and character‐based methods.
In distance‐based methods, a distance matrix containing the pairwise distances between the input sequences is first generated from a multiple sequence alignment (MSA). The number of residue substitutions (over the whole aligned length) between each pair of sequences is calculated and is then converted into a single value (for each pair), using a suitable model. Examples of distance‐based phylogenetic algorithms are UPGMA, Neighbor‐joining (NJ), and Fitch–Margoliash. In these methods, an appropriate evolutionary model is selected on the basis of the underlying evolutionary process. Examples of such evolutionary models are: JC69 (Jukes and Cantor, 1969), K80 (Kimura, 1980), F81 (Felsenstein, 1981), HKY85 (Hasegawa et al., 1985), T92 (Tamura, 1992), TN93 (Tamura and Nei, 1993), and GTR (generalized time‐reversible; Tavaré, 1986).
The evolutionary model is required to calculate the number of substitutions, based on certain assumptions. Thus, selecting the evolutionary model is as critical as selecting the appropriate phylogenetic algorithm. The latter depends on the sequence type (amino acid, RNA, coding DNA, non‐coding DNA, or intergenic DNA), sequence divergence, sequence length, and so on.
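The step from aligned sequences to a distance matrix can be sketched as follows. This is a minimal illustration, not the procedure used for the figures in this chapter: the three short sequences and the taxon names are made up, gapped columns are simply skipped, and JC69 is assumed as the model because it is the simplest of those listed above.

```python
import math
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def jc69_distance(p):
    """Convert an observed proportion of differences into substitutions per site (JC69)."""
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(alignment):
    """Map each pair of taxon names to its model-corrected distance."""
    return {
        (a, b): jc69_distance(p_distance(alignment[a], alignment[b]))
        for a, b in combinations(alignment, 2)
    }

# Hypothetical toy alignment (10 columns).
alignment = {
    "taxon1": "ACGTACGTAC",
    "taxon2": "ACGTACGTAT",
    "taxon3": "ACGAACGTTT",
}

matrix = distance_matrix(alignment)
for pair, d in matrix.items():
    print(pair, round(d, 3))
```

The corrected distance is always at least as large as the raw proportion of differences, since the model adds back substitutions that are hidden by repeated changes at the same site.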
In character‐based methods, individual residues of the sequences are taken into account to construct the tree. Here, instead of calculating the distances among the taxa, the sequences are aligned to find the similarities and dissimilarities among the characters in each column of the alignment. The total number of differing residues (over the length) is not calculated; rather, particular states (or locations) of the aligned residues are identified to define the evolution of the sequences. Examples of character‐based methods are maximum parsimony, maximum likelihood, and Bayesian inference. Maximum likelihood utilizes both approaches (distance‐ and character‐based).
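To make the column-by-column idea concrete, the sketch below scores two candidate trees by maximum parsimony using Fitch's counting rule, a standard way to find the minimum number of changes on a fixed rooted binary tree. The four taxa, their three-column alignment, and the function names are hypothetical; a real analysis would also search over many tree topologies.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes for one alignment column on a rooted binary tree."""
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):              # leaf: the observed residue
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                             # children agree: no change needed here
            return common
        changes += 1                           # children disagree: one change required
        return left | right

    visit(tree)
    return changes

def parsimony_length(tree, alignment):
    """Sum the per-column minimum changes over the whole alignment."""
    n_columns = len(next(iter(alignment.values())))
    return sum(
        fitch_changes(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(n_columns)
    )

# Hypothetical alignment of four taxa.
alignment = {"A": "GAT", "B": "GAT", "C": "TAC", "D": "TAC"}

tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

length_1 = parsimony_length(tree_1, alignment)
length_2 = parsimony_length(tree_2, alignment)
print(length_1, length_2)   # the tree with the smaller length is preferred
```

Here no pairwise distance is ever computed; each column contributes its own count of required changes.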
Again, phylogenetic trees can be constructed by any one of the following two methods:
Cladistics is a method that discovers the evolutionary relationship among taxa through intermediate, as well as common, ancestry. This approach yields a cladogram; maximum parsimony is an example.
Phenetics studies the degree of similarity among a group of organisms to unveil the relationship through a tree‐like network (called a phenogram), e.g. UPGMA, maximum likelihood method. The rate of divergence is assumed to be uniform among the taxa.
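A minimal UPGMA sketch shows how a phenogram is built from pairwise distances alone. The six distances are copied from the lower triangle of Table 27.3 for four of the nine accessions; the function itself is an illustrative implementation (average linkage over the original pairwise distances, node height set to half the average distance), not the MEGA7 code used for the figures.

```python
from itertools import combinations

def upgma(distances):
    """Cluster taxa by UPGMA.

    distances: dict mapping frozenset({a, b}) to the distance between taxa a and b.
    Returns a list of (merged_cluster, node_height) in the order the merges occur.
    """
    taxa = set().union(*distances)
    clusters = {frozenset([t]) for t in taxa}

    def average(c1, c2):
        # Unweighted mean of all between-cluster pairwise distances.
        total = sum(distances[frozenset([a, b])] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append((c1 | c2, average(c1, c2) / 2))   # uniform rate: height = d / 2
        clusters -= {c1, c2}
        clusters.add(c1 | c2)
    return merges

# Distances taken from the lower triangle of Table 27.3.
pairs = {
    ("M11188", "NR_046271"): 0.004,
    ("M11188", "X61688"): 2.250,
    ("NR_046271", "X61688"): 2.261,
    ("M11188", "EF645689"): 2.266,
    ("NR_046271", "EF645689"): 2.276,
    ("X61688", "EF645689"): 0.006,
}
distances = {frozenset(k): v for k, v in pairs.items()}

merges = upgma(distances)
for cluster, height in merges:
    print(sorted(cluster), round(height, 4))
```

Placing each node at half the average distance is where the uniform-rate assumption enters: both descendants of a node are taken to be equally far from it.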
Now we will compare the outputs of different molecular phylogeny methods, using a set of nucleotide sequences (18S rRNA) belonging to nine organisms representing distant taxa.
At the outset, the best model (i.e., TN93 + G) was selected on the basis of the lowest Bayesian information criterion (BIC) score (10145.019). The parameters selected for each of the methods are specified along with the trees in Figure 27.5.
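Model selection by BIC can be sketched as below, using the standard definition BIC = k ln(n) − 2 ln(L), where k is the number of free parameters, n the sample size, and L the maximized likelihood. Everything numeric here is hypothetical: the model labels, log-likelihoods, parameter counts, and the choice of alignment length as n are placeholders for illustration and are not the values behind the score quoted above.

```python
import math

def bic(log_likelihood, n_parameters, n_sites):
    """Bayesian information criterion; lower values indicate a better trade-off."""
    return n_parameters * math.log(n_sites) - 2.0 * log_likelihood

# Hypothetical candidates: (maximized log-likelihood, number of free parameters).
candidates = {
    "model_A": (-5100.0, 17),
    "model_B": (-5040.0, 21),
    "model_C": (-5038.0, 26),
}
n_sites = 1800   # hypothetical alignment length, used as the sample size

scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(name, round(score, 2))
print("selected:", best)
```

In this made-up example the richest model fits slightly better than `model_B`, but its extra parameters are penalized, so it is not the one selected.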
Parametric details for each of the algorithms used in constructing the phylogenetic trees are as follows:
TABLE 27.2 Comparison between the features of the trees generated from the following important phylogenetic algorithms (Desper and Gascuel, 2005).
SN | Tree | Characteristic features |
1 | Maximum parsimony | |
2 | UPGMA | |
3 | NJ | |
4 | ME | |
5 | ML | |
TABLE 27.3 Pairwise distances (calculated by maximum composite likelihood model, using MEGA7) between the input sequences are shown in the lower triangular matrix.
 | M11188 | NR_046271 | M59392 | AF173629 | M10098 | NR_074540 | X61688 | EF645689 | AY036903 |
M11188 | - | 0.002 | 0.009 | 0.007 | 0.002 | 1.721 | 1.463 | 1.472 | 1.706 |
NR_046271 | 0.004 | - | 0.009 | 0.007 | 0.002 | 1.723 | 1.464 | 1.473 | 1.708 |
M59392 | 0.035 | 0.034 | - | 0.010 | 0.009 | 1.699 | 1.461 | 1.470 | 1.717 |
AF173629 | 0.026 | 0.025 | 0.039 | - | 0.007 | 1.690 | 1.487 | 1.496 | 1.689 |
M10098 | 0.003 | 0.003 | 0.034 | 0.026 | - | 1.721 | 1.462 | 1.472 | 1.706 |
NR_074540 | 2.582 | 2.593 | 2.551 | 2.571 | 2.584 | - | 1.201 | 1.196 | 0.912 |
X61688 | 2.250 | 2.261 | 2.239 | 2.285 | 2.253 | 1.868 | - | 0.003 | 1.111 |
EF645689 | 2.266 | 2.276 | 2.253 | 2.299 | 2.268 | 1.859 | 0.006 | - | 1.112 |
AY036903 | 2.546 | 2.557 | 2.556 | 2.553 | 2.548 | 0.249 | 1.716 | 1.723 | - |