CHAPTER 24
Construction of Phylogenetic Tree: Maximum Parsimony Method

CS Mukhopadhyay and RK Choudhary

School of Animal Biotechnology, GADVASU, Ludhiana

24.1 INTRODUCTION

“Parsimony” means penny‐pinching or thriftiness. Here, the term “maximum parsimony” (MP) indicates extreme caution to opt for simpler hypotheses to select a tree (out of many) with minimum length (sounds like the minimum evolution (ME) method). This approach is applied when the ancestral relationship is required to be reconstructed. The MP method is not the same as the ME method. MP identifies position‐specific differences between all pairs of sequences, while ME indicates the distance (in terms of the number of differences among any two sequences).

24.1.1 Principle

MP is a character‐based method, not distance‐based like ME. The MP method identifies the “most parsimonious tree” through optimality criteria (not by clustering methods like UPGMA or NJ). Thus, the tree that needs a minimum number of evolutionary events (characterized by residue substitutions) to explain the sequence variation is selected.

24.1.2 Assumption

A tree that needs fewer substitutions is better than a tree that requires more (Li, 1997).

24.2 OBJECTIVE

To construct a phylogenetic tree using the MP method from a set of molecular sequences.

24.3 PROCEDURE

  1. The tree is constructed following multiple sequence alignment (MSA). Each column of MSA is considered to determine the “most informative columns” to initiate the tree building (elaborated below). The most informative site is the column which has at least two different residues, and each residue is represented at least twice (thus, at least three input sequences). It is based on the smallest number of evolutionary changes. Analysis of evolutionary changes for each residue is scored vertically in MSA.
  2. The score is assigned to each tree. Finally, the best one (with the least score) is selected.
  3. “Most parsimonious tree” means the tree with a minimum number of evolutionary changes.

24.3.1 Multiple Sequence Alignment (MSA)

Align the input nucleotide sequences (given below) to determine the most informative column(s) to determine the weights of each of the type of substitution.

TABLE 24.1

Sequences Residues
Seq 1 A G G A C A
Seq 2 C G A A A T
Seq 3 G G C A G C
Seq 4 T C A A T G

The third column of the aligned sequences is the most informative site (shown in bold), as it contains at least two base variants present at least twice in the particular column.

24.3.2 Selection of algorithm

Two types of algorithms are broadly used to calculate the distance for maximum parsimony: Fitch (abbreviated as “F” in the following demonstration) and Transversion Parsimony (abbreviated as “T‐P”). These two scoring schemes for base substitution differ in the weights given to the type of substitution.

24.3.3 Fitch (F)

The weight assigned to both transversion, as well as transition to different bases, is one (1), while transition to the same state (i.e., “G” to “G”, or “A” to “A”, etc.) is assigned no weight (Table 24.2).

TABLE 24.2

Scoring scheme of nucleotide substitution for Fitch method
G A C A
G 0 1 1 1
A 1 0 1 0
C 1 1 0 1
A 1 0 1 0

24.3.4 Transversion Parsimony (T‐P)

The weights assigned are 4 for transversion, 1 for transition to other bases (other than self) and 0 for transition to the same base (Table 24.3).

TABLE 24.3

Scoring scheme of nucleotide substitution for T‐P method
G A C A
G 0 1 4 1
A 1 0 4 0
C 4 4 0 4
A 1 0 4 0

24.3.5 Construct a topology for the sequences

Now construct a topology of the unrooted tree to initiate the analysis. In our current example, four sequences (Seq 1, Seq 2, Seq 3 and Seq 4) are considered. One of the possible topologies is ((Seq 1, Seq 3), (Seq2, Seq 4)). The alternative topologies are ((Seq1, Seq2), (Seq3, Seq4)) and (Seq1, Seq4), (Seq2, Seq3)), which are also to be tested to find the trees yielding the minimum score(s).

2 Phylogenetic trees consisting two connecting circles, each with two branches. Top figure: Circles with branches labeled Seq1, Seq2, Seq3, and Seq4. Bottom figure: Circles with branches labeled G, C, A, and A.

FIGURE 24.1

The bases in the third column of the aligned sequences (i.e., the informative site) are “G”, “A”, “T” and “A” for Seq 1, Seq2, Seq3 and Seq 4, respectively.

24.3.6 Scoring for substitution of the bases

Now, these blank spaces will be filled with all possible combinations of the nucleotide bases.

4 Phylogenetic trees consisting two connecting circles, each with two branches. Clockwise: Circles labeled A and A, A and T, A and G, and A and C. At bottom of each tree is a table containing data.

FIGURE 24.2

The scores of each of the combinations are calculated using both of the weighing schemes (F and T‐P).

MP searches the optimal tree out of all the possible combinations. The paths with the least scores are to be determined individually for each of the weighing schemes.

The Fitch scheme shows that the minimum score is 2 for the four paths given below:

The T‐P scheme shows a least score of 5 for two of these paths. Hence, these two paths are the best ones for the current topology.

It is necessary to iterate the same procedure for assessing the other topologies:

  1. ((Seq1, Seq2), (Seq3, Seq4))
  2. ((Seq1, Seq4), (Seq2, Seq3)).

The final optimal tree with the minimum score is selected after obtaining all the topologies, based on the least score in the Fitch or Transition Parsimony methods of scoring.

The number of paths exponentially increases with the number of sequences (OTUs). The branch and bound method is used to find the optimal tree without verifying all possible trees, in order to save computational resource requirements and time.

The number of rooted [obtained as (2n‐3)*(2n‐5)*(2n‐7)*…*3*1=(2n‐3)!/((2^(n‐2))*(n‐2)!)] and unrooted trees [obtained as (2n‐5)*(2n‐7)*…*3*1 = (2n‐5)!/(2^(n‐3)*(n‐3)!)] required to be analyzed in this method is given in Table 24.4:

4 Phylogenetic trees consisting two connecting circles, each with two branches. Clockwise: Circles labeled T and T, T and A, T and G, and T and C. At bottom of each tree is a table containing data.

FIGURE 24.3

4 Phylogenetic trees consisting two connecting circles, each with two branches. Clockwise: Circles labeled G and G, G and A, G and T, and G and C. At bottom of each tree is a table containing data.

FIGURE 24.4

4 Phylogenetic trees consisting two connecting circles, each with two branches. Clockwise: Circles labeled C and C, C and A, C and T, and C and G. At bottom of each tree is a table containing data.

FIGURE 24.5

4 Phylogenetic trees consisting two connecting circles, each with two branches. Top–bottom: Circles labeled A and A, T and A, G and A, and C and A.

FIGURE 24.6

TABLE 24.4

OTU Count Number of unrooted tree(s) Number of rooted tree(s)
2 1 1
3 1 3
4 3 15
5 15 105
10 34 459 425 2.13E15
15 2.13E + 15 8.E + 21
50 2.84E + 74 2.75E + 76
n images images

24.4 INTERPRETATION OF MP TREE

The tree with a minimum number of steps has been selected as the optimal tree out of the finally obtained set of valid trees. Parsimony always chooses the tree that sought a minimum number of changes to explain the data. MP trees do not show branch length as an indicator of distance, but the number of substitutions is counted. The total number of steps required to reach the final step is known. The validity of branching tested through bootstrapping is denoted at the nodes of the consensus tree, which is defined as the tree obtained through an agreement for each node among several bootstrap trees.

A method such as “weighted parsimony” is used to identify the sites with the highest or the lowest rate of evolution.

24.5 QUESTIONS

  1. 1. Construct the most parsimonious tree using the following sequences:

    TABLE 24.5

    Seq.Bases
    Seq 1TAGATCGCA
    Seq 2TCCCTCGCG
    Seq 3TCTGACCCA
    Seq 4TCAGACCCG
  2. 2. How does one determine the most informative sites in a given set of sequences for maximum parsimony analysis?
  3. 3. Differentiate between the maximum parsimony and minimum evolution methods of phylogenetic tree construction.
  4. 4. Why is there no scale given in the phylogenetic tree obtained from the MP method?
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.162.59