CS Mukhopadhyay and RK Choudhary
School of Animal Biotechnology, GADVASU, Ludhiana
“Parsimony” means penny‐pinching or thriftiness. Here, the term “maximum parsimony” (MP) indicates extreme caution to opt for simpler hypotheses to select a tree (out of many) with minimum length (sounds like the minimum evolution (ME) method). This approach is applied when the ancestral relationship is required to be reconstructed. The MP method is not the same as the ME method. MP identifies position‐specific differences between all pairs of sequences, while ME indicates the distance (in terms of the number of differences among any two sequences).
MP is a character‐based method, not distance‐based like ME. The MP method identifies the “most parsimonious tree” through optimality criteria (not by clustering methods like UPGMA or NJ). Thus, the tree that needs a minimum number of evolutionary events (characterized by residue substitutions) to explain the sequence variation is selected.
A tree that needs fewer substitutions is better than a tree that requires more (Li, 1997).
To construct a phylogenetic tree using the MP method from a set of molecular sequences.
Align the input nucleotide sequences (given below) to determine the most informative column(s) to determine the weights of each of the type of substitution.
Sequences | Residues | |||||
Seq 1 | A | G | G | A | C | A |
Seq 2 | C | G | A | A | A | T |
Seq 3 | G | G | C | A | G | C |
Seq 4 | T | C | A | A | T | G |
The third column of the aligned sequences is the most informative site (shown in bold), as it contains at least two base variants present at least twice in the particular column.
Two types of algorithms are broadly used to calculate the distance for maximum parsimony: Fitch (abbreviated as “F” in the following demonstration) and Transversion Parsimony (abbreviated as “T‐P”). These two scoring schemes for base substitution differ in the weights given to the type of substitution.
The weight assigned to both transversion, as well as transition to different bases, is one (1), while transition to the same state (i.e., “G” to “G”, or “A” to “A”, etc.) is assigned no weight (Table 24.2).
Scoring scheme of nucleotide substitution for Fitch method | ||||
G | A | C | A | |
G | 0 | 1 | 1 | 1 |
A | 1 | 0 | 1 | 0 |
C | 1 | 1 | 0 | 1 |
A | 1 | 0 | 1 | 0 |
The weights assigned are 4 for transversion, 1 for transition to other bases (other than self) and 0 for transition to the same base (Table 24.3).
Scoring scheme of nucleotide substitution for T‐P method | ||||
G | A | C | A | |
G | 0 | 1 | 4 | 1 |
A | 1 | 0 | 4 | 0 |
C | 4 | 4 | 0 | 4 |
A | 1 | 0 | 4 | 0 |
Now construct a topology of the unrooted tree to initiate the analysis. In our current example, four sequences (Seq 1, Seq 2, Seq 3 and Seq 4) are considered. One of the possible topologies is ((Seq 1, Seq 3), (Seq2, Seq 4)). The alternative topologies are ((Seq1, Seq2), (Seq3, Seq4)) and (Seq1, Seq4), (Seq2, Seq3)), which are also to be tested to find the trees yielding the minimum score(s).
The bases in the third column of the aligned sequences (i.e., the informative site) are “G”, “A”, “T” and “A” for Seq 1, Seq2, Seq3 and Seq 4, respectively.
Now, these blank spaces will be filled with all possible combinations of the nucleotide bases.
The scores of each of the combinations are calculated using both of the weighing schemes (F and T‐P).
MP searches the optimal tree out of all the possible combinations. The paths with the least scores are to be determined individually for each of the weighing schemes.
The Fitch scheme shows that the minimum score is 2 for the four paths given below:
The T‐P scheme shows a least score of 5 for two of these paths. Hence, these two paths are the best ones for the current topology.
It is necessary to iterate the same procedure for assessing the other topologies:
The final optimal tree with the minimum score is selected after obtaining all the topologies, based on the least score in the Fitch or Transition Parsimony methods of scoring.
The number of paths exponentially increases with the number of sequences (OTUs). The branch and bound method is used to find the optimal tree without verifying all possible trees, in order to save computational resource requirements and time.
The number of rooted [obtained as (2n‐3)*(2n‐5)*(2n‐7)*…*3*1=(2n‐3)!/((2^(n‐2))*(n‐2)!)] and unrooted trees [obtained as (2n‐5)*(2n‐7)*…*3*1 = (2n‐5)!/(2^(n‐3)*(n‐3)!)] required to be analyzed in this method is given in Table 24.4:
OTU Count | Number of unrooted tree(s) | Number of rooted tree(s) |
2 | 1 | 1 |
3 | 1 | 3 |
4 | 3 | 15 |
5 | 15 | 105 |
10 | 34 459 425 | 2.13E15 |
15 | 2.13E + 15 | 8.E + 21 |
50 | 2.84E + 74 | 2.75E + 76 |
n |
The tree with a minimum number of steps has been selected as the optimal tree out of the finally obtained set of valid trees. Parsimony always chooses the tree that sought a minimum number of changes to explain the data. MP trees do not show branch length as an indicator of distance, but the number of substitutions is counted. The total number of steps required to reach the final step is known. The validity of branching tested through bootstrapping is denoted at the nodes of the consensus tree, which is defined as the tree obtained through an agreement for each node among several bootstrap trees.
A method such as “weighted parsimony” is used to identify the sites with the highest or the lowest rate of evolution.
52.15.173.64