CHAPTER 22 Construction of Phylogenetic Tree: Fitch Margoliash (FM) Algorithm
CS Mukhopadhyay and RK Choudhary
School of Animal Biotechnology, GADVASU, Ludhiana
22.1 INTRODUCTION
The Fitch–Margoliash (FM) algorithm is the first algorithm for phylogenetic tree reconstruction based on the least squares principle. It was developed by Walter Fitch and Emanuel Margoliash in 1967 (Fitch and Margoliash, 1967; Fitch, 1970, 1971). When DNA sequences of equal length are entered instead of distances, the evolutionary distances between the taxa are first estimated using the Jukes–Cantor model.
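The Jukes–Cantor correction mentioned above can be sketched as follows. This is a minimal illustration, assuming two already aligned, gap-free DNA sequences of equal length; the function name jukes_cantor_distance is hypothetical and not part of any particular software package.

```python
import math


def jukes_cantor_distance(seq1: str, seq2: str) -> float:
    """Jukes-Cantor distance (expected substitutions per site).

    Assumes the two sequences are aligned, gap-free and of equal length.
    """
    if len(seq1) != len(seq2) or len(seq1) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    # p is the observed proportion of sites at which the sequences differ
    mismatches = sum(1 for a, b in zip(seq1.upper(), seq2.upper()) if a != b)
    p = mismatches / len(seq1)
    if p >= 0.75:
        # the logarithm below is undefined once p reaches 3/4
        raise ValueError("sequences too divergent for the Jukes-Cantor model")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# One mismatch in four sites: p = 0.25
d = jukes_cantor_distance("ACGT", "ACGA")
```

The corrected distance is always at least as large as the raw proportion of differences p, because the model accounts for multiple substitutions at the same site.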
22.1.1 Principle
The algorithm is based on an optimality criterion: it selects the tree with the minimum residual, that is, the smallest difference between the actual (observed) and the expected (tree-derived) summed evolutionary distances. It estimates the branch lengths (distances) and clusters the taxa pair by pair in order to determine the unrooted tree that best satisfies this criterion.
22.1.2 Assumption
The algorithm does not assume a constant mutation rate.
It assumes additivity of distances, that is, the branch lengths along the path between two taxa add up to the total distance between them.
22.2 OBJECTIVE
To construct a phylogenetic tree using the Fitch Margoliash (FM) method, given the distances among a set of molecular sequences.
22.3 PROCEDURE
Let us start with four sequences: “A”, “B”, “C” and “D”, and consider that the given distances (d) between the four sequences are as follows:
TABLE 22.1
      A     B     C     D
A     0
B     20    0
C     26    22    0
D     34    30    16    0
The iterative steps in this algorithm are as follows:
Consider two of the taxa (say, “A” and “B”) and determine their distances from a third, composite taxon (denoted by “X”). The composite taxon “X” is a combination of the rest of the taxa (here, “C” and “D”).
The distance between “A” and “X” (dAX) is calculated by averaging the distances from “A” to all the component OTUs of the composite taxon “X” (here, “C” and “D”).
Similarly, the distance between “B” and “X” (dBX) is also calculated:
TABLE 22.2
      A     B     C     D
A     0
B     20    0
C     26    22    0
D     34    30    16    0
dAB = 20
dAX = (dAC + dAD)/2 = (26 + 34)/2 = 30
dBX = (dBC + dBD)/2 = (22 + 30)/2 = 26
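The averaging step for the composite taxon can be sketched as follows, using the distances of Table 22.1. The helper names dist and composite_distance are hypothetical.

```python
# Pairwise distances from Table 22.1, keyed by the unordered pair of taxa
D = {
    frozenset("AB"): 20,
    frozenset("AC"): 26,
    frozenset("AD"): 34,
    frozenset("BC"): 22,
    frozenset("BD"): 30,
    frozenset("CD"): 16,
}


def dist(a, b):
    """Look up the distance between two single taxa."""
    return D[frozenset((a, b))]


def composite_distance(taxon, members):
    """Average distance from one taxon to every member of a composite taxon."""
    return sum(dist(taxon, m) for m in members) / len(members)


X = ["C", "D"]                      # composite taxon "X"
d_AB = dist("A", "B")
d_AX = composite_distance("A", X)
d_BX = composite_distance("B", X)
print(d_AB, d_AX, d_BX)             # 20 30.0 26.0
```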
In the next step, the distance (P1) between the terminal node (taxon “A”) and its intermediate ancestor (“P”) is calculated using the formula: P1 = (dAB + dAX – dBX)/2.
Similarly, the distance (Q1) between taxon “B” and its intermediate ancestor “Q”, and the distance (R1) between taxon “X” and its intermediate ancestor “R”, are calculated.
TABLE 22.3
      A     B     X
A     0
B     20    0
X     30    26    0
P1 = (dAB + dAX – dBX)/2 = (20 + 30 – 26)/2 = 12
Q1 = (dAB + dBX – dAX)/2 = (20 + 26 – 30)/2 = 8
R1 = (dAX + dBX – dAB)/2 = (30 + 26 – 20)/2 = 18
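The three formulas for P1, Q1 and R1 can be written as one small function. This is a minimal sketch; the function name three_point is hypothetical.

```python
def three_point(d_ab, d_ax, d_bx):
    """Branch lengths from taxa a, b and x to their shared intermediate ancestor.

    d_ab, d_ax, d_bx are the three pairwise distances among a, b and x.
    """
    p = (d_ab + d_ax - d_bx) / 2    # branch leading to a
    q = (d_ab + d_bx - d_ax) / 2    # branch leading to b
    r = (d_ax + d_bx - d_ab) / 2    # branch leading to x
    return p, q, r


# First iteration: dAB = 20, dAX = 30, dBX = 26
P1, Q1, R1 = three_point(20, 30, 26)
print(P1, Q1, R1)                   # 12.0 8.0 18.0
```

Note that the three branch lengths add up pairwise to the three input distances (P1 + Q1 = dAB, P1 + R1 = dAX, Q1 + R1 = dBX), which is the additivity assumption at work.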
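The obtained distances (P1, Q1 and R1) are placed on a tree: “A” and “B” join their intermediate ancestor by branches of length 12 and 8, respectively, and the composite taxon “X” joins it by a branch of length 18.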
Now “A” and “B” are combined as “AB” (as we have obtained the distances of the taxa from the respective intermediate ancestors).
The combined taxon “X” is expanded into its component taxa (here, “D” and “C”).
The next taxon (“C”) is taken as the second taxon for estimating the distance to its intermediate ancestor, and the remaining taxa (here, only “D”) are again combined into the composite taxon “X”, so that the same steps can be iterated.
The same notations are used as for the previous iteration. However, the subscripts are changed to “2”, that is, P2 (distance between the “AB” node and its intermediate ancestor, designated by “P” again), Q2 (distance between the “C” node and its intermediate ancestor, designated as “Q” again) and R2 (distance between the “X” node and its intermediate ancestor “R”).
TABLE 22.4
             AB    C     D
AB           0
C            24    0
D (or X)     32    16    0
(AB)C = (dAC + dBC)/2 = (26 + 22)/2 = 24
(AB)D = (dAD + dBD)/2 = (34 + 30)/2 = 32
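The second iteration can be sketched by applying the same three-taxon formulas to the values of Table 22.4. The numbers printed below follow from that arithmetic and are offered as an illustration rather than as figures quoted from a table in the text; the function name three_point is hypothetical.

```python
def three_point(d_ab, d_ax, d_bx):
    """Branch lengths from taxa a, b and x to their shared intermediate ancestor."""
    p = (d_ab + d_ax - d_bx) / 2
    q = (d_ab + d_bx - d_ax) / 2
    r = (d_ax + d_bx - d_ab) / 2
    return p, q, r


# Distances from the combined taxon "AB" (averages over "A" and "B")
d_AB_C = (26 + 22) / 2              # (AB)C = 24.0
d_AB_D = (34 + 30) / 2              # (AB)D = 32.0
d_CD = 16                           # taken directly from Table 22.1

# "AB" plays the role of the first taxon, "C" the second, "D" the composite "X"
P2, Q2, R2 = three_point(d_AB_C, d_AB_D, d_CD)
print(P2, Q2, R2)                   # 20.0 4.0 12.0
```

Here P2 is measured from the combined “AB” node as a whole, so it still includes the average length of the branches leading to “A” and “B”.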
At this point, one additional parameter, the internal branch length (IBL), is calculated for the combined taxon “AB”:
IBL Calculation
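One common way to obtain this internal branch length is sketched below. It is an assumption, not necessarily the formula intended here: the distance from the combined “AB” node to its intermediate ancestor, P2 = ((AB)C + (AB)D – dCD)/2 = (24 + 32 – 16)/2 = 20, is reduced by the average of the branch lengths already assigned to “A” and “B” (P1 = 12 and Q1 = 8). The function name internal_branch_length is hypothetical.

```python
def internal_branch_length(cluster_to_ancestor, member_branch_lengths):
    """Internal branch length for a combined taxon.

    Assumed convention: subtract the average branch length of the cluster
    members from the distance between the combined node and its ancestor.
    """
    average_within = sum(member_branch_lengths) / len(member_branch_lengths)
    return cluster_to_ancestor - average_within


P1, Q1 = 12, 8                      # branches to "A" and "B" (Table 22.3)
P2 = (24 + 32 - 16) / 2             # "AB" node to its intermediate ancestor
IBL1 = internal_branch_length(P2, [P1, Q1])
print(IBL1)                         # 10.0
```

Under this convention the result is consistent with additivity: for example, the path from “A” to “C” is 12 + 10 + 4 = 26 when the branch to “C” is (24 + 16 – 32)/2 = 4, which matches dAC in Table 22.1.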
In the last step, no additional information is available for calculating the IBL between “D” and “ABC”. In this situation, the length of the internal branch (designated as “IBL2”, being the second such calculation) is determined by the following formula:
The final tree constructed by the FM algorithm is as follows:
22.4 INTERPRETATION OF THE FM TREE
The phylogenetic tree has been constructed allowing for different rates of evolution among different branches (or taxa). The additivity of the branches holds true: the distance between any two OTUs can be determined by summing the branch lengths along the path that connects them.
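Additivity can be checked with a short sketch. The branch lengths used here are an assumption: they are the values implied by applying the three-taxon formulas to the worked example (12 and 8 for “A” and “B”, 10 for the internal branch, 4 and 12 for “C” and “D”), and the internal node names N1 and N2 are hypothetical labels.

```python
# Unrooted tree as a list of branches: (node, node, branch length)
branches = [("A", "N1", 12), ("B", "N1", 8), ("N1", "N2", 10),
            ("C", "N2", 4), ("D", "N2", 12)]

# Build an adjacency map so each branch can be walked in both directions
adjacent = {}
for u, v, length in branches:
    adjacent.setdefault(u, {})[v] = length
    adjacent.setdefault(v, {})[u] = length


def path_length(start, goal, visited=frozenset()):
    """Sum of branch lengths on the unique path between two nodes of the tree."""
    if start == goal:
        return 0
    visited = visited | {start}
    for neighbour, length in adjacent[start].items():
        if neighbour not in visited:
            rest = path_length(neighbour, goal, visited)
            if rest is not None:
                return length + rest
    return None                     # goal is not reachable through this node


# Observed distances from Table 22.1
observed = {("A", "B"): 20, ("A", "C"): 26, ("A", "D"): 34,
            ("B", "C"): 22, ("B", "D"): 30, ("C", "D"): 16}

tree_distances = {pair: path_length(*pair) for pair in observed}
print(tree_distances == observed)   # True
```

For this data set every pairwise distance is reproduced exactly, so the residual of the tree is zero; with real data the path lengths usually only approximate the observed distances.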
22.5 QUESTIONS
1. Construct the phylogenetic tree using the FM method:
TABLE 22.5
      A     B     C     D
A     0
B     10    0
C     14    15    0
D     24    18    11    0
2. In the last chapter, you constructed the phylogenetic tree using UPGMA (Q1a). Now construct the tree using the FM method and compare with the previous one.
TABLE 22.6
      A     B     C     D     E
B     8
C     18    18
D     18    18    10
E     18    18    10    4
F     20    20    20    20    20
3. What is the meaning of the term “internal branch length”? How is it important in calculating the phylogenetic tree using the FM method?
4. Differentiate between the principle and applications of the FM and UPGMA methods of phylogenetic tree construction.