CHAPTER 21 Construction of Phylogenetic Tree: Unweighted‐Pair Group Method with Arithmetic Mean (UPGMA)
CS Mukhopadhyay and RK Choudhary
School of Animal Biotechnology, GADVASU, Ludhiana
21.1 INTRODUCTION
UPGMA is a clustering algorithm that builds a tree by repeatedly joining the pair of sequences (or clusters) with the greatest similarity, that is, the smallest distance, and by averaging the distances from the joined pair to the remaining sequences. The resulting tree is “ultrametric”: all the terminal nodes are equidistant from the root. Hence, once the last two clusters are joined at the root, a rooted tree is produced.
Unweighted: All pair‐wise distances contribute equally. No taxon pair is weighted to indicate an evolutionary rate different from that of any other pair. This is in contrast to the Weighted‐Pair Group Method with Arithmetic mean (WPGMA).
Pair‐groups: Any two taxa, any two clusters (clades), or one taxon and one cluster are always combined in pairs (that is, interpreted as dichotomies).
Arithmetic mean: The distance between two groups is the arithmetic mean of the pair‐wise distances between all members of one group and all members of the other.
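The "arithmetic mean" rule can be illustrated with a minimal Python sketch. It uses the pair‐wise distances of the worked example in Section 21.4 (Table 21.1); the names `D`, `dist` and `cluster_distance` are chosen here for illustration only and do not come from any library.

```python
from itertools import product

# Pair-wise distances from Table 21.1, stored once per unordered pair.
D = {
    ("A", "B"): 3,
    ("A", "C"): 7, ("B", "C"): 7,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
    ("A", "E"): 10, ("B", "E"): 10, ("C", "E"): 10, ("D", "E"): 8,
}

def dist(x, y):
    """Look up a taxon-to-taxon distance regardless of argument order."""
    return D[(x, y)] if (x, y) in D else D[(y, x)]

def cluster_distance(group1, group2):
    """Arithmetic mean of all distances between members of the two groups."""
    pairs = list(product(group1, group2))
    return sum(dist(x, y) for x, y in pairs) / len(pairs)

print(cluster_distance(["A", "B"], ["C"]))            # 7.0
print(cluster_distance(["A", "B", "C"], ["D", "E"]))  # 10.0
```

Every individual taxon pair enters the mean exactly once, which is what "unweighted" refers to.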
21.2 ASSUMPTIONS
Constant rate of evolution (i.e., mutation‐rate) amongst all the sequences.
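Distance data are ultrametric: The distances satisfy the “three point condition”, that is, for any three taxa the two largest of the three pair‐wise distances are equal. This is what enables the clustering to generate the tree.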
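Whether a distance matrix satisfies the three point condition can be checked before clustering. The following is a minimal sketch, assuming the distances are held in a dictionary keyed by taxon pairs; the function name `is_ultrametric` and the tolerance `tol` are illustrative choices. The data are those of Table 21.1 in Section 21.4.

```python
from itertools import combinations

# Pair-wise distances from Table 21.1.
D = {
    ("A", "B"): 3,
    ("A", "C"): 7, ("B", "C"): 7,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
    ("A", "E"): 10, ("B", "E"): 10, ("C", "E"): 10, ("D", "E"): 8,
}

def is_ultrametric(taxa, D, tol=1e-9):
    """True if, for every triple of taxa, the two largest distances are equal."""
    def dist(x, y):
        return D[(x, y)] if (x, y) in D else D[(y, x)]
    for x, y, z in combinations(taxa, 3):
        d = sorted([dist(x, y), dist(x, z), dist(y, z)])
        if abs(d[2] - d[1]) > tol:  # the two largest values differ
            return False
    return True

print(is_ultrametric("ABCDE", D))  # True

# Changing a single distance breaks the condition for the triple (A, B, C).
D_bad = dict(D)
D_bad[("A", "C")] = 6
print(is_ultrametric("ABCDE", D_bad))  # False
```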
21.3 OBJECTIVE
To construct a phylogenetic tree (dendrogram), using the UPGMA method, from a set of molecular sequences.
21.4 PROCEDURE
Calculate the raw pair‐wise distance data from a set of sequences and construct a distance matrix:
TABLE 21.1
      A    B    C    D
B     3
C     7    7
D    10   10   10
E    10   10   10    8
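A raw distance matrix of this kind is computed from aligned sequences. The sketch below assumes that the raw distance is simply the number of sites at which two aligned sequences of equal length differ; other distance measures exist, and the chapter does not state which one underlies Table 21.1. The sequences `S1` to `S3` are hypothetical and serve only to show the calculation.

```python
from itertools import combinations

def raw_distance(seq1, seq2):
    """Number of differing sites between two aligned, equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))

# Hypothetical aligned sequences (not the data behind Table 21.1).
seqs = {
    "S1": "ACGTACGT",
    "S2": "ACGTACGA",
    "S3": "ACGAACTA",
}

# Lower-triangular distance matrix stored as a dictionary of pairs.
matrix = {
    (x, y): raw_distance(seqs[x], seqs[y])
    for x, y in combinations(seqs, 2)
}
print(matrix)  # {('S1', 'S2'): 1, ('S1', 'S3'): 3, ('S2', 'S3'): 2}
```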
Note: while constructing the tree from the distance values, one needs to select the closest pair (with minimum distance among all possible pairs) from the distance matrix, and then merge these two objects to yield one.
Identify the least‐distant pair: Here, the minimum distance is d(AB) = 3 (i.e., between the taxa “A” and “B”).
Place these two taxa in a single group as a cluster and consider the duo as a single external node.
The distance is d(AB) = 3. Hence, the depth of divergence (the length of each of the two branches) of this sub‐tree will be 3/2 = 1.5 units.
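The selection of the least‐distant pair and the calculation of its branch depth can be sketched as follows, again with the values of Table 21.1. The helper name `closest_pair` is an illustrative choice.

```python
# Pair-wise distances from Table 21.1.
D = {
    ("A", "B"): 3,
    ("A", "C"): 7, ("B", "C"): 7,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
    ("A", "E"): 10, ("B", "E"): 10, ("C", "E"): 10, ("D", "E"): 8,
}

def closest_pair(D):
    """Return the pair with the minimum distance, together with that distance."""
    pair = min(D, key=D.get)
    return pair, D[pair]

pair, d = closest_pair(D)
depth = d / 2  # each of the two branches gets half of the pair-wise distance
print(pair, d, depth)  # ('A', 'B') 3 1.5
```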
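Now consider “AB” as a single taxon, recalculate its distance to each remaining taxon as the mean of the distances from “A” and from “B” (for example, d(AB, C) = (7 + 7)/2 = 7), and repeat the same steps as above. Find the shortest distance and make the respective taxa a single cluster.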
TABLE 21.2
      AB    C    D
C      7
D     10   10
E     10   10    8
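The reduction of Table 21.1 to Table 21.2 can be sketched in Python. The sketch assumes the usual UPGMA update, in which the distance from a merged cluster to any other cluster is the mean of the two old distances weighted by the number of taxa in each merged part; the names `merge_clusters` and `sizes` are illustrative.

```python
# Pair-wise distances from Table 21.1 and the size of each (single-taxon) cluster.
D = {
    ("A", "B"): 3,
    ("A", "C"): 7, ("B", "C"): 7,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
    ("A", "E"): 10, ("B", "E"): 10, ("C", "E"): 10, ("D", "E"): 8,
}
sizes = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1}

def get(D, a, b):
    """Distance lookup that ignores the order of the pair."""
    return D[(a, b)] if (a, b) in D else D[(b, a)]

def merge_clusters(D, sizes, x, y):
    """Merge clusters x and y; return the reduced matrix and updated sizes."""
    new = x + y
    others = [k for k in sizes if k not in (x, y)]
    # Keep every distance that does not involve x or y.
    new_D = {p: d for p, d in D.items() if x not in p and y not in p}
    # Size-weighted mean gives the distance from the new cluster to the rest.
    for k in others:
        new_D[(new, k)] = (
            sizes[x] * get(D, x, k) + sizes[y] * get(D, y, k)
        ) / (sizes[x] + sizes[y])
    new_sizes = {k: sizes[k] for k in others}
    new_sizes[new] = sizes[x] + sizes[y]
    return new_D, new_sizes

D2, sizes2 = merge_clusters(D, sizes, "A", "B")
print(D2[("AB", "C")], D2[("AB", "D")], D2[("AB", "E")])  # 7.0 10.0 10.0
```

Weighting by cluster size keeps the result identical to the plain mean over all original taxon pairs, so each taxon pair still contributes equally.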
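Add the new taxon (C) to the “AB” cluster, since “AB” and “C” have the least distance (i.e., 7), to produce the sub‐tree of ABC, in which the AB sub‐tree (drawn in the previous step) is attached to the point M. The depth of the new node is 7/2 = 3.5 units (the branch OC), so the internal branch joining the AB sub‐tree to this node is 3.5 – 1.5 = 2 units long. The length of AK + AN = OC.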
For the sake of understanding, the internal nodes of the branches have been marked (as K, L, N, etc.). These are not required for tree construction in general.
Now repeat the last two steps, that is, calculate the mean distances between the (AB)C cluster and the sequences “D” and “E”, and then draw the phylogenetic tree:
TABLE 21.3
      ABC    D
D      10
E      10    8
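Now the least distance is d(DE) = 8. Repeating the same steps as before, “D” and “E” are joined into a cluster, and the length of each branch will be 8/2 = 4 units.
Now calculate the distance between the two clusters ABC and DE: every distance between a member of ABC and a member of DE is 10 (Table 21.1), so the mean distance is d(ABC, DE) = 10.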
21.5 INTERPRETATION OF UPGMA TREE
The distance from the root to the OTUs of each cluster = 10/2 = 5 units.
Hence, the distance of TP = 5 – OC = 5 – 3.5 = 1.5 units.
The distance of US = 5 – 4 = 1.0 unit.
The UPGMA tree obtained in our example depicts the evolutionary distances between the taxa. To calculate the distance between any two taxa, add up the lengths of the branches connecting them. For “A” and “D”, for example: 1.5 + 2 + 1.5 + 1 + 4 = 10. This is exactly the value given in the distance table. These values can be read as evolutionary distances only under the assumption of equal evolutionary rates.
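The whole procedure, together with the check that tree distances reproduce the distance table, can be put into one short Python sketch. It is a minimal illustration rather than a reference implementation: clusters are represented as tuples of taxon names, cluster distances are recomputed as means over the original taxon pairs, and the names `upgma` and `tree_distance` are chosen here for illustration.

```python
from itertools import combinations, product

# Pair-wise distances from Table 21.1.
D = {
    ("A", "B"): 3,
    ("A", "C"): 7, ("B", "C"): 7,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
    ("A", "E"): 10, ("B", "E"): 10, ("C", "E"): 10, ("D", "E"): 8,
}

def dist(x, y):
    return D[(x, y)] if (x, y) in D else D[(y, x)]

def mean_dist(c1, c2):
    """Mean distance between all members of cluster c1 and of cluster c2."""
    return sum(dist(x, y) for x, y in product(c1, c2)) / (len(c1) * len(c2))

def upgma(taxa):
    """Return the list of merges as (cluster1, cluster2, node_height)."""
    clusters = [(t,) for t in taxa]
    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the smallest mean distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: mean_dist(*p))
        merges.append((c1, c2, mean_dist(c1, c2) / 2))
        merged = tuple(sorted(c1 + c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
    return merges

def tree_distance(merges, x, y):
    """Path length between two taxa: twice the height of the node joining them."""
    for c1, c2, height in merges:
        if (x in c1 and y in c2) or (x in c2 and y in c1):
            return 2 * height

merges = upgma("ABCDE")
print([h for _, _, h in merges])        # [1.5, 3.5, 4.0, 5.0]
print(tree_distance(merges, "A", "D"))  # 10.0
```

The node heights 1.5, 3.5, 4 and 5 correspond to the joins AB, (AB)C, DE and the root. The branches leading from the root to the two clusters are therefore 5 – 3.5 = 1.5 and 5 – 4 = 1 units long, as calculated above.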
21.6 QUESTIONS
1. Draw a phylogenetic tree manually using the following distance matrices:
TABLE 21.4
      A    B    C    D    E
B     8
C    18   18
D    18   18   10
E    18   18   10    4
F    20   20   20   20   20
TABLE 21.5
      A    B    C    D    E
A     0
B     4    0
C     8    8    0
D     8    8    6    0
E     8    8    6    2    0
2. What are the merits and demerits of the UPGMA method of phylogenetic tree construction?
3. Explain in detail why ultrametric data are needed for UPGMA tree construction.
4. Under what circumstances do we prefer the UPGMA tree? How do you interpret the results of the UPGMA tree?
5. Suppose we have morphological data from which similarity and distance matrices can be constructed. Can we use such a distance matrix for the construction of a UPGMA tree? Justify your answer.