CHAPTER 23
Construction of Phylogenetic Tree: Neighbor‐Joining Method

CS Mukhopadhyay and RK Choudhary

School of Animal Biotechnology, GADVASU, Ludhiana

23.1 INTRODUCTION

The term “neighbors” refers to a pair of nodes that are separated by just one other node. The NJ method (Saitou and Nei, 1987) is a particular case of the “star decomposition method”, in which the raw data are arranged in a distance matrix and nodes are created step by step from a star tree (see the example below). In the NJ method, in addition, the separation of each pair of nodes is adjusted by their average divergence from all other nodes.

23.1.1 Principle

The principle of the neighbor‐joining (NJ) technique is minimum evolution, which selects the tree with the minimum total branch length. It is based on a very fast, greedy heuristic algorithm that generates sub‐trees; the closest sub‐trees are joined to each other, in a step‐wise manner, to yield the final tree. Under this criterion, the true tree is assumed to be the one with the shortest total branch length.

The NJ method can be applied for large datasets relating to the taxa with varying degrees of divergence (hence, the tree will show different lengths for different branches).
Multiple substitutions can be corrected.
Some of the sequence information is lost in the NJ method due to the nature of the algorithm.

23.1.2 Assumptions

Minimum mutational events explain the evolution of the molecular sequences.
The branch length of the tree with known topology represents the different rate of evolutionary changes.

23.2 OBJECTIVE

To construct a phylogenetic tree, using the neighbor‐joining method, employing a distance matrix obtained from a set of molecular sequences.

23.3 PROCEDURE

The essential points to remember are:

No preference is exercised in the pairing of sequences.
It searches to find the pair of sequences that minimizes the branch length.
The NJ method uses the Fitch–Margoliash Algorithm to create a rate corrected new distance table.

23.3.1 Start with a distance matrix

Let us consider a distance matrix of five sequences (A, B, C, D and E):

TABLE 23.1

Distance matrix	A	B	C	D
B	4.000
C	7.000	8.000
D	6.000	7.000	6.000
E	8.000	9.000	10.000	9.000

23.3.2 Construction of a star tree

A star tree is first drawn, based on the number of input sequences (OTUs). This star tree is a random tree with a central hub that joins all the branches.

A star tree illustrated by five dashed lines with a common point. The lines are labeled A, B, C, D and E. — **FIGURE 23.1**

23.3.3 Calculation of net divergence

Net divergence (V_i) is calculated for each of the OTUs using the formula $V_i = \sum_{j \ne i} d_{ij}$, where i, j = 1, 2, …, 5, i.e., the number of OTUs incorporated.

TABLE 23.2

Net divergence	Equations	Value
V_A =	d_AB + d_AC + d_AD + d_AE =	25.000
V_B =	d_AB + d_BC + d_BD + d_BE =	28.000
V_C =	d_AC + d_BC + d_CD + d_CE =	31.000
V_D =	d_AD + d_BD + d_CD + d_DE =	28.000
V_E =	d_AE + d_BE + d_CE + d_DE =	36.000
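The sums in Table 23.2 can be reproduced with a few lines of Python. This is only a minimal sketch: the dictionary `PAIRS` and the function name `net_divergence` are illustrative choices, not part of the original procedure.

```python
# Distances from Table 23.1, stored once per unordered pair of OTUs.
PAIRS = {
    ("A", "B"): 4, ("A", "C"): 7, ("B", "C"): 8,
    ("A", "D"): 6, ("B", "D"): 7, ("C", "D"): 6,
    ("A", "E"): 8, ("B", "E"): 9, ("C", "E"): 10, ("D", "E"): 9,
}

def net_divergence(pairs):
    """Return V_i, the sum of the distances from OTU i to every other OTU."""
    totals = {}
    for (i, j), d in pairs.items():
        # Each pairwise distance contributes to the totals of both OTUs.
        totals[i] = totals.get(i, 0) + d
        totals[j] = totals.get(j, 0) + d
    return totals

V = net_divergence(PAIRS)
print(V)
```

Each distance is added to both members of the pair, so a lower-triangular matrix is sufficient as input.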

23.3.4 Calculation of new distance values from the original distance and net divergence

The mean divergence (M_i) is calculated by dividing individual net divergence (V_i) by (N – 2), where N is the number of OTUs.

TABLE 23.3

M values	Equation	Calculation	Value
M_A =	V_A / (N – 2) =	25 / (5 – 2) =	8.333
M_B =	V_B / (N – 2) =	28 / (5 – 2) =	9.333
M_C =	V_C / (N – 2) =	31 / (5 – 2) =	10.333
M_D =	V_D / (N – 2) =	28 / (5 – 2) =	9.333
M_E =	V_E / (N – 2) =	36 / (5 – 2) =	12.000

23.3.5 Calculation of new distances

New distances (n_ij) are calculated by subtracting the mean divergence (M_i, M_j) of the two OTUs which are being studied from the distance (d_ij) between these two corresponding OTUs (i^th and j^th OTUs being studied).

TABLE 23.4

Distance	Equation	Calculation	Value
n_AB =	d_AB – (M_A + M_B) =	4 – (8.333 + 9.333) =	– 13.666
n_AC =	d_AC – (M_A + M_C) =	7 – (8.333 + 10.333) =	– 11.666
n_AD =	d_AD – (M_A + M_D) =	6 – (8.333 + 9.333) =	– 11.666
n_AE =	d_AE – (M_A + M_E) =	8 – (8.333 + 12.000) =	– 12.333
n_BC =	d_BC – (M_B + M_C) =	8 – (9.333 + 10.333) =	– 11.666
n_BD =	d_BD – (M_B + M_D) =	7 – (9.333 + 9.333) =	– 11.666
n_BE =	d_BE – (M_B + M_E) =	9 – (9.333 + 12.000) =	– 12.333
n_CD =	d_CD – (M_C + M_D) =	6 – (10.333 + 9.333) =	– 13.666
n_CE =	d_CE – (M_C + M_E) =	10 – (10.333 + 12.000) =	– 12.333
n_DE =	d_DE – (M_D + M_E) =	9 – (9.333 + 12.000) =	– 12.333

23.3.6 Construction of new distance matrix from the new distance values (n_ij)

TABLE 23.5

	A	B	C	D
B	– 13.666
C	– 11.666	– 11.666
D	– 11.666	– 11.666	– 13.666
E	– 12.333	– 12.333	– 12.333	– 12.333

Here, two pairs of OTUs, namely “A”, “B” and “C”, “D”, exhibit the least value, n_ij = –13.666. Either of these two pairs can be selected. In this example, n_CD is selected.
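The mean divergences, the new distances and the tie between the two candidate pairs can be checked with the following sketch. It assumes exact fractions (Python's `fractions.Fraction`) so that tied values compare equal; the names `D`, `dist`, `M`, `n` and `tied` are illustrative. The exact minimum is –41/3, about –13.667, which the tables show truncated to –13.666.

```python
from fractions import Fraction
from itertools import combinations

OTUS = "ABCDE"
# Distances from Table 23.1, keyed by the unordered pair of OTU labels.
D = {
    frozenset("AB"): 4, frozenset("AC"): 7, frozenset("BC"): 8,
    frozenset("AD"): 6, frozenset("BD"): 7, frozenset("CD"): 6,
    frozenset("AE"): 8, frozenset("BE"): 9, frozenset("CE"): 10,
    frozenset("DE"): 9,
}

def dist(i, j):
    """Look up d_ij regardless of the order of i and j."""
    return D[frozenset((i, j))]

N = len(OTUS)
# Mean divergence M_i = V_i / (N - 2), kept as an exact fraction.
M = {i: Fraction(sum(dist(i, j) for j in OTUS if j != i), N - 2) for i in OTUS}
# New distance n_ij = d_ij - (M_i + M_j).
n = {(i, j): dist(i, j) - (M[i] + M[j]) for i, j in combinations(OTUS, 2)}

best = min(n.values())
tied = [pair for pair, value in n.items() if value == best]
print(float(best), tied)
```

With floating-point numbers the two tied values could differ in the last digit, which is why exact fractions are used here.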

23.3.7 Calculation of branch length of the internal node

First, the least distance value is identified from the new distance matrix.
The two taxa with the minimum n_ij distance value are taken as neighbors. In the present example, “C” and “D” are the neighbors, with the minimum n_ij value of –13.666.
Now, let us assume that “X” is the new node between the neighbors “C” and “D”. The branch lengths from the internal node “X” and the external nodes “C” and “D” (denoted as L_CX and L_DX, respectively) are calculated.

TABLE 23.6

Branch length	Equation	Calculation	Value
L_CX =	(d_CD / 2) + ((M_C – M_D) / (2)) =	(6 / 2) + ((10.333 – 9.333) / (2))	=3.500
L_DX =	d_CD – L_CX =	(6 – 3.500)	=2.500

These values of the branch lengths are now used to construct the tree:

A star tree comprising 3 dashed lines (labeled A, B, and E) connected to a line (node X) with two branches (depicted by the solid lines) labeled C and D. — **FIGURE 23.2**
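The branch-length step for the pair “C”, “D” can be written as a small function. This is a sketch under the formulas of Table 23.6; the function name `branch_lengths` and its argument names are hypothetical.

```python
def branch_lengths(d_ij, m_i, m_j):
    """Return (L_i, L_j), the lengths from OTUs i and j to their new node."""
    # L_i = d_ij / 2 + (M_i - M_j) / 2, and the two lengths add up to d_ij.
    l_i = d_ij / 2 + (m_i - m_j) / 2
    return l_i, d_ij - l_i

# Mean divergences of C and D from Table 23.3 (31 / 3 and 28 / 3).
M_C, M_D = 31 / 3, 28 / 3
l_cx, l_dx = branch_lengths(6, M_C, M_D)
print(round(l_cx, 3), round(l_dx, 3))
```

Calling `branch_lengths(6, M_D, M_C)` returns the same two lengths in the opposite order, so it does not matter which OTU of the pair is treated first.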

23.3.8 Distance of the other OTUs from internal node (X)

Next, the distances between rest of the terminal nodes (here, “A”, “B” and “E”) with the internal node (“X”) are calculated:

TABLE 23.7

Equation	Calculation	Value
m_AX =(d_AC + d_AD – d_CD) / 2	= (7 + 6 – 6) / 2	= 3.500
m_BX =(d_BC + d_BD – d_CD) / 2	= (8 + 7 – 6) / 2	= 4.500
m_EX =(d_CE + d_DE – d_CD) / 2	= (10 + 9 – 6) / 2	= 6.500

With the first iteration over, the number of OTUs has been reduced to four (N – 1), since the OTUs “C” and “D” have been merged into “X”. The second iteration repeats the same steps as the first.

23.3.9 New distance matrix to start the second iteration

TABLE 23.8

Distance matrix	A	B	X
B	4.000
X	3.500	4.500
E	8.000	9.000	6.500
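The reduction from Table 23.1 to Table 23.8 can be sketched as follows. The function `join` and the dictionary layout are assumptions made for illustration; the update rule is the one used in Table 23.7, d_kX = (d_ik + d_jk – d_ij) / 2.

```python
# Distances from Table 23.1, keyed by the unordered pair of OTU labels.
D = {
    frozenset("AB"): 4, frozenset("AC"): 7, frozenset("BC"): 8,
    frozenset("AD"): 6, frozenset("BD"): 7, frozenset("CD"): 6,
    frozenset("AE"): 8, frozenset("BE"): 9, frozenset("CE"): 10,
    frozenset("DE"): 9,
}

def join(dist, i, j, new):
    """Return a distance dict in which OTUs i and j are replaced by `new`."""
    d_ij = dist[frozenset((i, j))]
    others = sorted({k for pair in dist for k in pair} - {i, j})
    # Keep every distance that involves neither i nor j.
    reduced = {pair: d for pair, d in dist.items() if not pair & {i, j}}
    for k in others:
        d_ik = dist[frozenset((i, k))]
        d_jk = dist[frozenset((j, k))]
        # Distance from the remaining OTU k to the new internal node.
        reduced[frozenset((k, new))] = (d_ik + d_jk - d_ij) / 2
    return reduced

reduced = join(D, "C", "D", "X")
for pair, d in sorted(reduced.items(), key=lambda item: sorted(item[0])):
    print("-".join(sorted(pair)), d)
```

The returned dictionary has six entries, one for each pair among “A”, “B”, “E” and “X”.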

CROSS‐CHECKING

The values of L can alternatively be determined by treating the second OTU of the pair as the first one to be calculated. Either of the neighboring OTUs can be taken first, and the resulting branch lengths are the same.

Branch length	Equation	Calculation	Value
L_DX =	(d_CD / 2) + ((M_D – M_C) / (2)) =	(6 / 2) + ((9.333 – 10.333) / (2)) =	2.500
L_CX =	d_CD – L_DX =	(6 – 2.500) =	3.500

23.3.10 Calculation of net divergence

TABLE 23.9

Net Div.	Equations	Value
V_A =	d_AB + d_AX + d_AE	=15.500
V_B =	d_AB + d_BX + d_BE	=17.500
V_X =	d_AX + d_BX + d_EX	=14.500
V_E =	d_AE + d_BE + d_EX	=23.500

23.3.11 Calculation of new distance values from the original distance and net divergence

TABLE 23.10

M values	Equation	Calculation	Value
M_A =	V_A / (N – 2) =	15.5 / (4 – 2) =	7.750
M_B =	V_B / (N – 2) =	17.5 / (4 – 2) =	8.750
M_X =	V_X / (N – 2) =	14.5 / (4 – 2) =	7.250
M_E =	V_E / (N – 2) =	23.5 / (4 – 2) =	11.750

23.3.12 Calculation of new distances

TABLE 23.11

Distance	Equation	Calculation	Value
n_AB =	d_AB – (M_A + M_B) =	4 – (7.75 + 8.75) =	– 12.500
n_AX =	d_AX – (M_A + M_X) =	3.5 – (7.75 + 7.25) =	– 11.500
n_AE =	d_AE – (M_A + M_E) =	8 – (7.75 + 11.75) =	– 11.500
n_BX =	d_BX – (M_B + M_X) =	4.5 – (8.75 + 7.25) =	– 11.500
n_BE =	d_BE – (M_B + M_E) =	9 – (8.75 + 11.75) =	– 11.500
n_XE =	d_XE – (M_X + M_E) =	6.5 – (7.25 + 11.75) =	– 12.500

23.3.13 Construction of new distance matrix from new distance values (n_ij)

Now we select A and B as the neighbors out of the two pairs (“A”, “B” and “X”, “E”) with the minimum distance.

TABLE 23.12

	A	B	X
B	– 12.500
X	– 11.500	– 11.500
E	– 11.500	– 11.500	– 12.500

23.3.14 Calculation of the branch length of the internal node

The two taxa with the minimum n_ij distance value are taken as neighbors. In the present example, “A” and “B” are the neighbors with a minimum n_ij value of –12.500.
Now, let us assume that “Y” is the new node between the neighbors “A” and “B”. The branch lengths from the internal node “Y” and the external nodes “A” and “B” (denoted as L_AY and L_BY, respectively) are calculated.

Distance with the internal node (Y):

TABLE 23.13

Branch Length	Equation	Calculation	Value
L_AY =	(d_AB / 2) + ((M_A – M_B) / (2)) =	(4 / 2) + ((7.750 – 8.750) / (2))	=1.500
L_BY =	d_AB – L_AY =	(4 – 1.500)	=2.500

Next, the distances between the rest of the terminal nodes (here, “E”, “X”) and the internal node (“Y”) are calculated:

TABLE 23.14

Equation	Calculation	Value
m_EY =(d_AE + d_BE – d_AB) / 2 =	(8 + 9 – 4) / 2 =	6.500
m_XY =(d_AX + d_BX – d_AB) / 2 =	(3.5 + 4.5 – 4) / 2 =	2.000

Tree diagram composing a dashed line (labeled E) connected to a line with two nodes (X and Y). Each node has two branches, C and D and A and B, respectively. — **FIGURE 23.3**

Now, a new distance matrix is created for the third iteration (N = 3):

TABLE 23.15

Distance matrix	Y	X
X	2.000
E	6.500	6.500

23.3.15 Calculation of net divergence

TABLE 23.16

Net div.	Equations	Value
V_E	= d_EX + d_EY	13.000
V_X	= d_EX + d_XY	8.500
V_Y	= d_EY + d_XY	8.500

23.3.16 Calculation of new distance values from the original distance and net divergence

TABLE 23.17

M values	Equation	Calculation	Value
M_E	= V_E / (N – 2)	13 / (3 – 2) =	13.000
M_X	= V_X / (N – 2)	8.5 / (3 – 2) =	8.500
M_Y	= V_Y / (N – 2)	8.5 / (3 – 2) =	8.500

23.3.17 Calculation of new distances

TABLE 23.18

Distance	Equation	Calculation	Value
n_EX =	d_EX − (M_E + M_X) =	6.5 − (13 + 8.5) =	−15.000
n_EY =	d_EY − (M_E + M_Y) =	6.5 − (13 + 8.5) =	−15.000
n_XY =	d_XY − (M_X + M_Y) =	2 − (8.5 + 8.5) =	−15.000

23.3.18 Construction of new distance matrix from the new distance values (n_ij)

TABLE 23.19

New distance matrix	X	Y
Y	–15.000
E	–15.000	–15.000

23.3.19 Calculation of branch length of the internal node

The two taxa with the minimum n_ij distance value are taken as neighbors. In the present example, all three pairs share the same n_ij value of –15.000, so any pair can be chosen; here, “E” and “Y” are taken as the neighbors.
Now, let us assume that “Z” is the new node between the neighbors “E” and “Y”. The branch lengths from the internal node “Z” and the external nodes “E” and “Y” (denoted as L_EZ and L_YZ, respectively) are calculated.

TABLE 23.20

Branch length	Equation	Calculation	Value
L_EZ =	(d_EY / 2) + ((M_E – M_Y) / (2)) =	(6.5 / 2) + ((13.000 – 8.500) / (2))	=5.500
L_YZ =	d_EY – L_EZ =	(6.5 – 5.500)	=1.000

23.3.20 Distance with internal node (Z)

Next, the distance between the remaining node (here, “X”) and the internal node (“Z”) is calculated:

TABLE 23.21

Equation	Calculation	Value
m_XZ =(d_XE + d_YX – d_EY) / 2 =	((6.5) + (2) – (6.5)) / 2 =	1.000

The final neighbor‐joining tree obtained is shown below:

Neighbor-joining tree composing a line with three nodes (X, Y, and Z). Nodes X and Y have two branches, C and D and A and B, respectively. From node Z extends a line labeled E. — **FIGURE 23.4**

The rooted tree can be obtained by positioning the branches in a rectangular format of branches:

Tree diagram with branches, in rectangular form, labeled 1, 1A, 1B, 1C, 1D, and E. — **FIGURE 23.5**

Thus, trees are generated for different topologies, and the tree with the shortest total length is to be selected, as a principle.
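The iterations of Section 23.3 can be put together in one loop. The following is a minimal sketch, not a reference implementation: the function name `neighbor_joining`, the internal node labels `U1`, `U2`, … and the edge-list output are all assumptions made for illustration. Ties are broken by taking the first minimum found, so the order of joins and the node labels may differ from the hand calculation. The last three nodes are resolved with the three-point formula, which gives the same lengths as Tables 23.20 and 23.21.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Return a list of (node, new_internal_node, branch_length) edges.

    `dist` maps frozenset({a, b}) to the distance between a and b.
    """
    names, dist = list(names), dict(dist)
    edges, counter = [], 0
    while len(names) > 3:
        n = len(names)
        # Mean divergence M_i = V_i / (N - 2) for the current set of nodes.
        mean = {i: sum(dist[frozenset((i, j))] for j in names if j != i) / (n - 2)
                for i in names}
        # Pick the pair with the smallest n_ij = d_ij - (M_i + M_j).
        i, j = min(combinations(names, 2),
                   key=lambda p: dist[frozenset(p)] - mean[p[0]] - mean[p[1]])
        counter += 1
        new = f"U{counter}"
        d_ij = dist[frozenset((i, j))]
        l_i = d_ij / 2 + (mean[i] - mean[j]) / 2
        edges += [(i, new, l_i), (j, new, d_ij - l_i)]
        # Distances from every remaining node to the new internal node.
        for k in names:
            if k not in (i, j):
                d_ik, d_jk = dist[frozenset((i, k))], dist[frozenset((j, k))]
                dist[frozenset((k, new))] = (d_ik + d_jk - d_ij) / 2
        names = [k for k in names if k not in (i, j)] + [new]
    # Three nodes left: attach them to one final internal node.
    a, b, c = names
    hub = f"U{counter + 1}"
    d_ab, d_ac, d_bc = (dist[frozenset(p)] for p in ((a, b), (a, c), (b, c)))
    edges += [(a, hub, (d_ab + d_ac - d_bc) / 2),
              (b, hub, (d_ab + d_bc - d_ac) / 2),
              (c, hub, (d_ac + d_bc - d_ab) / 2)]
    return edges

# Distances from Table 23.1.
D = {
    frozenset("AB"): 4, frozenset("AC"): 7, frozenset("BC"): 8,
    frozenset("AD"): 6, frozenset("BD"): 7, frozenset("CD"): 6,
    frozenset("AE"): 8, frozenset("BE"): 9, frozenset("CE"): 10,
    frozenset("DE"): 9,
}
edges = neighbor_joining("ABCDE", D)
for node, parent, length in edges:
    print(f"{node} -> {parent}: {length:.3f}")
```

The output is an unrooted tree given as a list of edges; drawing it, or rooting it, is left to the reader.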

23.4 INTERPRETATION OF NJ TREE

The values shown in each branch denote the rate‐corrected distances, which are not proportional to the time‐scale. The inference is thus that the rate of evolution is not the same in different branches, and the distances between two taxa are the sum of the distances indicated in the branches. The distances are calculated using a suitable method (assuming a different rate of substitution).
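The statement that the distance between two taxa is the sum of the branch lengths between them can be checked on part of the tree. The sketch below uses only the branch lengths for “C”, “D”, “E” and the internal nodes “X” and “Z” (Tables 23.6, 23.20 and 23.21); the function name `path_length` and the edge dictionary are illustrative.

```python
# Branch lengths taken from Tables 23.6, 23.20 and 23.21.
EDGES = {("C", "X"): 3.5, ("D", "X"): 2.5, ("X", "Z"): 1.0, ("E", "Z"): 5.5}

def path_length(edges, start, goal):
    """Sum the branch lengths along the unique path between two tree nodes."""
    adjacency = {}
    for (u, v), length in edges.items():
        adjacency.setdefault(u, []).append((v, length))
        adjacency.setdefault(v, []).append((u, length))
    # Depth-first walk; a tree has no cycles, so remembering the parent suffices.
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, total = stack.pop()
        if node == goal:
            return total
        stack.extend((nxt, node, total + w)
                     for nxt, w in adjacency[node] if nxt != parent)
    raise ValueError("nodes are not connected")

for a, b in (("C", "D"), ("C", "E"), ("D", "E")):
    print(a, b, path_length(EDGES, a, b))
```

The three path sums are 6.0, 10.0 and 9.0, which agree with d_CD, d_CE and d_DE in Table 23.1.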

23.5 QUESTIONS

1. Construct a phylogenetic tree using the given distance matrix by applying the NJ method:

TABLE 23.22

Distance matrix	A	B	C	D
B	8.000
C	7.000	8.000
D	6.000	12.000	6.000
E	4.000	7.000	5.000	7.000

2. Use the following distance matrix to construct an NJ method‐based phylogenetic tree:

TABLE 23.23

Distance matrix	A	B	C	D	E
B	5.000
C	4.000	8.000
D	7.000	10.000	7.000
E	6.000	9.000	6.000	6.000
F	7.000	12.000	8.000	9.000	7.000

3. Given the following distances, construct a phylogenetic tree using the NJ method:

TABLE 23.24

Distance matrix	A	B	C	D
B	3.000
C	5.000	8.000
D	6.000	7.000	6.000
E	7.000	10.000	12.000	8.000

4. How is the NJ method principally different from the UPGMA method of phylogeny construction?
5. Enumerate the conditions where the NJ method is the best suited for phylogenetic tree construction. Logically explain the reasons why.
