CS Mukhopadhyay and RK Choudhary
School of Animal Biotechnology, GADVASU, Ludhiana
The term “neighbors” refers to a node‐pair which is separated by another node. The NJ method (Saitou and Nei, 1987) is a particular case of the “star decomposition method”, where raw data are arranged in a distance matrix and nodes are created (see the example below) whereas, in the NJ method, the separation of nodes is adjusted by average divergence from all other nodes.
The principle of the neighbor‐joining (NJ) technique is minimum evolution, which selects the tree with minimum branch‐length. It is based on a very fast, greedy heuristic algorithm that generates sub‐trees, and the closest sub‐trees are joined to each other to yield the final tree, in a step‐wise manner. The total branch length is the shortest for the true tree.
To construct a phylogenetic tree, using the neighbor‐joining method, employing a distance matrix obtained from a set of molecular sequences.
The essential points to remember are:
Let us consider a distance matrix of five sequences (A, B, C, D and E)
A star tree is first drawn, based on the number of input sequences (OTUs). This star tree is a random tree with a central hub that joins all the branches.
Net divergence (Vi) is calculated for each of the OTUs using the formula , where, i ≠ j and i, j = 1, 2, …, 5, i.e., the number of OTUs incorporated.
The mean divergence (Mi) is calculated by dividing individual net divergence (Vi) by (N – 2), where N is the number of OTUs.
New distances (nij) are calculated by subtracting the mean divergence (Mi, Mj) of the two OTUs which are being studied from the distance (dij) between these two corresponding OTUs (ith and jth OTUs being studied).
Distance | Equation | Calculation | Value |
nAB = | dAB – (MA + MB) = | 4 – (8.333 + 9.333) = | – 13.666 |
nAC = | dAC – (MA + MC) = | 7 – (8.333 + 10.333) = | – 11.666 |
nAD = | dAD – (MA + MD) = | 6 – (8.333 + 9.333) = | – 11.666 |
nAE = | dAE – (MA + ME) = | 8 – (8.333 + 12.000) = | – 12.333 |
nBC = | dBC – (MB + MC) = | 8 – (9.333 + 10.333) = | – 11.666 |
nBD = | dBD – (MB + MD) = | 7 – (9.333 + 9.333) = | – 11.666 |
nBE = | dBE – (MB + ME) = | 9 – (9.333 + 12.000) = | – 12.333 |
nCD = | dCD – (MC + MD) = | 6 – (10.333 + 9.333) = | – 13.666 |
nCE = | dCE – (MC + ME) = | 10 – (10.333 + 12.000) = | – 12.333 |
nDE = | dDE – (MD + ME) = | 9 – (9.333 + 12.000) = | – 12.333 |
A | B | C | D | |
B | – 13.666 | |||
C | – 11.666 | – 11.666 | ||
D | – 11.666 | – 11.666 | – 13.666 | |
E | – 12.333 | – 12.333 | – 12.333 | – 12.333 |
Here, there are two pairs of OTUs – namely, “A”, “B” and “C”, “D” exhibiting least divergence with value nij = –13.666 (bold). We can select any one of these two pairs. In this example, nCD (shown in bold) is selected.
Branch length | Equation | Calculation | Value |
LCX = | (dCD / 2) + ((MC – MD) / (2)) = | (6 / 2) + ((10.333 – 9.333) / (2)) | =3.500 |
LDX = | dCD – LCX = | (6 – 3.500) | =2.500 |
These values of the branch lengths are now used to construct the tree:
Next, the distances between rest of the terminal nodes (here, “A”, “B” and “E”) with the internal node (“X”) are calculated:
Equation | Calculation | Value |
mAX =(dAC + dAD – dCD) / 2 | = (7 + 6 – 6) / 2 | = 3.500 |
mBX =(dBC + dBD – dCD) / 2 | = (8 + 7 – 6) / 2 | = 4.500 |
mEX =(dCE + dDE – dCD) / 2 | = (10 + 9 – 6) / 2 | = 6.500 |
Now the first iteration is over, the number of OTUs has been reduced to four (N – 1), where the OTUs “C” and “D” have been merged into “X”. The second iteration will start with the same steps as the first iteration.
Distance | Equation | Calculation | Value |
nAB = | dAB – (MA + MB) = | 4 – (6.75 + 8.75) = | – 11.500 |
nAX = | dAX – (MA + MX) = | 3.5 – (6.75 + 7.25) = | – 10.500 |
nAE = | dAE – (MA + ME) = | 8 – (6.75 + 10.75) = | – 9.500 |
nBX = | dBX – (MB + MX) = | 4.5 – (8.75 + 7.25) = | – 11.500 |
nBE = | dBE – (MB + ME) = | 9 – (8.75 + 10.75) = | – 10.500 |
nXE = | dXE – (MX + ME) = | 6.5 – (7.25 + 10.75) = | – 11.500 |
Now we select A and B as the neighbors out of the three pairs with minimum distances.
Distance with the internal node (Y):
Branch Length | Equation | Value |
LAY = | (dAB / 2) + ((VA – VB) / (2)) | =1.000 |
LBY = | dAB – LAY | =3.000 |
Next, the distances between the rest of the terminal nodes (here, “E”, “X”) with the internal node (“Y”) is calculated:
Equation | Calculation | Value |
mEY =(dAE + dBE – dAB) / 2 = | (8 + 9 – 4) / 2 = | 6.500 |
mXY =(dAX + dBX – dAB) / 2 = | (3.5 + 4.5 – 4) / 2 = | 2.000 |
Now, a new distance matrix is created in the third iteration (n = 3):
Next, the distances between the rest of the terminal nodes (here, “E”, “Y”) with the internal node (“Z”) are calculated:
Equation | Calculation | Value |
mXZ =(dXE + dYX – dEY) / 2 = | ((6.5) + (2) – (6.5)) / 2 = | 1.000 |
The final neighbor‐joining tree obtained is shown below:
The rooted tree can be obtained by positioning the branches in a rectangular format of branches:
Thus, trees are generated for different topologies, and the tree with the shortest total length is to be selected, as a principle.
The values shown in each branch denote the rate‐corrected distances, which are not proportional to the time‐scale. The inference is thus that the rate of evolution is not the same in different branches, and the distances between two taxa are the sum of the distances indicated in the branches. The distances are calculated using a suitable method (assuming a different rate of substitution).
3.138.135.178