CHAPTER 26 Construction of Phylogenetic Tree Using MEGA7
CS Mukhopadhyay and RK Choudhary
School of Animal Biotechnology, GADVASU, Ludhiana
26.1 INTRODUCTION
Molecular Evolutionary Genetics Analysis (MEGA) is an integrated tool, freely downloadable for research and education, for analyzing molecular data (nucleotide and protein sequences) and constructing phylogenetic trees. The latest version, MEGA 7.0.14, supports a range of bio‐computational analyses, including sequence alignment; determination of the best evolutionary model; construction of phylogenetic trees; inference of ancestral sequences; mining of online databases; estimation of divergence times and rates of molecular evolution; and testing of evolutionary hypotheses. The software is freely available at http://www.megasoftware.net/ and can be run on Windows (both GUI and command‐line‐based), as well as on Linux and Mac operating systems (command‐line‐based).
In this chapter, we will see how a phylogenetic tree can be constructed using the MEGA7 suite and how the resulting tree can be interpreted.
26.2 OBJECTIVE
To build a phylogenetic tree from a given set of molecular sequences.
26.3 PROCEDURE
26.3.1 Prepare the sequence file
Download the molecular sequence data (nucleotide or amino acid sequences), arrange them in FASTA format, and save them in a plain‐text (*.txt) file, for example using Notepad. The sequences need not all be of the same length, but they should be homologous (depending on the hypothesis being tested in the experiment). The descriptive line of each FASTA‐formatted sequence may be shortened (Figure 26.1).
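Shortening the descriptive lines by hand is tedious when there are many sequences. The following is a minimal Python sketch of one way to do it, assuming that the first whitespace‐delimited word of each descriptive line (usually the accession number) is an acceptable label. The function name shorten_fasta_headers, the max_len parameter and the example sequences are hypothetical and are not part of MEGA.

```python
def shorten_fasta_headers(fasta_text, max_len=20):
    """Return FASTA text in which each descriptive line keeps only a short label."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Keep the first whitespace-delimited word (assumed to be the accession).
            tokens = line[1:].split()
            label = tokens[0] if tokens else "unnamed"
            out.append(">" + label[:max_len])
        elif line.strip():
            # Sequence lines are copied unchanged, apart from surrounding whitespace.
            out.append(line.strip())
    return "\n".join(out) + "\n"


# Made-up example records, for illustration only.
example = """>SEQ_A hypothetical protein [Bos taurus]
MKTAYIAKQR
>SEQ_B hypothetical protein [Ovis aries]
MKTAYIGKQR
"""

print(shorten_fasta_headers(example))
```

The shortened text can then be saved as a *.txt file and used in the next step. If two sequences share the same first word, the labels should be edited by hand so that each one remains unique.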
26.3.2 Uploading data file/pasting the sequences
Open MEGA7, click on “Align” and select “Edit/Build Alignment” (the first option) in the drop‐down menu.
A small dialogue box will appear with the following radio‐button options:
Create a new alignment: Select this if you are starting afresh. Selecting the option will direct the user to another window to select the type of input sequence data, DNA or amino acid. Select the correct option and proceed. Copy all the sequences (in FASTA format) and paste in the Alignment Explorer.
Open a saved alignment session: Select this if an alignment session has already been saved (by you or by someone who worked on it earlier). Select the file from the folder and proceed.
Retrieve sequence from a file: Select this if you want to load sequences from a text file. The text (.txt) file containing molecular sequences (in FASTA, PAUP, MEGA, ALN, Phylip, GCG, PIR, NBRF, MSF or IG formats) is opened in the sequence editor for multiple sequence alignment (MSA).
26.3.3 Align the sequences
Click on “Alignment” on the menu bar and select one of the two options in the drop‐down menu:
Align by ClustalW: Opt for ClustalW when the input sequences are of comparable length and homologous (Figure 26.2).
Align by Muscle: This option is preferred for sequences that vary considerably in length, although belonging to the same super‐family. Of the two algorithms, ClustalW (a progressive algorithm) and Muscle (an iterative algorithm), Muscle generally performs better when the input sequences differ markedly in length.
26.3.4 Save session
The alignment session can be saved as a *.mas file for future use (Figure 26.3).
26.3.5 Export alignment
The alignment data can be exported in any of the following file formats: MEGA, FASTA, PAUP.
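If the alignment is exported in FASTA format, a quick check outside MEGA can confirm that every row has the same length, which is what distinguishes an alignment from a set of raw sequences. The sketch below is a minimal Python illustration under that assumption; the function names read_fasta and is_aligned and the example text are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping each label to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            # Sequences may be wrapped over several lines; join them.
            records[name] += line
    return records


def is_aligned(records):
    """True if there is at least one record and all rows have the same length."""
    return len({len(seq) for seq in records.values()}) == 1


# Made-up exported alignment: gaps are written as '-'.
exported = ">s1\nMK-TA\n>s2\nMKATA\n"
print(is_aligned(read_fasta(exported)))
```

In practice the text would be read from the exported file rather than from a string literal.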
Now, close the Alignment Explorer window to proceed to the phylogenetic analysis.
26.3.6 Phylogenetic tree construction
Open the main window of MEGA7 and click on the “Phylogeny” tab in the menu bar. Select the algorithm you need for phylogenetic analysis from the drop‐down menu. Here we will choose the option “Construct/Test Neighbor‐Joining Tree”.
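For readers curious about what the neighbor‐joining option computes, the following is a minimal Python sketch of the standard NJ procedure applied to a small, made‐up distance matrix. At each step it joins the pair of nodes with the smallest Q value, assigns the two branch lengths, and replaces the pair by a new internal node. It is an illustration only and not MEGA's code; the function name neighbor_joining, the internal node labels and the example distances are assumptions.

```python
def neighbor_joining(matrix):
    """Build an unrooted NJ tree from a symmetric dict-of-dicts distance matrix.

    Needs at least two taxa. Returns a list of (node, neighbour, branch_length)
    edges.
    """
    d = {x: dict(row) for x, row in matrix.items()}  # work on a copy
    nodes = list(d)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other current nodes.
        r = {x: sum(d[x][y] for y in nodes if y != x) for x in nodes}
        # Choose the pair (f, g) that minimises Q = (n - 2) * d(f, g) - r(f) - r(g).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                x, y = nodes[i], nodes[j]
                q = (n - 2) * d[x][y] - r[x] - r[y]
                if best is None or q < best[0]:
                    best = (q, x, y)
        _, f, g = best
        counter += 1
        u = "node%d" % counter  # label for the new internal node
        len_f = 0.5 * d[f][g] + (r[f] - r[g]) / (2 * (n - 2))
        len_g = d[f][g] - len_f
        edges.append((f, u, len_f))
        edges.append((g, u, len_g))
        # Distance from the new node to every remaining node.
        d[u] = {}
        for k in nodes:
            if k not in (f, g):
                d[u][k] = d[k][u] = 0.5 * (d[f][k] + d[g][k] - d[f][g])
        nodes = [k for k in nodes if k not in (f, g)] + [u]
    # Two nodes remain: connect them with a single branch.
    x, y = nodes
    edges.append((x, y, d[x][y]))
    return edges


# Made-up pairwise distances among five taxa, for illustration only.
taxa = ["a", "b", "c", "d", "e"]
rows = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
matrix = {x: {y: rows[i][j] for j, y in enumerate(taxa)} for i, x in enumerate(taxa)}

for node, neighbour, length in neighbor_joining(matrix):
    print(node, neighbour, length)
```

The sketch breaks ties by taking the first minimal pair it meets, so the order in which nodes are joined may differ from that of other implementations even when the final tree is the same.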
26.3.7 Selection of tree construction parameters
Test of Phylogeny: Select “Bootstrap method” to test the reliability of the branching pattern by re‐sampling.
Number of Bootstrap replications: Run 500 re‐samplings if the sequences are long and/or the number of sequences is large; otherwise consider 1000 bootstrap replications. At least 100 bootstrap re‐samplings are suggested for validating the branching pattern of the constructed tree.
Model/Method: The drop‐down menu displays a list of models (e.g., Number of differences, p‐distance, Poisson model, JTT, etc., depending on the algorithm and the type of sequence data chosen). It is better to run the program “Find Best DNA/Protein Models (ML)”, available under the “Models” tab in the menu bar (Figure 26.4). However, model selection is time‐consuming and is more applicable to the Maximum Likelihood‐based algorithm. For protein sequences we can, in general, select an advanced model such as Jones–Taylor–Thornton (JTT). Please remember that the NJ and ME methods, unlike UPGMA, do not assume a constant rate of evolutionary change across lineages, and that whether transitions and transversions are treated as occurring at the same rate is determined by the substitution model rather than by the tree‐building method. Accordingly, select the model to suit both the data and the method chosen for phylogenetic tree construction (Figure 26.5).
Rates among Sites: There are two options (for nucleotide sequences as input): “Gamma Distributed” and “Uniform rates”. Opt for “Gamma Distributed” if the sequences are sufficiently divergent.
Gamma parameter: The gamma distribution used for modeling the evolutionary rates is specified by the gamma parameter (the shape parameter, varying from 1 to 5). Here it is assumed that the substitution rate varies from site to site.
Gaps/Missing Data Treatment: Select “Complete deletion”, which removes every alignment site that contains a gap or missing data in any of the sequences.
Now, run the analysis for tree construction, by clicking on “Compute”.
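To make the Model/Method, Gamma parameter and Gaps/Missing Data settings above more concrete, the following minimal Python sketch shows how three simple amino acid distances can be computed from an alignment after complete deletion: the p‐distance (proportion of differing sites), the Poisson‐corrected distance d = −ln(1 − p), and the gamma distance d = a[(1 − p)^(−1/a) − 1], where a is the shape parameter. The function names and the example alignment are hypothetical, and the sketch is not MEGA's implementation.

```python
import math


def complete_deletion(seqs, missing="-?"):
    """Drop every column in which any sequence has a gap or missing-data symbol."""
    keep = [i for i in range(len(seqs[0]))
            if all(s[i] not in missing for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


def p_distance(a, b):
    """Proportion of sites at which two aligned sequences differ."""
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def poisson_distance(p):
    """Poisson-corrected distance, allowing for multiple substitutions per site."""
    return -math.log(1.0 - p)


def gamma_distance(p, shape):
    """Gamma distance: substitution rates vary among sites with the given shape."""
    return shape * ((1.0 - p) ** (-1.0 / shape) - 1.0)


# Made-up aligned protein fragments; '-' marks an alignment gap.
aligned = ["MKT-AYIAKQ", "MKTLAYIGKQ", "MRTLAYIGKH"]
clean = complete_deletion(aligned)

for i in range(len(clean)):
    for j in range(i + 1, len(clean)):
        p = p_distance(clean[i], clean[j])
        print(i, j, round(p, 3), round(poisson_distance(p), 3),
              round(gamma_distance(p, 2.0), 3))
```

For the same p, a smaller shape parameter gives a larger gamma distance, while a very large shape parameter approaches the Poisson‐corrected distance, which corresponds to uniform rates among sites.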
26.4 INTERPRETATION OF PHYLOGENETIC TREE
The phylogenetic tree displays the branch scale at the bottom.
Node IDs: Each internal node is given a unique numerical ID for identification.
Branch length: Each branch has a length (read against the scale given at the bottom) that indicates the estimated amount of residue substitution along that branch.
Bootstrap value: This indicates the stability of the branching pattern under re‐sampling; it is not a direct measure of the accuracy of the tree.
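The idea behind the bootstrap value can be illustrated with a deliberately simplified Python sketch. Alignment columns are re‐sampled with replacement, and the percentage of replicates in which a chosen pair of sequences is still the closest pair is reported. In a real analysis a full tree is rebuilt for every replicate and each group of taxa is counted, so the "closest pair" criterion used here is a stand‐in assumption, as are the function names, the seed and the example sequences.

```python
import random


def bootstrap_columns(seqs, rng):
    """Re-sample alignment columns with replacement, keeping the original length."""
    n_sites = len(seqs[0])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(s[i] for i in picks) for s in seqs]


def closest_pair(seqs):
    """Indices of the two sequences with the fewest differences (first one on ties)."""
    best = None
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            diff = sum(1 for x, y in zip(seqs[i], seqs[j]) if x != y)
            if best is None or diff < best[0]:
                best = (diff, i, j)
    return best[1], best[2]


def support_for_pair(seqs, pair, replicates=500, seed=1):
    """Percentage of bootstrap replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    hits = sum(1 for _ in range(replicates)
               if closest_pair(bootstrap_columns(seqs, rng)) == pair)
    return 100.0 * hits / replicates


# Made-up aligned sequences, for illustration only.
aligned = ["MKTAYIAKQR", "MKTAYIGKQR", "MRSAYLGKHR"]
print(support_for_pair(aligned, closest_pair(aligned)))
```

A high percentage means that the grouping is recovered in most re‐sampled data sets, that is, it is stable with respect to sampling of sites.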
26.4.1 Controlling the output of phylogenetic tree
The generated tree can be adjusted to suit the requirements of a presentation by changing its size and branch positions, toggling the display of bootstrap values and branch lengths, etc. (Figure 26.6). All of these settings can be controlled through the menu‐bar options or the buttons displayed in the left‐hand side pane (Windows OS). Figure 26.7 indicates the various buttons on the GUI for controlling the appearance of the tree.
26.4.2 Diagrams for each of the taxa
These can be inserted as follows:
Click on “Subtree” in the menu bar → Select “Use Subtree draw options” → Click on the “Image” tab of “Subtree Drawing options” and select the image saved in the relevant folder (Figure 26.8).
Save the Phylogenetic Tree: Click on the “Image” option in the menu bar → Click on “Save as PNG file” (Figure 26.9).
26.5 QUESTIONS
1. Construct a phylogenetic tree using the neighbor‐joining method, with bootstrap re‐sampling of 500, using a set of homologous protein sequences.
2. Consider the previous example and increase the bootstrap re‐sampling to 1000. Is there any change in the branching pattern reliability values (i.e., bootstrap values)? Display the tree so that only bootstrap values of more than 75 are shown in the nodes.
3. Using the protein sequences NP_001272506.1, AAI20478.1, CAH23217.1, XP_005909397.1 and XP_005955229.1, construct phylogenetic trees with the following algorithms: ME, NJ, UPGMA and Maximum Likelihood, and then compare the trees. The bootstrap re‐sampling should be 500 for all the algorithms. Please determine the best evolutionary model before running the phylogeny analysis.
4. Determine the best model for phylogenetic tree construction using the following nucleotide sequences, and then construct a circular phylogenetic tree with bootstrap re‐sampling and the minimum evolution algorithm: AB974690.1, AB973433.1, NM_001009772.1, NM_001009406.1, NM_001009787.1, NM_001285577.1.
5. Interpret the given output generated by MEGA using the NJ method: