CHAPTER 11
Sequence Alignment Using Online Tools

CS Mukhopadhyay and RK Choudhary

School of Animal Biotechnology, GADVASU, Ludhiana

11.1 INTRODUCTION

Online sequence alignment tools differ in the algorithms they use for pairwise or multiple sequence alignment. Links to some useful sites for sequence alignment are given below:

  1. Color INteractive Editor for Multiple Alignments (CINEMA 2.1). CINEMA (http://www.bioinf.man.ac.uk/dbbrowser/CINEMA2.1/) is freely available online. The sequence alignment is supported with a color editor.
  2. Multiple Alignment Construction and Analysis Workbench (MACAW) (http://en.bio-soft.net/format/MACAW.html). This downloadable software is used to identify localized sequence similarities and to edit blocks of multiple sequences.
  3. Java ALignment VIEWer (JALVIEW) (http://www.jalview.org/). JALVIEW is a freely accessible multiple alignment editor. Alignment tools like “EBI ClustalW” and the protein domain database “Pfam” use this Java‐based platform.
  4. Clustal W (http://www.ebi.ac.uk/Tools/clustalw2/index.html). Used for pairwise and multiple sequence alignment.
  5. Clustal Omega (http://www.ebi.ac.uk/Tools/msa/clustalo/). Clustal Omega is capable of handling several thousands of medium‐to‐large sized sequences simultaneously (Sievers and Higgins, 2014).
  6. Multiple Alignment using Fast Fourier Transform (MAFFT version 6) (http://mafft.cbrc.jp/alignment/software/index.html). The multiple sequence alignment program MAFFT is available in both online and downloadable forms (Katoh et al., 2002). Several multiple alignment methods are available in MAFFT: L‐INS‐i (accurate; for alignment of ≤ 200 sequences), FFT‐NS‐2 (fast; for alignment of ≤ 10 000 sequences).
  7. Multiple Sequence Comparison by Log‐Expectation (MUSCLE) (http://www.ebi.ac.uk/tools/msa/muscle/). MUSCLE is reported to yield MSAs of better quality than Clustal, and its algorithm is faster for larger alignments (Edgar, 2004). The user guide for MUSCLE is available at http://www.genebee.msu.su/muscle/help.html.
  8. Tree‐based Consistency objective function for alignment evaluation (T‐Coffee) (http://www.ebi.ac.uk/Tools/msa/tcoffee/). This multiple sequence alignment program has been developed by Cedric Notredame of CRG Centro de Regulacio Genomica (Barcelona) (Notredame et al., 2000). T‐Coffee has been recommended as a very efficient multiple sequence aligner that outputs extra information on structural and evolutionary perspectives (Magis et al., 2014). T‐Coffee accepts sequences in PIR and FASTA format, and the default output format is Clustal. Being progressive alignment software, it generates a library of pair‐wise alignments to direct the MSA. It can identify motifs and can also evaluate the alignment quality.

11.2 OBJECTIVE

To align multiple amino acid sequences of a given protein (SRY) using the online Clustal Omega alignment program.

11.3 PROCEDURE

  1. Open the Clustal Omega home page (http://www.ebi.ac.uk/Tools/msa/clustalo/). If the page does not work properly in one browser, switch to a compatible browser.
  2. Input sequences: The input sequences (let us use the protein sequences with NCBI accession numbers AFG33955, ABV44686, AAW23363, ABS82755, AAG34436, AAG34440, AAG34393, AAB58342, AAL09287, AAL09284) can be either pasted in FASTA format in the sequence box or saved in one of the accepted input formats (e.g., FASTA) in a text (*.txt) file. This file can be uploaded by clicking the “Upload a file” button. The input sequences for MSA can be in any one of the following formats: NBRF/PIR, FASTA, EMBL/Swiss‐Prot, Clustal, GCG/MSF, GCG, RSF, or GDE. The program yields output in any one of the following formats: PHYLIP, Clustal, GCG/MSF, NBRF/PIR, GDE, or NEXUS.
  3. Specify the type of input sequences: Select the sequence type – “Protein”, “DNA” or “RNA” – from the drop‐down list in “Step 1 – Enter your input sequences”.
  4. Parameters: Clustal Omega makes use of seeded guide‐trees and HMM profile‐profile progressive alignments for multiple sequence alignment. Parameters are available in the section “Step 2 – Set your parameters”. These enable the user to modify the MSA according to requirements:
    1. Dealign Input Sequences: Select “Yes” from the drop‐down options to remove gaps present in the input sequences if these have been entered as “already aligned”. The default value is “no”.
    2. Output Alignment Format: The user can select any one of the six output formats (PHYLIP, Clustal, GCG/MSF, NBRF/PIR, GDE, or NEXUS). The default format is “Clustal”.
    3. mBed‐like Clustering Guide‐tree: the mBed is a sampling method to accelerate the calculations for constructing a guide tree. The default option “yes” instructs the program to generate guide trees from the input sequences. It converts the sequences into vectors of distances, and then clusters the vectors using k‐means (a clustering method for partitioning “n” number of observations into “k” number of clusters). Each of the “k” clusters is further clustered using a simple hierarchical clustering method UPGMA (Unweighted Pair Group Method with Arithmetic mean) to construct phenograms (diagrammatic representation of taxonomic relationship among organisms). Finally, the sub‐clusters are joined to create a tree.
    4. mBed‐like Clustering Iteration: Select “Yes” (default is “No”) to apply mBed‐like clustering during subsequent iterations.
    5. Number of Combined Iterations: The total number of combined iterations, each of which involves both the guide tree (for constructing phenograms) and the hidden Markov model (for aligning multiple sequences). The user can increase the value from the default (0) up to five combined iterations if the software fails to generate a logically acceptable alignment or to construct the guide tree.
    6. Max Guide Tree Iterations: This refers to the number of iterations for the guide tree (generating phenogram) only, after the user has set the number of combined iterations. The default value is “default”.
    7. Max HMM Iterations: Similar to the above, this restricts the number of iterations for the hidden Markov model for alignment of sequences.
  5. Job submission: Click the “Submit” button to get the alignment and associated results (e.g., guide tree, the distance between sequences). Alternatively, MSA results can be obtained through email as specified by the user, after checking the box “be notified by email” (Figure 11.1).
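Before pasting or uploading the input in steps 2 and 3, it can help to check that the file really is FASTA and that the sequence type chosen in the drop‐down list matches the data. The following is a minimal Python sketch of such a check, not part of Clustal Omega itself. The function names, the 90% threshold, and the two toy sequences are assumptions made for illustration only.

```python
# Minimal FASTA reader plus a crude sequence-type check.
# read_fasta, guess_sequence_type and the 0.9 threshold are illustrative
# assumptions, not part of any Clustal Omega interface.

def read_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def guess_sequence_type(sequence, threshold=0.9):
    """Guess 'DNA', 'RNA' or 'Protein' from the letters in the sequence."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    nucleotide_fraction = sum(c in "ACGTUN" for c in letters) / len(letters)
    if nucleotide_fraction < threshold:
        return "Protein"
    return "RNA" if "U" in letters else "DNA"


# Toy input: both sequences are made up for this example.
example = """>seq1 made-up protein fragment
MQSYASAMLSVFNSDDYS
>seq2 made-up DNA fragment
ATGCAATCATATGCTTCTGC
"""

for name, seq in read_fasta(example):
    print(name.split()[0], len(seq), guess_sequence_type(seq))
```

A file whose records are all reported as “Protein” should be submitted with the sequence type set to “Protein”. A mixture of types usually means that a wrong accession number was downloaded.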

FIGURE 11.1 The output of multiple sequence alignment using Clustal Omega is obtained in different tabs – “Alignments”, “Result Summary”, “Submission Details”. Jalview is the Java alignment viewer that displays the alignment, along with the consensus sequence.
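The guide‐tree step described under the mBed parameter in Section 11.3 ends with UPGMA clustering. The Python sketch below illustrates only that UPGMA step on a small distance matrix. It is not Clustal Omega's actual implementation, it omits the mBed embedding and the k‐means stage, and the labels and distances are invented for the example.

```python
from itertools import combinations

# Illustrative UPGMA on a toy distance matrix. The function name, the
# nested-tuple tree format and the distances are assumptions of this sketch.

def upgma(names, matrix):
    """Return a UPGMA tree as nested tuples.

    names  : list of sequence labels
    matrix : square, symmetric list of lists of pairwise distances
    """
    # Each active cluster id maps to (subtree, number_of_leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances are stored under (smaller_id, larger_id) keys.
    dist = {(i, j): matrix[i][j]
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        i, j = min(combinations(sorted(clusters), 2), key=lambda p: dist[p])
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        for k in clusters:
            d_ik = dist[tuple(sorted((i, k)))]
            d_jk = dist[tuple(sorted((j, k)))]
            # The size-weighted mean equals the average over all leaf pairs.
            dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


names = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(names, matrix))
```

In this toy case A and B are joined first because they are the closest pair, then C joins the (A, B) cluster, and D joins last. A progressive aligner follows the same order when it adds sequences to the alignment.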

11.4 INTERPRETATION OF RESULTS

The alignment results are displayed in Clustal format as rows of interleaved sequences. Gaps are introduced to show insertion‐deletion (InDel). At the bottom of each block of sequences, one line of symbols indicates the matches and conserved residues:

  1. “*” indicates that all the residues in the column are identical;
  2. “:” indicates a conserved substitution (residues with strongly similar properties);
  3. “.” indicates a semi‐conserved substitution (residues with weakly similar properties);
  4. No symbol (blank space) indicates a mismatch.
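The way these symbols are assigned can be sketched in Python as follows. The strong and weak residue groups are taken from the Clustal documentation as recalled here, so they should be treated as an assumption and checked against the version in use. The function names and the three short aligned sequences are invented for the example.

```python
# Sketch of how a Clustal-style conservation line can be derived.
# STRONG and WEAK are assumed to match the groups in the Clustal documentation.
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
        "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = {c.upper() for c in column}
    if "-" in residues:
        return " "          # a gap in the column means no symbol
    if len(residues) == 1:
        return "*"          # fully identical column
    if any(residues <= set(group) for group in STRONG):
        return ":"          # all residues fall in one strong group
    if any(residues <= set(group) for group in WEAK):
        return "."          # all residues fall in one weak group
    return " "


def conservation_line(aligned):
    """Build the symbol line for a list of equal-length aligned sequences."""
    return "".join(column_symbol(col) for col in zip(*aligned))


# Made-up aligned sequences, for illustration only.
aligned = ["MKT-AS", "MRTLAA", "MKSLAG"]
for seq in aligned:
    print(seq)
print(conservation_line(aligned))
```

Here the first column (M, M, M) is identical, the second (K, R, K) and third (T, T, S) are conserved substitutions, the fourth contains a gap, and the last (S, A, G) is only semi‐conserved.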

11.5 COLOR SCHEME FOR AMINO ACID RESIDUES

Amino acids with similar physicochemical properties are shown in the same color (http://www.hhmi.umbc.edu/toolkit/ClustalWGuide.html):

  1. Red: small, hydrophobic, aromatic, not Y (A, V, F, P, M, I, L, W).
  2. Blue: acidic (D, E).
  3. Magenta: basic, not H (R, K).
  4. Green: hydroxyl, sulfhydryl, amine, and G (S, T, Y, H, C, N, G, Q).
  5. Gray: others.
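The color scheme above amounts to a lookup from residue to color, which the following Python sketch expresses. The names are illustrative. Histidine (H) is assigned to the green group here, which is an assumption of this sketch, and any character not listed, including gaps, falls back to gray.

```python
# Residue-to-color lookup following the scheme listed above.
# Assumption: H is placed in the green group; everything unlisted is gray.
COLOR_GROUPS = {
    "red": "AVFPMILW",
    "blue": "DE",
    "magenta": "RK",
    "green": "STYHCNGQ",
}

RESIDUE_COLOR = {
    residue: color
    for color, group in COLOR_GROUPS.items()
    for residue in group
}


def residue_color(residue):
    """Return the display color for a one-letter amino acid code."""
    return RESIDUE_COLOR.get(residue.upper(), "gray")


def color_sequence(sequence):
    """Return (residue, color) pairs for every position in the sequence."""
    return [(residue, residue_color(residue)) for residue in sequence]


# Made-up fragment, for illustration only.
for residue, color in color_sequence("MKD-SX"):
    print(residue, color)
```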

11.6 QUESTIONS

  1. Align the following nucleotide sequences and find the conserved domain(s): NM_005217.3, NM_004084.3, X52053.1, M21130.1, BC119706.2
  2. Align the given sequences and show the overall alignment as a graphical view (overview window) using the online tool MAFFT (http://mafft.cbrc.jp/alignment/server/index.html): ABQ72077.1, AET17647.1, AEM98800.1, CCC62950.1, AEJ49160.1.
  3. Align the given sequences using ClustalW and T‐Coffee, and logically justify which program has given the more reliable results: KF469208.1, D73408.1, AM933377.1, XM_003473564.2, XM_004440021.2, AY970684.1, XM_004680883.1, AF227738.1, DQ372924.1, AF231714.1, XM_004049428.1, AY826184.1, NM_001105535.2, NM_010776.1
  4. Align the given sequences and comment on the conserved patterns found: XM_004087588.2, XM_004286285.1, NM_001009005.2, NM_001161885.1, NM_001009005.2, KF469209.1, NM_001141497.1, XM_004607626.1, KF469210.1
  5. Specify how you will modify the Clustal Omega parameters to obtain an optimum alignment for distantly related sequences.