CHAPTER 37
Genome Annotation in Eukaryotes

CS Mukhopadhyay and RK Choudhary

School of Animal Biotechnology, GADVASU, Ludhiana

37.1 INTRODUCTION

GENSCAN is an online program, based on a hidden Markov model (HMM), that identifies complete gene structures in genomic DNA. It predicts the locations of genes and their exon–intron boundaries in genomic sequences of vertebrates, Arabidopsis and maize. GENSCAN was developed by Christopher Burge at the Department of Mathematics, Stanford University (Burge and Karlin, 1997; Burge, 1998).

37.2 OBJECTIVE

To predict the putative gene sequence(s) in a given input nucleotide sequence and annotate the sequence.

37.3 PROCEDURE

  1. Download a sequence (fewer than 1 million base pairs) from NCBI Nucleotide and save it as a plain‐text file in FASTA format (for example, using Notepad). Here, the chromosome 1 sequence (CM000409.1) of the duck‐billed platypus (Ornithorhynchus anatinus) has been downloaded from NCBI (http://www.ncbi.nlm.nih.gov/nuccore/CM000409.1).
  2. The original sequence is longer than 1 megabase, so it must be trimmed from either end to approximately 1 megabase (for example, using Notepad++). The trimmed sequence should then be run through RepeatMasker to mask low‐complexity and repetitive regions.
  3. Open the GENSCAN web server: http://genes.mit.edu/GENSCAN.html.
  4. Set the parameters:
    1. Organism: select “Vertebrate”, “Arabidopsis” or “Maize” from the “Organism” drop‐down list. Here, we select “Vertebrate”.
    2. Suboptimal exon cutoff: an optional probability threshold, ranging from 0.01 to 1.00, that a candidate exon must reach to be reported as a suboptimal exon. The default is 1.00. Lowering the cutoff reports more candidate exons, but the additional exons are less reliable; the cutoff should not be set below 0.50.
    3. Sequence name: A text box is provided to type the name of the sequence. This is also optional, and is used to name the sequence for ease of identification.
    4. Print options: two output options are offered, “Predicted peptides only” and “Predicted CDS and peptides”. The second option gives each predicted amino acid sequence, followed by the nucleotide sequence that encodes it.
    5. Browse button: to upload the input nucleotide sequence for gene prediction.
  5. Browse to upload the sequence using the “Browse…” button.
  6. Click “Run GENSCAN” to start the analysis (Figure 37.1).

FIGURE 37.1 Homepage of the online GENSCAN software, with the “Run GENSCAN” and “Clear Input” buttons at the bottom left.
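
The trimming in step 2 can also be scripted instead of being done by hand in a text editor. The following is a minimal Python sketch, not part of the original procedure. The function name `trim_fasta`, the choice to keep the first 1,000,000 bases of the first record, and the 60‐column line width are all assumptions made for illustration.

```python
def trim_fasta(fasta_text, max_bases=1_000_000, width=60):
    """Return the first FASTA record, cut to at most max_bases bases.

    Hypothetical helper: it keeps the start of the sequence, which is
    only one of several reasonable ways to trim.
    """
    header = None
    chunks = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; ignore the rest
            header = line
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header line (starting with '>') found")
    sequence = "".join(chunks)[:max_bases]
    # Re-wrap the trimmed sequence into fixed-width lines.
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + wrapped) + "\n"


if __name__ == "__main__":
    demo = ">demo\nACGTACGTAC\nGTACGT\n"
    print(trim_fasta(demo, max_bases=12, width=5), end="")
```

For a real file, the text would be read with `open(path).read()` and the result written to a new file before repeat masking.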

37.4 INTERPRETATION OF GENSCAN OUTPUT

  • The GENSCAN output appears in a new window on the same web page. The GENSCAN version, date and time of run are shown at the top.
  • This is followed by the length of the input sequence and its G/C percentage. The next section lists the predicted exons in tabular form under the header:
  • Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr.. (Figure 37.2)
  • The suboptimal exons that meet the chosen probability cutoff (here, 1.00) are listed next.
  • Finally, the predicted amino acid sequence(s) and the respective coding nucleotide sequence(s) are given.

FIGURE 37.2 Output page of the GENSCAN software, showing some of the predicted genes or exons (top) and some of the predicted protein sequences (bottom).
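
The exon table described above can be read into a program for filtering or annotation. The sketch below is a hedged illustration only. It assumes that each exon row carries the 13 whitespace‐separated columns named in the header, and it skips every other line. The sample rows are invented for demonstration and are not real GENSCAN results; `Exon` and `parse_exon_lines` are hypothetical names.

```python
from typing import List, NamedTuple


class Exon(NamedTuple):
    gene: int      # gene number (the "Gn" part of Gn.Ex)
    exon: int      # exon number within the gene (the "Ex" part)
    kind: str      # Type column, e.g. Init, Intr, Term
    strand: str    # S column, + or -
    begin: int
    end: int
    length: int
    prob: float    # P column
    score: float   # Tscr column


def parse_exon_lines(output_text: str) -> List[Exon]:
    """Parse rows that have all 13 columns; skip everything else."""
    exons = []
    for line in output_text.splitlines():
        fields = line.split()
        if len(fields) != 13 or "." not in fields[0]:
            continue
        gene_str, _, exon_str = fields[0].partition(".")
        if not (gene_str.isdigit() and exon_str.isdigit()):
            continue  # header row such as "Gn.Ex"
        exons.append(Exon(int(gene_str), int(exon_str), fields[1], fields[2],
                          int(fields[3]), int(fields[4]), int(fields[5]),
                          float(fields[11]), float(fields[12])))
    return exons


# Invented rows, laid out under the header quoted in the text.
SAMPLE = """\
Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
 1.01 Init +   1200   1320  121  0  1   80   75   110 0.950  10.20
 1.02 Intr +   2500   2650  151  1  1   95   88   140 0.430   9.10
 1.03 Term +   4000   4180  181  2  1   70   40   160 0.810   8.40
"""

if __name__ == "__main__":
    confident = [e for e in parse_exon_lines(SAMPLE) if e.prob >= 0.50]
    for e in confident:
        print(e.gene, e.exon, e.kind, e.begin, e.end, e.prob)
```

The 0.50 threshold in the usage lines mirrors the lower bound recommended for the suboptimal exon cutoff; it is a filter applied after the run and does not change what GENSCAN itself reports.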

37.4.1 Some points to remember while using GENSCAN

  1. This tool cannot handle sequences longer than one million bases, so limit the input sequence to 1 Mb.
  2. The user needs to mask the repeat sequences before submitting the sequence to GENSCAN.
  3. It should not be used for prokaryotic or yeast sequences.
  4. It can predict internal exons more accurately than the terminal exons.
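
The first two points can be checked before submission. The following Python sketch is an illustration under stated assumptions, not a GENSCAN feature. `preflight` is a hypothetical helper, hard‐masked bases are assumed to appear as N, and the G/C percentage is computed over unambiguous A, C, G and T bases only, which may differ from how GENSCAN reports it.

```python
def preflight(sequence, max_bases=1_000_000):
    """Summarise a raw sequence string (no FASTA header) before submission."""
    seq = "".join(sequence.split()).upper()  # drop line breaks and spaces
    length = len(seq)
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    n_count = seq.count("N")  # assumed hard-masking symbol
    return {
        "length": length,
        "within_limit": length <= max_bases,
        "gc_percent": 100.0 * gc / acgt if acgt else 0.0,
        "n_fraction": n_count / length if length else 0.0,
    }


if __name__ == "__main__":
    print(preflight("ACGTNNGGCC"))
```

An `n_fraction` of zero on a long genomic sequence suggests that repeat masking has not yet been applied, at least not as hard masking.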

37.5 QUESTIONS

  1. Discuss the output parameters obtained from GENSCAN.
  2. Predict and annotate the genes of the taurine Y chromosome.
  3. What are the key elements of the eukaryotic gene that are taken into account while predicting genes? What will be your strategy to predict eukaryotic genes from a given sequence, if no tools are available?
  4. Download the sex chromosomes of mouse (Mus musculus) and predict the genes in both chromosomes. Which genes are common to both chromosomes?