The genome harbors several signals, elements and conserved regions, including promoter signals, translation start and termination signals (start codon “ATG”; termination codons Ochre “TAA”, Amber “TAG” and Opal “TGA”), codons, exons and intervening introns (delimited by exon–intron boundaries). Suitable software enables researchers to identify and annotate these regions of the genome. In this chapter, we will use the GeneMark program to learn gene finding and genome annotation in prokaryotes.
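To make the role of the start and termination codons concrete, the following minimal Python sketch scans the forward strand of a sequence for open reading frames that begin with “ATG” and end at the first in-frame “TAA”, “TAG” or “TGA”. It is only an illustration of these signals and is not how GeneMark predicts genes; the function name `find_orfs` and the `min_codons` cut-off are assumptions introduced here.

```python
START = "ATG"
# Termination codons with their traditional names.
STOPS = {"TAA": "Ochre", "TAG": "Amber", "TGA": "Opal"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, stop_name) for forward-strand ORFs.

    Coordinates are 0-based with an exclusive end, and the stop codon is
    included in the reported span. Only the first ATG of each ORF is used.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, STOPS[codon]))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATAGGG"):
        print(orf)
```

A naive scan of this kind reports every ATG-to-stop stretch, whereas a gene finder such as GeneMark.hmm scores candidate regions with a statistical model to decide which of them are likely to be genes.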
36.2 OBJECTIVE
To annotate a partial genome of a prokaryotic organism using the GeneMark.hmm online tool (Lukashin and Borodovsky, 1998).
36.3 PROCEDURE
Download the nucleotide sequence from the NCBI Nucleotide database and save it in FASTA format as a .txt file (Notepad), using the “send to” option available on the right‐hand side of the top of the page.
Click “GeneMark hmm” on the left‐hand side of the shortcut menu (arrow) and select “prokaryotes”.
Upload the sequence file by clicking the “Browse…” button, or paste the sequence into the box provided.
Select the species from the drop‐down list. If the exact species is not listed, select a closely related species.
Check the boxes located at the bottom, as per the output requirements:
to obtain the in silico translations of the predicted genes;
to get the nucleotide sequences of the predicted genes;
to produce on‐screen PDF graphics;
to generate PostScript graphics (via email).
Under the “Action” tab, click “Start GeneMark.hmm” to start the gene search (see Figure 36.2).
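The download step at the start of this procedure can also be scripted instead of using the browser. The sketch below is one possible approach using the NCBI E-utilities `efetch` endpoint with `db=nuccore` and `rettype=fasta`. The helper names (`efetch_url`, `parse_fasta`, `download_fasta`) and the output file name are assumptions made for illustration, and the GeneMark submission itself is still done through the web form as described above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession):
    """Build the E-utilities URL that returns one nucleotide record as FASTA."""
    params = {"db": "nuccore", "id": accession,
              "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Return (header, sequence) for the first record in FASTA text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not look like FASTA")
    seq = []
    for ln in lines[1:]:
        if ln.startswith(">"):  # stop at the next record
            break
        seq.append(ln)
    return lines[0][1:], "".join(seq).upper()


def download_fasta(accession, path):
    """Fetch a record from NCBI and save it unchanged as a text file."""
    with urlopen(efetch_url(accession)) as resp:
        text = resp.read().decode("utf-8")
    parse_fasta(text)  # raises ValueError if the reply is not FASTA
    with open(path, "w") as fh:
        fh.write(text)


if __name__ == "__main__":
    # Requires network access; the file name is a hypothetical choice.
    download_fasta("BX248333.1", "BX248333.txt")
```

The saved .txt file can then be uploaded with the “Browse…” button in the same way as a file saved from the browser.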
36.4 INTERPRETATION OF GENEMARK OUTPUT
The program reports the results in tabular format: Gene (serial number), Strand (positive or negative strand), Left End and Right End (start and end nucleotide positions of the gene), Gene Length, and Class.
This is followed by the predicted sequence of the translated amino acids and the nucleotide sequence of the gene (if the options “Translate predicted genes into proteins” and “Sequences of predicted genes” have been checked) (see Figure 36.3).
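For downstream work it can help to read the result table into a program. The sketch below assumes the table has been copied as plain text with whitespace-separated columns in the order listed above (Gene, Strand, Left End, Right End, Gene Length, Class), and that a coordinate may carry a leading “<” or “>” to mark a partial gene; the exact layout of the real output may differ. The names `Gene`, `parse_table`, `gene_sequence` and `translate` are hypothetical. The translation uses the standard codon assignments and stops at the first termination codon; it does not handle alternative prokaryotic start codons.

```python
import re
from typing import NamedTuple


class Gene(NamedTuple):
    number: int
    strand: str
    left: int
    right: int
    length: int
    gene_class: str


# One data row: number, strand, left end, right end, length, class.
ROW = re.compile(
    r"^\s*(\d+)\s+([+-])\s+[<>]?(\d+)\s+[<>]?(\d+)\s+(\d+)\s+(\S+)\s*$")


def parse_table(text):
    """Collect every line that looks like a gene row; skip headers."""
    genes = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            n, strand, left, right, length, cls = m.groups()
            genes.append(Gene(int(n), strand, int(left), int(right),
                              int(length), cls))
    return genes


COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gene_sequence(genome, gene):
    """Cut out a gene using 1-based inclusive coordinates.

    Genes on the negative strand are returned as the reverse complement.
    """
    sub = genome[gene.left - 1:gene.right].upper()
    if gene.strand == "-":
        sub = sub.translate(COMPLEMENT)[::-1]
    return sub


# Standard codon assignments, with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(dna):
    """Translate codon by codon until the first termination codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

Comparing the output of `translate(gene_sequence(genome, gene))` with the protein sequence reported by the program is a quick check that the coordinates and strand have been read correctly.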
36.5 QUESTIONS
1. Download the given sequences from NCBI, predict the possible genes and annotate them:
BX248333.1
NZ_KK354537.1
2. Compare the genome annotation results of RAST and GeneMark, using the NCBI sequence with accession number BX248333.1.
3. What are the salient points to be considered while annotating a given DNA sequence of a prokaryote?
4. How can we identify novel genes that are missed by a genome annotation/prediction tool?
5. Please annotate the cloning vector pRB223, using suitable tools.