tBLASTn is another translated BLAST algorithm, in which an amino acid sequence is used as the query and compared with a nucleotide database that is translated during the search. The comparison is made at the protein level: each subject nucleotide sequence is translated in all six reading frames, and the query is aligned against each translation. tBLASTn is therefore very useful for finding protein homologs in nucleotide data that have not been annotated, such as expressed sequence tags (maintained in the BLAST database “est”) and draft genome records (located in the BLAST database “htgs”).
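The six-frame translation described above can be illustrated with a short Python sketch. It is not the BLAST implementation; it assumes the standard genetic code (NCBI translation table 1), and the function names are hypothetical helpers introduced only for this example.

```python
# Minimal sketch of six-frame translation (assumption: standard genetic code).
BASES = "TCAG"
# Amino acids for all 64 codons, ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna, offset=0):
    """Translate complete codons starting at `offset`.

    Codons that are not in the table (for example, those containing N)
    are written as 'X'; stop codons are written as '*'.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(offset, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels +1, +2, +3, -1, -2, -3 to their translations."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna, offset)      # plus strand
        frames[-(offset + 1)] = translate(rc, offset)    # minus strand
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frame_translations("ATGGCCTAA").items()):
        print(frame, protein)
```

Each of the six protein strings is a candidate against which the amino acid query could be aligned; the frame label of the best-matching translation corresponds to the “Frame” value reported in the results.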
15.2 OBJECTIVE
To search a nucleotide database for homologs of a given pair of protein sequences (NP_001028007, NP_001028008).
15.3 PROCEDURE
The basic steps of tBLASTn are the same as for BLASTx:
The main page of tBLASTn will be displayed (Figure 15.1).
15.3.2 Enter query sequences
Enter accession number(s) or FASTA sequence(s): Paste one or more protein query sequences in FASTA format, or the corresponding NCBI protein accession numbers (one per line, separated with the Enter or Return key), into the query sequence box. Alternatively, upload a text file containing the amino acid query sequences in FASTA format by clicking the “Choose File” button.
Give a Job Title to identify the tBLASTn results from saved searches.
Align two or more sequences: When this check box is checked, the page refreshes to show a second sequence box, into which the subject nucleotide sequence(s) are pasted.
Query sub-range (optional): Specify the range of the input sequence that is to be searched against the database. This is especially useful when a GenBank accession number is entered instead of the sequence itself.
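The effect of a query sub-range can be mimicked on a local sequence with the sketch below. It assumes that the “From” and “To” positions are 1-based and inclusive, as residue positions are usually counted; the function name and the example peptide are hypothetical.

```python
def query_subrange(sequence, start, stop):
    """Return residues start..stop of `sequence` (1-based, inclusive).

    Assumption: positions are counted as on the query form, i.e. the
    first residue is position 1 and the `stop` residue is included.
    """
    if not 1 <= start <= stop <= len(sequence):
        raise ValueError("need 1 <= start <= stop <= len(sequence)")
    return sequence[start - 1:stop]


if __name__ == "__main__":
    peptide = "MKTAYIAKQR"  # made-up example sequence, not a real record
    print(query_subrange(peptide, 3, 6))
```

Only the selected residues would take part in the search; the rest of the query is ignored.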
15.3.3 Choose search set
Database: Choose the nucleotide database against which the search is to be made. The list of databases is almost the same as that for BLASTn, except that two options, “Human Genomics plus Transcript” and “Mouse Genomics plus Transcript”, are absent.
Organism (optional): If required, specify the organism by common name, binomial name, or taxonomy ID. To exclude an organism from the search results instead, check the small check box next to the entry box; click the “+” sign to add further organisms.
Exclude Models (XM/XP) and/or Uncultured/environmental sample sequences (optional): Check either or both check boxes to exclude the corresponding records. “Models (XM/XP)” refers to the model reference sequences that are determined and annotated by the Genome Annotation Project of NCBI and may therefore be incomplete.
Entrez Query (optional): As with BLASTn, this restricts the search to the records that match the specified Entrez query. The Boolean operators AND, OR, and NOT can be used to define the subset of the database to be searched.
BLAST: Click the button to start the tBLASTn search. Check the adjacent check box to open the search results in a new window.
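The choices made on the form (program, database, query, Entrez restriction) can also be written down as a set of request parameters. The sketch below only assembles and encodes such a parameter set; it does not contact NCBI. The parameter names (CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY) follow the NCBI BLAST URL API as commonly documented and should be checked against the current NCBI documentation before use. The database name "nt" and the Entrez restriction are illustrative assumptions, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def build_tblastn_request(query, database, entrez_query=None):
    """Return URL-encoded form data describing a tBLASTn search.

    Assumption: parameter names follow the NCBI BLAST URL API;
    this function only builds the string and sends nothing.
    """
    params = {
        "CMD": "Put",            # submit a new search
        "PROGRAM": "tblastn",    # protein query vs. translated nucleotide database
        "DATABASE": database,
        "QUERY": query,          # FASTA text or accession(s), one per line
    }
    if entrez_query:
        params["ENTREZ_QUERY"] = entrez_query  # optional Entrez restriction
    return urlencode(params)


if __name__ == "__main__":
    form_data = build_tblastn_request(
        query="NP_001028007\nNP_001028008",      # accessions from the objective
        database="nt",                            # example database (assumption)
        entrez_query="Equus caballus[Organism]",  # example restriction (assumption)
    )
    print(form_data)
```

Writing the settings down in this way also gives a record of exactly which database and restrictions were used for a saved search.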
15.4 ALGORITHM PARAMETERS
These are the same as those for BLASTx:
General parameters
Scoring parameters
Filters and masking
15.5 INTERPRETATION OF tBLASTn RESULTS
The output of tBLASTn is similar to that of BLASTp or BLASTx.
The color key-based alignment depiction and the table listing the tBLASTn hits for the various homologous sequences are also the same as those for BLASTx (Figure 15.2).
The individual pairwise alignments are also laid out in the same way as those for BLASTp. In addition, the “Frame” field indicates which of the six possible reading frames of the subject nucleotide sequence was translated to produce the alignment.
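A “Frame” value can be read as a strand plus a codon offset. The sketch below assumes the usual convention that the sign gives the strand (positive for the strand as stored in the database, negative for its reverse complement) and that the magnitude 1, 2, or 3 gives the position of the first codon on that strand; the function name is hypothetical.

```python
def describe_frame(frame):
    """Split a Frame value into (strand, number of bases skipped).

    Assumption: +1..+3 refer to the plus strand and -1..-3 to the
    reverse complement; |frame| - 1 bases precede the first codon.
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1, +2, +3, -1, -2, -3")
    strand = "plus" if frame > 0 else "minus"
    return strand, abs(frame) - 1


if __name__ == "__main__":
    for frame in (1, 2, 3, -1, -2, -3):
        strand, skipped = describe_frame(frame)
        print(f"Frame {frame:+d}: {strand} strand, {skipped} base(s) skipped")
```

A hit reported in a negative frame therefore indicates that the homologous coding region lies on the strand complementary to the database record.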
Variants of a protein can also be identified from the tBLASTn results.
15.6 QUESTIONS
1. The given amino acid sequence is to be checked for possible transcript variants (transcripts of the same gene with varying length and encoded protein sequences) in non‐humped cattle:
2. Discover the protein homologs in the equine genome for the following genes, using taurine amino acid sequences as the query sequence: TSPY (Testis‐specific protein, Y‐encoded), Cathelicidin, TLR4.
3. Discuss the applications of tBLASTn.
4. Explain the result of tBLASTn given in Figure 15.2, systematically.
5. Assume that the tBLASTn tool is not working for some days (or is not available). How will you proceed to analyze a given novel amino acid sequence to annotate its encoding gene‐specific features?