S Jain¹, S Panwar² and A Kumar³
1 Department of Applied Sciences & Humanities, Jai Parkash Mukand Lal Innovative Engineering and Technology Institute, Yamuna Nagar, Haryana, India
2 Department of Genetics and Plant Breeding, Chaudhary Charan Singh University, Uttar Pradesh, India
3 Department of Nutrition Biology, Central University of Haryana, Haryana, India
Transcription factors are crucial for the sequence-specific control of transcription. Classically, the computational prediction of transcription factor binding sites (TFBS) relies on position weight matrices (PWMs) (Wingender et al., 2001). A PWM assigns a weight to each nucleotide at each position of the motif. These models assume that each nucleotide contributes independently to the DNA–protein interaction, and they cannot account for motifs of variable length.
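The following minimal Python sketch makes the independence assumption concrete by scoring with a log-odds PWM. The toy counts, the pseudocount of 0.25 and the uniform background are illustrative assumptions, not values taken from TRANSFAC. Each window's score is simply the sum of its per-position weights, and only windows exactly as long as the matrix are considered.

```python
import math

BASES = "ACGT"

def counts_to_pwm(counts, pseudocount=0.25, background=None):
    """Turn per-position base counts into log2-odds weights.

    counts: one dict per motif position, e.g. {"A": 0, "C": 0, "G": 8, "T": 0}.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background[b])
            for b in BASES
        })
    return pwm

def score_window(pwm, window):
    # Independence assumption: each position contributes its own weight; weights are summed.
    return sum(pwm[i][base] for i, base in enumerate(window))

def scan(pwm, sequence):
    """Return (start, window, score) for every full-length window made only of A, C, G, T."""
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) <= set(BASES):
            hits.append((start, window, score_window(pwm, window)))
    return hits

# Hypothetical toy motif "GATA" built from 8 aligned sites.
toy_counts = [
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 8, "C": 0, "G": 0, "T": 0},
]
pwm = counts_to_pwm(toy_counts)
best = max(scan(pwm, "ccGATAcc"), key=lambda hit: hit[2])
print(best[0], best[1])  # 2 GATA
```

Because the matrix has a fixed width, a site with an inserted or deleted base cannot be scored as a single window. This is the "flexible length" limitation mentioned above.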
To predict transcription factor binding sites using the TRANSFAC and MATCH tools
TRANSFAC is a database of TRANScription regulatory FACtors, maintained at GBF Braunschweig (Wingender et al., 2000). It compiles data on transcription factors, their DNA binding sites, the sources of the factors and a systematic classification of transcription factors. The experimental data are accessible mainly through the FACTORS and SITES tables (Frech et al., 1997).
The FACTORS table holds data on the DNA-binding proteins, and the SITES table holds the DNA sequences that these proteins recognize. Many transcription factors can be classified by their DNA-binding domains and/or their dimerization domains, so a CLASS table has also been introduced into TRANSFAC. Tiny TRP, a browsing tool for TRANSFAC, is the only solution that requires the linked databases to be available in their original format. Links between TRANSFAC and other databases, such as PIR, EMBL and PROSITE, are crucial to the use of TRANSFAC.
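The relationship between these tables can be pictured as a small linked data model. The Python sketch below is purely illustrative. The class names, the field names and the placeholder accessions (C_DEMO, F_DEMO, S_DEMO) are hypothetical and do not reproduce TRANSFAC's actual schema. The sketch only shows how a site record can point to the factors that bind it, and how a factor can point to its class.

```python
from dataclasses import dataclass, field

@dataclass
class FactorClass:
    accession: str
    description: str  # e.g. the DNA-binding domain family

@dataclass
class Factor:
    accession: str
    name: str
    class_accession: str  # link into the CLASS table

@dataclass
class Site:
    accession: str
    sequence: str
    factor_accessions: list = field(default_factory=list)  # links into the FACTORS table

def factors_binding(site, factors):
    """Follow the SITES -> FACTORS links, ignoring accessions that are not present."""
    return [factors[acc] for acc in site.factor_accessions if acc in factors]

def class_of(factor, classes):
    """Follow the FACTORS -> CLASS link; returns None if the class is not present."""
    return classes.get(factor.class_accession)

# Hypothetical placeholder records, not real TRANSFAC entries.
classes = {"C_DEMO": FactorClass("C_DEMO", "zinc finger (demo)")}
factors = {"F_DEMO": Factor("F_DEMO", "DemoTF", "C_DEMO")}
site = Site("S_DEMO", "AGATAAG", ["F_DEMO", "F_MISSING"])

for factor in factors_binding(site, factors):
    print(factor.name, "->", class_of(factor, classes).description)
# DemoTF -> zinc finger (demo)
```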
Enrique Blanco describes the procedure in the online “Practical” tutorial (http://genome.crg.es/courses/Bioinformatics2003_promoters/).
An online subscription provides access to the TRANSFAC web interface. A download subscription instead provides flat files containing data on factors, matrices, binding sites, genes, ChIP fragments and other supporting information, as well as command-line access to the MATCH tool.
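Users with a download subscription can process the flat files with simple scripts. The following minimal sketch assumes an EMBL-like layout with these features:

- two-letter line codes such as AC and ID
- XX spacer lines
- a P0 (or PO) header that introduces the count rows
- // closing each record

The exact layout may vary between releases, so check it against your own files. The sample record and its identifiers are hypothetical.

```python
def parse_matrix_records(text):
    """Parse EMBL-style matrix records into dicts with AC, ID and per-position counts."""
    records, current, in_matrix = [], None, False
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue
        code = line[:2]
        if code == "//":  # end of record
            if current is not None:
                records.append(current)
            current, in_matrix = None, False
            continue
        if current is None:
            current = {"AC": None, "ID": None, "counts": []}
        if code in ("AC", "ID"):
            current[code] = line[2:].strip()
        elif code in ("P0", "PO"):  # assumed header row: position, A, C, G, T
            in_matrix = True
        elif code == "XX":
            in_matrix = False
        elif in_matrix:
            fields = line.split()
            # Count rows: position number, then counts for A, C, G, T (a consensus letter may follow).
            if fields[0].isdigit() and len(fields) >= 5:
                a, c, g, t = (float(x) for x in fields[1:5])
                current["counts"].append({"A": a, "C": c, "G": g, "T": t})
    return records

# Hypothetical sample record in the assumed layout.
SAMPLE = """AC  M_DEMO
XX
ID  DEMO_01
XX
P0      A      C      G      T
01      0      0      8      0      G
02      8      0      0      0      A
03      0      0      0      8      T
04      8      0      0      0      A
XX
//
"""

records = parse_matrix_records(SAMPLE)
print(records[0]["ID"], len(records[0]["counts"]))  # DEMO_01 4
```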
MATCH searches any DNA sequence for potential transcription factor binding sites, using the library of mononucleotide weight matrices from TRANSFAC.
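MATCH's published scoring scheme does not rank hits by raw log-odds sums. Instead, it weights each matrix position by its information content. It then reports two normalized scores between 0 and 1, each compared with a cut-off:

- a core similarity, computed over the five most conserved consecutive positions
- a matrix similarity, computed over the whole matrix

The Python sketch below reimplements that idea on a frequency matrix for illustration only. It scans the forward strand only, and the toy matrix and cut-off values are assumptions, so it should not be expected to reproduce the server's output.

```python
import math

BASES = "ACGT"

def to_frequencies(counts):
    """Normalize per-position counts to frequencies."""
    freqs = []
    for column in counts:
        total = sum(column[b] for b in BASES)
        freqs.append({b: column[b] / total for b in BASES})
    return freqs

def information_vector(freqs):
    # I(i) = sum over bases B of f(i,B) * ln(4 * f(i,B)); terms with f = 0 contribute 0.
    return [sum(f[b] * math.log(4 * f[b]) for b in BASES if f[b] > 0) for f in freqs]

def similarity(freqs, info, window):
    """Information-weighted score, scaled to [0, 1] between the worst and best possible windows."""
    current = sum(i * f[b] for i, f, b in zip(info, freqs, window))
    best = sum(i * max(f.values()) for i, f in zip(info, freqs))
    worst = sum(i * min(f.values()) for i, f in zip(info, freqs))
    if best == worst:  # uninformative matrix: no meaningful scale
        return 0.0
    return (current - worst) / (best - worst)

def core_bounds(info, core_len=5):
    """Start and end of the consecutive stretch with the highest total information."""
    core_len = min(core_len, len(info))
    start = max(range(len(info) - core_len + 1), key=lambda s: sum(info[s:s + core_len]))
    return start, start + core_len

def match_scan(counts, sequence, core_cutoff, matrix_cutoff):
    """Return (start, window, core score, matrix score) for windows passing both cut-offs."""
    freqs = to_frequencies(counts)
    info = information_vector(freqs)
    lo, hi = core_bounds(info)
    width, seq = len(freqs), sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if not set(window) <= set(BASES):
            continue  # skip windows with N or other non-ACGT characters
        core = similarity(freqs[lo:hi], info[lo:hi], window[lo:hi])
        full = similarity(freqs, info, window)
        if core >= core_cutoff and full >= matrix_cutoff:
            hits.append((start, window, round(core, 3), round(full, 3)))
    return hits

# Hypothetical toy matrix for the motif "GATA".
toy_counts = [
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 8, "C": 0, "G": 0, "T": 0},
]
print(match_scan(toy_counts, "ccGATAcc", core_cutoff=1.0, matrix_cutoff=0.9))
# [(2, 'GATA', 1.0, 1.0)]
```

Normalizing between the worst and best attainable scores makes the cut-offs comparable across matrices of different lengths and conservation. A single raw-score threshold cannot offer this.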
Open the MATCH server to analyze promoter regions with TRANSFAC matrices: http://www.gene-regulation.com/cgi-bin/pub/programs/match/bin/match.cgi.
Enter a name for the search; MATCH stores the result under this name. If no name is entered, a default name is used.
There are three options for selecting a sequence for a search:
Select the group of matrices with which to run MATCH (vertebrates, insects, plants, fungi, bacteria or nematodes), or select a profile. A “profile” is a set of weight matrices taken from the TRANSFAC library.
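Selecting a profile effectively restricts the search to a chosen subset of matrices. The sketch below assumes, for illustration, that a profile pairs each matrix identifier with a core cut-off and a matrix cut-off, and that the hits have already been scored. The Hit fields, matrix identifiers and cut-off values are hypothetical placeholders.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    matrix_id: str
    position: int
    core_score: float
    matrix_score: float

def apply_profile(hits, profile):
    """Keep hits whose matrix belongs to the profile and that pass both of its cut-offs."""
    kept = []
    for hit in hits:
        cutoffs = profile.get(hit.matrix_id)
        if cutoffs is None:
            continue  # matrix not included in this profile
        core_cut, matrix_cut = cutoffs
        if hit.core_score >= core_cut and hit.matrix_score >= matrix_cut:
            kept.append(hit)
    return kept

# Hypothetical profile: matrix ID -> (core cut-off, matrix cut-off).
profile = {"DEMO_01": (1.00, 0.90), "DEMO_02": (0.95, 0.85)}
hits = [
    Hit("DEMO_01", 120, 1.00, 0.93),
    Hit("DEMO_02", 340, 0.97, 0.80),  # fails the matrix cut-off
    Hit("DEMO_03", 512, 1.00, 0.99),  # matrix not in the profile
]
print([h.position for h in apply_profile(hits, profile)])  # [120]
```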
The result page tabulates all matches found in the input sequence; the output is limited to 500 000 matches per sequence. The results are shown in Figure 33.4, with the following columns:
The last three lines of the result page give the total length of all searched sequences, the total number of sites found, and the frequency of sites per nucleotide.
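The site frequency on the last line is presumably the total number of sites divided by the total length searched. The following minimal sketch shows that arithmetic with made-up sequence lengths and site counts.

```python
def summarize(sequence_lengths, sites_per_sequence):
    """Return (total length, total sites, sites per nucleotide) for a multi-sequence search."""
    total_length = sum(sequence_lengths)
    total_sites = sum(sites_per_sequence)
    frequency = total_sites / total_length if total_length else 0.0
    return total_length, total_sites, frequency

# Hypothetical run: two promoter sequences of 1000 and 500 nt with 12 and 3 sites.
print(summarize([1000, 500], [12, 3]))  # (1500, 15, 0.01)
```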
Hint: TRANSFAC and MATCH are used to predict TFBS. Consult sections 33.3 and 33.4 for the detailed procedure.
Hint: Refer to TRANSFAC and follow the instructions given in section 33.3.1.
Hint: See point IV of section 33.3.1.
Hint: Follow the procedure explained in section 33.4.1.
Hint: Consult section 33.4.1.