GVPPSR Kumar, AP Sahoo and A Kumar
Animal Biotechnology Division, IVRI, UP, India
Cuffdiff predicts differentially expressed genes (DEGs) and reports gene symbols in its output, whereas EBSeq, DESeq2, and edgeR report DEGs as Ensembl IDs. These Ensembl IDs are first converted into gene symbols using g:Convert in g:Profiler. After conversion, it is advisable to identify the genes that are called differentially expressed by all the packages and to carry only these forward for further analysis. The commonly predicted genes are identified using the Venny tool.
In our analysis, a total of 4246 genes were identified as differentially expressed by all the packages.
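The overlap that a Venn tool reports corresponds to a plain set intersection. The following is a minimal sketch, assuming the DEG lists from each package have already been converted to gene symbols; the function name `common_genes` and the toy gene lists are hypothetical and are not taken from the analysis described here.

```python
def common_genes(gene_lists):
    """Return the sorted gene symbols present in every list.

    gene_lists maps a package name to a list of gene symbols.
    Matching is case-sensitive; blank entries are ignored.
    """
    sets = [
        {gene.strip() for gene in genes if gene.strip()}
        for genes in gene_lists.values()
    ]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Hypothetical toy DEG lists, one per package (illustration only)
deg_lists = {
    "Cuffdiff": ["TLR4", "IL6", "CXCL8", "ISG15"],
    "EBSeq": ["IL6", "ISG15", "MX1", "TLR4"],
    "DESeq2": ["ISG15", "IL6", "TLR4", "OAS1"],
    "edgeR": ["TLR4", "IL6", "ISG15", "CXCL8"],
}

shared = common_genes(deg_lists)
print(len(shared), shared)
```

Only genes present in all four lists survive, which mirrors the central region of a four-way Venn diagram.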
Functional annotation is used to determine the gene ontology terms enriched in the common differentially expressed genes. Gene ontology (GO) (Ashburner et al., 2000) is an in silico approach that unifies the representation of gene and gene product attributes across divergent species. In the process of assigning annotations, gene products are classified into three categories (biological processes, cellular components and molecular functions) in a species‐independent manner. There are several databases and tools for performing functional annotation: DAVID; AmiGO2; g:Profiler; PROSITE; PRINTS; Pfam; ProDom; SMART; TIGRFAMs; SUPERFAMILY; PIR superfamily; Gene3D; PANTHER; BLAST2GO; and HAMAP. Here we discuss g:Profiler, DAVID, and ClueGO.
The gene lists resulting from the analysis of high‐throughput genomic data can be characterized and manipulated with g:Profiler. It is a simple, user‐friendly web interface for deriving and visualizing enrichments of GO terms, pathways and transcription factor binding sites, down to the level of individual genes (Reimand et al., 2007).
Open http://biit.cs.ut.ee/gprofiler/, paste the gene list, select Bos taurus as the organism (species of interest), and set the output type to Excel spreadsheet (Figures 43.4 and 43.5).
Download the Excel file to check the annotations enriched in the differentially expressed genes (see Figure 43.6).
The output shows the significance of each term and the genes of the query (Q) that are associated with the term (T). The first term, response to abiotic stimulus (t type BP, biological process), has the term ID GO:0009628 and a p‐value of 1.79E‐07. The term has 625 genes associated with it, of which 210 are present in the gene list, out of a total of 4133 genes considered.
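The three counts reported for a term can be related to an over-representation test. The sketch below is only an illustration: it assumes a one-sided hypergeometric test and a hypothetical background of 20000 genes, because the background size is not stated in the output described above. It is not necessarily how g:Profiler derives its p-value, so the result should not be expected to reproduce 1.79E-07.

```python
from math import comb


def hypergeom_tail(overlap, term_size, query_size, background_size):
    """P(X >= overlap) for a hypergeometric draw.

    query_size genes are drawn without replacement from background_size
    genes, of which term_size are annotated to the term.
    """
    upper = min(term_size, query_size)
    total = comb(background_size, query_size)
    favourable = sum(
        comb(term_size, k) * comb(background_size - term_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total


# Counts from the first term in the output described above
overlap, term_size, query_size = 210, 625, 4133
background_size = 20000  # ASSUMPTION: hypothetical, not given in the output

print(f"fraction of term genes in query: {overlap / term_size:.3f}")
print(f"fraction of query genes in term: {overlap / query_size:.3f}")

p_term = hypergeom_tail(overlap, term_size, query_size, background_size)
print(f"hypergeometric tail probability: {p_term:.3g}")
```

Exact integer arithmetic with `math.comb` avoids a dependency on a statistics library; for routine use, an established implementation and a multiple-testing correction would be preferable.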
The most common way of presenting the functional terms is to select the top ten terms in each category (biological process, BP; molecular function, MF; and cellular component, CC), sorted by p‐value, and to plot the term on the y‐axis against its significance (–log10P) on the x‐axis, as shown for the biological processes in Figure 43.7. The same can be done for all the categories.
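The selection and transformation behind such a plot can be sketched as follows. This is a minimal example, assuming the spreadsheet rows have already been read into tuples of (category, term name, p-value); the function name `top_terms` is hypothetical, and every p-value except that of the first term is invented for illustration.

```python
from math import log10


def top_terms(rows, category, n=10):
    """Return up to n (term, -log10 p) pairs for one GO category,
    ordered from most to least significant."""
    selected = [row for row in rows if row[0] == category and row[2] > 0]
    selected.sort(key=lambda row: row[2])  # smallest p-value first
    return [(name, -log10(p)) for _, name, p in selected[:n]]


rows = [
    ("BP", "response to abiotic stimulus", 1.79e-07),  # from the output above
    ("BP", "immune response", 3.2e-05),                # hypothetical p-value
    ("BP", "response to stress", 4.1e-04),             # hypothetical p-value
    ("CC", "cytoplasm", 2.5e-06),                      # hypothetical p-value
]

for name, score in top_terms(rows, "BP"):
    print(f"{name:35s} {score:6.2f}")
```

The resulting pairs can be passed to any plotting tool as a horizontal bar chart, with the term names on the y-axis and the –log10P values on the x-axis.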
Interpretation of the data is left entirely to the researcher.
DAVID is an integrated biological knowledge base and analytic tool, aimed at systematically extracting biological meaning from large gene/protein lists:
Open https://david.ncifcrf.gov and upload a multi‐list file if you have more than 3000 genes to annotate. The multi‐list file should contain two lists, list1 and list2, separated by a tab (shown in the figure below). Upload this file into DAVID, select the official gene symbol as the identifier from the drop‐down menu, select the gene list radio button and submit (see Figure 43.8).
Select an appropriate background (here it is Bos taurus) against which you wish to test your gene list. Create a combined list by clicking “combine” after selecting both the lists, and select the combined list to get the functional annotations (see Figure 43.8).
Click on the functional annotation tool in the window to get the annotation summary results (see Figure 43.8).
Click the + button in the window to expand the results. To get the gene ontology terms, click the + button beside the gene ontology entry and then proceed to any particular category (BP, MF or CC). Clicking on the chart option opens a window with all the specific terms. Here we click on BP to get all the gene ontology terms enriched for biological processes in our differentially expressed genes. The details, including the genes associated with each term, can be downloaded and opened in Excel for further use. The same can be done to visualize pathways enriched in the DEGs (see Figure 43.9).
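The multi-list file mentioned in the DAVID upload step above can be prepared with a short script. The sketch below rests on assumptions: that the file has one header row naming the two lists and that each following row holds one gene from list1 and one from list2, separated by a tab. The function name `multi_list_text` is hypothetical, and the exact layout should be checked against the figure and the DAVID help pages before uploading.

```python
from itertools import zip_longest


def multi_list_text(genes, names=("list1", "list2")):
    """Split genes into two roughly equal lists and return them as
    two tab-separated columns with a header row (assumed layout)."""
    half = (len(genes) + 1) // 2
    first, second = genes[:half], genes[half:]
    lines = ["\t".join(names)]
    # Pad the shorter column with empty cells
    for left, right in zip_longest(first, second, fillvalue=""):
        lines.append(f"{left}\t{right}")
    return "\n".join(lines) + "\n"


# Hypothetical placeholder symbols; replace with the real gene list
demo_genes = [f"GENE{i}" for i in range(5)]
text = multi_list_text(demo_genes)
print(text, end="")
```

The returned text can then be written to a plain-text file, for example with `open("multi_list.txt", "w").write(text)`, where the file name is arbitrary.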
ClueGO is a Cytoscape plug‐in that helps in functional annotation and interpretation of large lists of genes. It integrates KEGG/BioCarta pathways with GO terms to create a functionally organized GO/pathway term network.
Open the ClueGO app in Cytoscape and paste the genes into the window (see Figure 43.10).
Select a gene ontology or pathway and start. Here we selected immune system processes, as shown in Figure 43.10.
Represent a network with GO term as node label, percentage associated genes as node color, and P‐Value Corrected with Bonferroni step‐down as node size (these parameters are selected as per the requirements of the researcher) (see Figures 43.11, 43.12 and 43.13).
The attributes of the network can be exported in a table format (see Figure 43.14).
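The Bonferroni step-down correction used for the node size in the network step above is commonly known as Holm's method. The sketch below shows how such adjusted p-values can be computed, assuming that the ClueGO option corresponds to the standard Holm procedure; the function name `holm_adjust` and the example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm (Bonferroni step-down) adjusted p-values,
    in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease with rank
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]  # hypothetical raw p-values
corrected = holm_adjust(raw)
print(corrected)
```

Compared with the plain Bonferroni correction, which multiplies every p-value by the number of tests, the step-down variant is less conservative while still controlling the family-wise error rate.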