5.1 Introduction

The inference of gene regulatory networks aims to unveil the causal structure of the relations among genes in a cellular system from gene expression data [1-4]. Gene regulatory networks (GRNs) allow genes to be organized according to the dependency structure of their expression and aim to complement the understanding of the molecular structures and processes in the cellular systems of complex organisms. The large number of gene regulatory network inference methods under development is gaining popularity owing to the rapid increase in high-throughput dataset generation. The challenge of the future is the development of novel statistical methods that benefit from present and newly emerging mass data [5,6], for example, from microarray, ChIP-chip [7], ChIP-seq, proteomics mass spectrometry, protein arrays, and RNA-seq. Owing to the large number of available samples and their cost efficiency, DNA microarrays are still the state-of-the-art data source for gene regulatory network inference. For example, one of the largest repositories of such high-throughput gene expression data is the GEO database [8], which provides a large range of observational [9,10] and experimental gene expression data. Such large-scale datasets, covering different organisms, perturbations, and disease conditions, enable system-wide studies of species- and phenotype-specific gene regulatory networks.

The edges in gene regulatory networks represent physical interactions between genes, intermediates, and their products. These include genes regulated by the same transcription factor (coexpression) as well as physical interactions within protein complexes and metabolic and signaling pathways. For expression data, inferred interactions have been observed to preferentially indicate transcription regulation in Escherichia coli [11], but they can also correspond to other types of molecular interactions. Many current approaches for gene regulatory network inference result in networks with a high edge density, which makes it difficult to elucidate the biological relevance and importance of individual edges. In contrast, a network inferred with C3NET is very sparse and represents the core of a gene regulatory network, considering only the gene pairs with the strongest expression dependencies [12]. The inference of sparse gene regulatory networks is therefore a promising approach to reduce the overall complexity by considering only causal dependencies with the strongest signal [13,14].
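The idea of keeping only the strongest dependency per gene can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the C3NET implementation of Ref. [12]: the function name strongest_edge_per_gene, the threshold parameter (a hypothetical stand-in for a proper significance test), and the toy dependency matrix are all introduced here for illustration only.

```python
import numpy as np

def strongest_edge_per_gene(mi, threshold=0.0):
    """Keep, for each gene, only the edge to its most strongly dependent partner.

    mi        : symmetric (n x n) matrix of pairwise dependency scores
    threshold : hypothetical cutoff standing in for a significance test
    Returns a symmetric boolean adjacency matrix with at most n edges.
    """
    mi = np.array(mi, dtype=float)      # work on a copy
    np.fill_diagonal(mi, -np.inf)       # a gene cannot be its own partner
    n = mi.shape[0]
    adj = np.zeros((n, n), dtype=bool)
    best = mi.argmax(axis=1)            # strongest partner of each gene
    for i, j in enumerate(best):
        if mi[i, j] > threshold:
            adj[i, j] = adj[j, i] = True
    return adj

# Toy dependency matrix for four genes (made-up values).
toy_mi = np.array([
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.5, 0.3],
    [0.2, 0.5, 0.0, 0.4],
    [0.1, 0.3, 0.4, 0.0],
])
adj = strongest_edge_per_gene(toy_mi)
n_edges = int(adj.sum()) // 2
```

In this toy case the full matrix contains six candidate gene pairs, while the sparsified network retains three edges, since each gene contributes at most one edge.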

Many gene regulatory network inference methods are based on estimates of mutual information values. Methods for network inference based on mutual information are called relevance-based gene regulatory network inference methods. The first method based on mutual information for GRN inference was introduced in Ref. [15]. Mutual information is a measure of the (possibly nonlinear) dependence between two random variables (e.g., the expression levels of two genes) and can be estimated, for example, from the individual and joint entropies of the two random variables.
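The relation between mutual information and the individual and joint entropies, I(X;Y) = H(X) + H(Y) - H(X,Y), can be sketched with a simple histogram-based plug-in estimate. The equal-width binning, the number of bins, and the synthetic data below are assumptions made for illustration; they are not necessarily among the estimators examined in this chapter.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (in nats) of a histogram given as raw counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))

def mutual_information(x, y, bins=8):
    """Plug-in estimate I(X;Y) = H(X) + H(Y) - H(X,Y) using equal-width bins."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    h_x = entropy(joint.sum(axis=1))    # marginal entropy of x
    h_y = entropy(joint.sum(axis=0))    # marginal entropy of y
    h_xy = entropy(joint.ravel())       # joint entropy
    return h_x + h_y - h_xy

# Synthetic example: y_dep depends on x nonlinearly, y_ind is independent of x.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y_dep = x**2 + 0.1 * rng.normal(size=2000)
y_ind = rng.normal(size=2000)

mi_dep = mutual_information(x, y_dep)
mi_ind = mutual_information(x, y_ind)
```

The quadratic relation has almost no linear correlation with x, yet its estimated mutual information is clearly larger than that of the independent pair. The estimate for the independent pair is small but not exactly zero, which reflects the finite-sample bias of plug-in estimators.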

A variety of mutual information (MI) estimators have been developed in order to obtain accurate estimates under different assumptions regarding the characteristics of the underlying data. In Ref. [16], mutual information-based gene regulatory network inference algorithms were evaluated for different MI estimators, demonstrating that the choice of the MI estimator influences the inference performance of a given GRN approach. Here we address the question of to what extent the inference performance of the C3NET algorithm is affected by the choice of MI estimator, in order to determine which estimator is the most beneficial. In addition, we employ local network-based measures to study the inference performance for edges connected to genes with a high degree and for edges between linearly connected genes.

In the first part of this chapter, we describe the C3NET algorithm [12] for the inference of gene regulatory networks. In the second part, we present four common mutual information estimators, an ensemble approach for measuring global inference performance, and local measures, which we use to investigate the influence of the estimators on the networks inferred by C3NET. In the third part, we present numerical results for in silico gene expression datasets generated for three Erdős–Rényi networks for various sample sizes.
