5.4 Conclusion and Summary

Our study shows that the choice of discretization method and MI estimator has a crucial influence on the inference performance of C3NET. Specifically, equal width and global equal width discretization showed the best performance in combination with the Miller–Madow estimator. The larger influence on the C3NET inference performance, however, came from the discretization method: equal width and global equal width discretization markedly outperform equal frequency discretization.
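To make the compared ingredients concrete, the following is a minimal Python sketch of the two discretization families and of a Miller–Madow corrected mutual information estimate. It is an illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the "global" variant is assumed to mean that the bin limits are taken over the whole expression matrix rather than per gene, and the mutual information is assumed to be assembled from three Miller–Madow corrected entropies.

```python
import math
from collections import Counter


def equal_width(values, n_bins, lo=None, hi=None):
    """Assign each value to one of n_bins intervals of equal length.

    Passing lo/hi computed over the whole expression matrix gives a
    'global' equal width variant (an assumption about its definition).
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value falls into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency(values, n_bins):
    """Assign values to n_bins bins holding (almost) equally many samples."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def entropy_mm(labels):
    """Miller-Madow entropy in nats: plug-in estimate plus (m - 1) / (2N),
    where m is the number of non-empty bins and N the sample size."""
    n = len(labels)
    counts = Counter(labels)
    h_ml = -sum(c / n * math.log(c / n) for c in counts.values())
    return h_ml + (len(counts) - 1) / (2 * n)


def mutual_information_mm(x_bins, y_bins):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), each term Miller-Madow corrected."""
    joint = list(zip(x_bins, y_bins))
    return entropy_mm(x_bins) + entropy_mm(y_bins) - entropy_mm(joint)
```

A typical use would discretize two expression profiles with the same method and number of bins and pass the resulting bin labels to `mutual_information_mm`; swapping `equal_width` for `equal_frequency` changes only the discretization step.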

In the study conducted by Olsen et al. [16], the influence of discretization, the mutual information estimator, sample size, and network size was studied for the ARACNE, CLR, and MRNET GRN inference algorithms. In contrast to our results, equal frequency discretization was observed to outperform equal width discretization for these inference algorithms. In addition, the discrete estimators did not differ as strongly from one another as they did in our study, where, for example, the Miller–Madow estimator stood out.

Taken together, these results suggest that the influence of the MI estimator on the global inference performance is highly dependent on the inference algorithm used. It is therefore necessary to test each GRN inference algorithm individually with different discretization methods and mutual information estimators.

Global error measures quantify the average inference performance over all edges in a network. Local measures make it possible to zoom in on the inference performance of individual parts of the network, down to individual edges. For C3NET, edges of leaf nodes and edges of linearly connected nodes of a network are inferred with higher performance [12]. Edges of nodes with a high degree are likely to be underrepresented in the inferred network because they are more difficult to infer.

We studied the influence of the MI estimators on the performance for different edge classes, which were defined by local network-based measures. Two edge classes were defined: Class I comprises edges of leaf nodes and linearly connected nodes, and Class II comprises edges of highly connected nodes in the network. The ability of C3NET to infer an edge was quantified by the true positive rate measured from an ensemble of networks inferred from bootstrap datasets. From the set of datasets for a network, the distribution of the median true positive rate is obtained for Class I and Class II edges. We compared the resulting distributions of true positive rates between the two classes. As expected, we observed high true positive rates for Class I edges, while Class II edges showed low true positive rates. Because the true positive rate for Class I edges was very high, the differences among the MI estimators were not as apparent as for Class II edges. For Class II edges, the Miller–Madow estimator resulted in the best inference performance for C3NET, in agreement with the global error measure.
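The edge-wise evaluation described above can be sketched as follows. This is a hedged illustration rather than the procedure used in the study: the networks inferred from the bootstrap datasets are assumed to be given as lists of undirected edges, all function names are hypothetical, and the degree threshold that separates Class I from Class II (both endpoints with degree at most 2) is an assumption chosen only to mimic "leaf and linearly connected" nodes.

```python
from collections import Counter
from statistics import median


def edge(u, v):
    """Canonical representation of an undirected edge."""
    return (u, v) if u <= v else (v, u)


def edge_tpr(true_edges, inferred_networks):
    """Fraction of bootstrap networks in which each true edge is recovered."""
    hits = Counter()
    for network in inferred_networks:
        found = {edge(u, v) for u, v in network}
        for e in true_edges:
            if e in found:
                hits[e] += 1
    n_networks = len(inferred_networks)
    return {e: hits[e] / n_networks for e in true_edges}


def classify_edges(true_edges, max_degree=2):
    """Hypothetical rule: Class I if both endpoints have degree <= max_degree,
    otherwise Class II. The threshold is an assumption, not from the study."""
    degree = Counter()
    for u, v in true_edges:
        degree[u] += 1
        degree[v] += 1
    return {
        e: "I" if max(degree[e[0]], degree[e[1]]) <= max_degree else "II"
        for e in true_edges
    }


def median_tpr_by_class(true_edges, inferred_networks):
    """Median edge-wise true positive rate for Class I and Class II edges."""
    true_edges = [edge(u, v) for u, v in true_edges]
    tpr = edge_tpr(true_edges, inferred_networks)
    classes = classify_edges(true_edges)
    result = {}
    for label in ("I", "II"):
        values = [tpr[e] for e in true_edges if classes[e] == label]
        if values:
            result[label] = median(values)
    return result
```

Calling `median_tpr_by_class` once per simulated dataset yields one median per class and dataset; collecting these values over all datasets of a network gives the kind of distributions that are compared between Class I and Class II.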

In this chapter, we presented a simulation study to analyze the impact of MI estimators on the inference performance of C3NET. The inference performance was studied using global and local network-based measures for simulated gene expression data from Erdős–Rényi networks with different edge densities and varying sample sizes. Among the tested combinations of discretization methods and MI estimators, we recommend the Miller–Madow estimator with equal width or global equal width discretization for C3NET network inference.
