10.1 Introduction

One of the central topics in analyzing structured data is the detection of dense substructures, often called clusters, modules, or communities. Such sets of strongly interrelated entities provide interesting information in many different contexts, for instance, interaction of proteins, webpage linking, citation of articles, email communication between individuals, spatial proximity of objects in an image, 3D-contact of atoms in a macromolecule, co-occurrence of words in documents, or similarity of genomic sequences. In graph terminology, the entities are referred to as nodes, and interactions are represented by edges between the nodes. Different strengths of interactive relationships can be indicated by edge weights.

This chapter presents an enumerative approach to find node sets that satisfy an explicit interaction density criterion [1], conceptually generalizing traditional clique search [2]. Beyond that, the described framework can enumerate dense cluster patterns from asymmetric, bipartite graph structures and from higher-order associations that involve more than two entities at the same time, forming a hypergraph [3,4]. Moreover, the method allows to integrate additional user-defined constraints in order to systematically discover relevant substructures. The density of an entity set is generally defined as the total interaction weight between entities within the set divided by the maximum possible amount of interaction. We describe a reverse search approach to detect all set patterns satisfying a user-defined minimum density threshold. The main idea is to specify a parent–child relationship between set patterns that results in an antimonotonic search space, which allows for effective pruning. Mining approaches that directly control the density properties of patterns are promising in several application fields. In molecular systems biology for instance, one important problem is to infer protein complexes or groups of functionally related genes from experimental measurement data. By considering different density cutoffs, it is possible to trade off precision and recall levels for the discovered patterns. Furthermore, the enumerative method identifies overlapping patterns, and integration with gene activity profiles can reveal context-dependent variation of interaction patterns.

The chapter is organized as follows. The next section sketches related work on pattern detection in structured and relational data. Section 10.3 explains the dense cluster enumeration algorithm for the basic case where input data are represented as an undirected weighted interaction network. Section 10.4 addresses the generalization of the method to higher-order association data and hypergraphs. Section 10.5 discusses the presented approach and points out potential extensions.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.116.36.194