10.5 Discussion

This chapter presented a general framework for the systematic extraction of dense patterns from simple or higher-order edge-weighted network data. It extends conventional relational set mining approaches [72,74,75] and clique-related network analysis [13,17,15]. The proposed reverse search algorithm allows for an effective, antimonotonicity-based pruning of the search space without missing any solutions; the complexity of the delay between two consecutive solutions is in the order of the input size. This property allows to apply it in cases where straightforward set enumeration algorithms are infeasible [4]. However, for large datasets or low density thresholds, the number of solutions can be prohibitive, making the computation slow in spite of the favorable runtime per solution.

There are several remedies for this problem. The first possibility is to maintain the enumerative search, but add further constraints based on additional criteria, prior knowledge, or external data [1], as described in Section 10.3.9. If relevant subsets are prespecified for some dimensions (for instance, windows of consecutive time intervals), reverse search with respect to the other dimensions can be performed for each of these subsets individually. On the other hand, one can use the reverse search strategy and additionally apply heuristic criteria or sampling techniques to control the number, overlap, and relevance of solutions; this allows to directly trade off the runtime and the completeness of the solution set, as exemplified in Section 10.3.9 with a simple branching heuristic [4]; similarly, heuristic pruning rules could be specified by appropriate thresholding of (relative) degree values. Even if it is not used for exhaustive exploration, the definition of the antimonotonic search space has a value by itself, as valid solutions are visited with short delay.

Finally, instead of applying the method to the whole dataset at once, it can be combined with different strategies of prepartitioning or preaggregation of the data [8,82]. Also, more effective exploitation of minimum size or connectivity constraints is conceivable. Furthermore, the reverse search strategy is compatible with distributed computation, and its efficiency can be enhanced by adapting data structures and pruning techniques to the specific task at hand.

Notes

1. Other default values may be specified as well.

2. However, non-negative input matrices allow for additional improvements of the search, see Sections 10.3.3 and 10.3.5.

3. However, they can be integrated in a similar way as node weights (see Section 10.3.8).

4. The density check comes first because it is computationally cheaper, see next section.

5. Note that without this modification, the algorithm would output solutions only after having reached leaf clusters of the search tree (i.e., after a sequence of recursive calls).

6. However, the parent is not necessarily the densest direct subcluster of a given child cluster.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.28.108