Similar to TreeMiner, the HybridTreeMiner algorithm joins two k-trees
which have the common prefi x (k-1)-trees, to generate the candidate subtree.
For those trees which cannot be generated by joining, HybridTreeMiner
borrows the idea of FREQT, that extends the frequent subtrees to obtain the
larger candidates. There are several effective strategies proposed in [13, 14]
to address the issue of tree authomorphisms during the joining and extending
processes. Moreover, for different types of trees, the authors introduced
different approaches to improve the effi ciency of the algorithms. For
example, to deal with the free trees, the algorithm is extended by utilizing the
breadth fi rst tree encoding. By this way, it can take account for a small part of
all the rooted trees. Another strategy introduced by HybridTreeMiner is that
the occurrence lists of the subtrees are proposed and the authors explained
how they are joined for generating the candidates in Chi et al. [14].
6.3.2.4 Gaston
Nijssen et al. proposed another algorithm named Gaston [40]. Based on the
similar idea of TreeMiner and HybridTreeMiner, the Gaston algorithm applies
both the breadth fi rst search and depth fi rst search strategies. There are
several phases introduced for the whole mining process. Firstly, it extracts
the frequent undirected paths by traversing all the possible ways. To
facilitate the process, the authors introduced an effective representation for
trees which can be built in reasonable time for large data; then it deals with
these paths as the start point of a rooted tree, and joins or extends them with
rightmost path extension technique to generate the candidates and test.
6.3.3 Maximal and closed frequent subtrees
A main issue for all the previous work is that the resultant frequent subtrees
may be very large and it can grow exponentially when the size of the original
data increases. As a result, how to effi ciently obtain them is important.
Moreover, it is very diffi cult to clarify the whole result because of the huge
size of them. To tackle these problems, the maximal and closed frequent
subtrees have been introduced [50, 54, 12], which borrows the idea from the
literature of itemset and sequence mining. The defi nition of the maximal
frequent subtree is that none of a maximal frequent subtree’s super trees
are frequent. By this way, the discovered frequent patterns can be reduced
dramatically, which facilitates the mining process and the explanation of
the results. The basic idea of [50, 54] is that they fi rst fi nd all the frequent
subtrees, and then fi lter out those non-maximal patterns. This technique,
although simple to be implemented, is time consuming. To tackle the issue,
Chi et al. [12] proposed the CMTreeMiner algorithm, which extracts the
maximal patterns without fi rst fi nding all the frequent ones. Furthermore,
Frequent Pattern Mining 141