120 Applied Data Mining
6.1.2 Basic Algorithms for Association Rule Mining
6.1.2.1 Apriori
The first algorithm was introduced by Agrawal et al. [1] to address the
association rule mining problem. The same authors introduced another
algorithm named Apriori in a later paper [4], exploiting the
monotonicity property of association rules to improve performance.
Mannila et al. [39] presented independent work with a similar idea.
Apriori applies a two-stage approach to discover frequent itemsets and
confident association rules.
• Frequent Itemset Discovery. To find all frequent itemsets, Apriori
introduces a candidate generation-and-test strategy. The basic idea is
that it first generates the candidate k-itemsets (i.e., k is 1 at the beginning
and is incremented by 1 in each subsequent cycle), and these candidates
are then evaluated to determine whether they are frequent.
Specifically, the algorithm first scans the dataset to find the frequent
1-itemsets. To discover the frequent 2-itemsets, Apriori generates
candidate 2-itemsets by joining the frequent 1-itemsets. These candidates
are evaluated by scanning the original dataset again. In the same way, all
frequent (k+1)-itemsets can be found based on the already known frequent
k-itemsets.
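The level-wise discovery described above can be sketched on a small transaction database. Note that the dataset and the support threshold below are hypothetical illustrations, not examples from the text:

```python
from collections import Counter

# Toy transaction database (hypothetical; not from the text).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
]
min_support = 2  # absolute support count threshold (illustrative)

# First scan: count single items and keep the frequent 1-itemsets.
counts = Counter(item for t in transactions for item in t)
L1 = {frozenset([i]) for i, c in counts.items() if c >= min_support}

# Join frequent 1-itemsets into candidate 2-itemsets, then scan the
# database again to count each candidate's support.
C2 = {a | b for a in L1 for b in L1 if a != b}
support = {c: sum(1 for t in transactions if c <= t) for c in C2}
L2 = {c for c, s in support.items() if s >= min_support}
```

Each level thus costs one additional pass over the database, which is why limiting the number of candidates matters.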
To improve performance by avoiding the generation of many unnecessary
candidates, Apriori introduced a monotonicity property: a (k+1)-itemset
becomes a candidate only if all of its k-subsets are frequent.
As demonstrated by the authors [4] and many later works, this simple yet
efficient strategy greatly reduces the number of candidates to be evaluated.
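This pruning condition can be checked without touching the database at all. The following sketch (the frequent itemsets used here are hypothetical) shows a candidate being rejected because one of its k-subsets is infrequent:

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_k):
    """True if some k-subset of the (k+1)-candidate is not frequent."""
    k = len(candidate) - 1
    return any(frozenset(sub) not in frequent_k
               for sub in combinations(candidate, k))

# Hypothetical level: {a, b} and {a, c} are frequent but {b, c} is not,
# so the joined candidate {a, b, c} is pruned without any database scan.
Lk = {frozenset("ab"), frozenset("ac")}
pruned = has_infrequent_subset(frozenset("abc"), Lk)
```

The check is cheap because it only consults the in-memory set of frequent k-itemsets found in the previous level.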
The frequent itemset mining of the Apriori algorithm is presented in
Algorithm 1. The algorithm is executed in a breadth-first search manner.
To generate the candidate itemsets of length k+1, two k-itemsets with the
same (k-1)-prefix are joined together (lines 12–13). The joined itemset can
be inserted into C_{k+1} only if all its k-subsets are frequent (line 14).
To test the candidate k-itemsets (i.e., count their supports), the database
is scanned sequentially and each candidate itemset is tested for inclusion
in the scanned transaction; in this way, the corresponding supports are
accumulated (lines 5–9). Finally, the frequent itemsets are collected
(line 10).
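Putting the join, prune, and test steps together, the frequent itemset mining stage can be sketched as follows. This is a minimal Python rendering of the flow the text describes, not a reproduction of Algorithm 1 itself; the toy data and names are illustrative:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent itemset mining; returns {itemset: support}."""
    transactions = [frozenset(t) for t in transactions]
    frequent = {}
    # First database scan: find the frequent 1-itemsets.
    Lk = set()
    for item in {i for t in transactions for i in t}:
        s = sum(1 for t in transactions if item in t)
        if s >= min_support:
            itemset = frozenset([item])
            Lk.add(itemset)
            frequent[itemset] = s
    while Lk:
        # Join step: two k-itemsets sharing a (k-1)-prefix (lines 12-13).
        ordered = sorted(tuple(sorted(x)) for x in Lk)
        Ck = set()
        for a in ordered:
            for b in ordered:
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    cand = frozenset(a) | {b[-1]}
                    # Prune step: keep the candidate only if all of its
                    # k-subsets are frequent (line 14).
                    if all(frozenset(sub) in Lk
                           for sub in combinations(cand, len(cand) - 1)):
                        Ck.add(cand)
        # Test step: one database pass accumulates candidate supports
        # (lines 5-9); the frequent ones are collected (line 10).
        Lk = set()
        for cand in Ck:
            s = sum(1 for t in transactions if cand <= t)
            if s >= min_support:
                Lk.add(cand)
                frequent[cand] = s
    return frequent
```

On the toy database used above, the loop stops at level 3 because the only candidate 3-itemset fails the support test.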
• Association Rule Mining. After discovering the frequent itemsets, we
can find the frequent and confident association rules in a straightforward
manner. The approach is similar to the frequent itemset mining algorithm.
Because the cost of finding frequent itemsets is high and accounts for