The inner workings of apriori

The goal of apriori is to compute frequent itemsets and association rules efficiently, along with their support and confidence. Going into the full details of these computations is beyond the scope of this chapter. In what follows, we briefly examine how itemset generation and rule generation are accomplished.

Generating itemsets with support-based pruning

The most straightforward way to compute frequent itemsets would be to enumerate every possible itemset and discard those whose support falls below the minimal support. This is particularly inefficient: with n items there are 2^n - 1 possible itemsets, so generating them all only to throw most of them away is a waste of computing power. The goal is, of course, to generate only the itemsets that are useful for the analysis: those whose support is at least the minimal support. Let's continue with our previous example. The following table presents the same data using a binary representation:

Transaction   Cherry Coke   Chicken wings   Chips   Chocolate cake   Lemon
1             1             0               1       0                1
2             1             1               0       0                1
3             1             1               1       0                1
4             0             1               1       0                1
5             1             0               1       1                1

With a minimal support higher than 0.2, we can intuitively see that considering any itemset containing chocolate cake would be a waste of resources, as its support cannot exceed the support of chocolate cake alone (which is below the minimal support). Conversely, the {Cherry Coke, Chips, Lemon} itemset is frequent with a minimal support of 0.6 (it appears in three transactions out of five), and every subset of {Cherry Coke, Chips, Lemon} is necessarily frequent as well: Support({Cherry Coke}) = 0.8, Support({Chips}) = 0.8, Support({Lemon}) = 1, Support({Cherry Coke, Chips}) = 0.6, Support({Cherry Coke, Lemon}) = 0.8, and Support({Chips, Lemon}) = 0.8.
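These figures are easy to verify programmatically. The following sketch rebuilds the binary table above as a plain R matrix and defines a small support() helper; both the matrix name m and the helper are ours for illustration, not part of any package:

```r
# Binary representation: five transactions (rows) by five items (columns)
m <- matrix(
  c(1, 0, 1, 0, 1,
    1, 1, 0, 0, 1,
    1, 1, 1, 0, 1,
    0, 1, 1, 0, 1,
    1, 0, 1, 1, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, c("Cherry Coke", "Chicken wings", "Chips",
                          "Chocolate cake", "Lemon"))
)

# Hypothetical helper: support of an itemset is the share of transactions
# that contain every one of its items
support <- function(items) {
  mean(rowSums(m[, items, drop = FALSE]) == length(items))
}

support("Chocolate cake")                    # 0.2, below a minimal support of 0.6
support(c("Cherry Coke", "Chips", "Lemon"))  # 0.6, frequent
support(c("Cherry Coke", "Lemon"))           # 0.8, a subset is at least as frequent
```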

Apriori uses exactly these two observations to generate itemsets. In short, it discards the supersets of infrequent itemsets without ever computing their support, and it counts the subsets of frequent itemsets, which are necessarily frequent themselves, among the frequent itemsets. This strategy is called support-based pruning.
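As a preview of how this looks in practice, here is a minimal sketch that mines only the frequent itemsets of our example with a minimal support of 0.6. It assumes the arules package is installed; the exact calls are one possible way to do it, not necessarily the code we will use later in the chapter:

```r
library(arules)

# The five transactions from the table, as lists of purchased items
baskets <- list(
  c("Cherry Coke", "Chips", "Lemon"),
  c("Cherry Coke", "Chicken wings", "Lemon"),
  c("Cherry Coke", "Chicken wings", "Chips", "Lemon"),
  c("Chicken wings", "Chips", "Lemon"),
  c("Cherry Coke", "Chips", "Chocolate cake", "Lemon")
)
trans <- as(baskets, "transactions")

# Mine frequent itemsets only (no rules yet); candidates whose support
# falls below supp are pruned away
itemsets <- apriori(trans,
                    parameter = list(supp = 0.6, target = "frequent itemsets"))
inspect(sort(itemsets, by = "support"))
```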

Generating rules by using confidence-based pruning

Apriori generates rules from each frequent itemset by first considering the possible association rules that have a single item as consequent. By merging pairs of such rules with high confidence, it builds candidate rules with two items as consequent, and so on, checking the confidence of each candidate in turn. If both association rules {Lemon, Cherry Coke, Chicken wings} => {Chips} and {Lemon, Chicken wings, Chips} => {Cherry Coke} have high confidence, then {Lemon, Chicken wings} => {Chips, Cherry Coke} becomes a candidate whose confidence is evaluated next; if either of the two has low confidence, the merged rule cannot have high confidence and is pruned without being evaluated. As an exercise, examine the preceding table to discover whether or not these rules have high confidence.
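To see confidence-based pruning in action, a sketch along the same lines (reusing the trans object built in the previous snippet, and again assuming arules) asks apriori for rules instead of itemsets; candidate rules whose confidence falls below the conf threshold are discarded:

```r
# Mine association rules; candidates below supp or conf are pruned.
# minlen = 2 avoids rules with an empty antecedent.
rules <- apriori(trans,
                 parameter = list(supp = 0.2, conf = 0.6, minlen = 2))
inspect(sort(rules, by = "confidence"))

# Recall that the confidence of X => Y is Support(X union Y) / Support(X),
# which is what the output reports for each surviving rule.
```

Lowering the conf threshold lets more candidate rules survive the pruning, at the cost of producing less reliable rules.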

We now know more about how apriori works. Let's start analyzing some data in R!
