The inner workings of apriori

The goal of apriori is to compute frequent itemsets and association rules efficiently, along with their support and confidence. Going into the full details of these computations is beyond the scope of this chapter. In what follows, we briefly examine how itemset generation and rule generation are accomplished.

Generating itemsets with support-based pruning

The most straightforward way to compute frequent itemsets would be to enumerate every possible itemset and discard those whose support falls below the minimal support. This is particularly inefficient: with n items there are 2^n - 1 possible itemsets, so generating them all only to throw most of them away is a waste of computing power. The goal is, of course, to generate only the itemsets that are useful for the analysis: those whose support is at least the minimal support. Let's continue with our previous example. The following table presents the same data using a binary representation:

Transaction   Cherry Coke   Chicken wings   Chips   Chocolate cake   Lemon
1             1             0               1       0                1
2             1             1               0       0                1
3             1             1               1       0                1
4             0             1               1       0                1
5             1             0               1       1                1

With a minimal support higher than 0.2, we can intuitively see that considering any itemset containing chocolate cake would be a waste of resources, as its support cannot exceed the support of chocolate cake alone (which is below the minimal support). Conversely, the {Cherry Coke, Chips, Lemon} itemset is frequent with a minimal support of 0.6 (it appears in three transactions out of five), and every subset of {Cherry Coke, Chips, Lemon} is necessarily frequent as well: Support({Cherry Coke}) = 0.8, Support({Chips}) = 0.8, Support({Lemon}) = 1, Support({Cherry Coke, Chips}) = 0.6, Support({Cherry Coke, Lemon}) = 0.8, and Support({Chips, Lemon}) = 0.8.
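These figures are easy to verify programmatically. The following sketch rebuilds the binary table above as a plain R matrix and defines a small support() helper; both the matrix name m and the helper are ours for illustration, not part of any package:

```r
# Binary representation: five transactions (rows) by five items (columns)
m <- matrix(
  c(1, 0, 1, 0, 1,
    1, 1, 0, 0, 1,
    1, 1, 1, 0, 1,
    0, 1, 1, 0, 1,
    1, 0, 1, 1, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, c("Cherry Coke", "Chicken wings", "Chips",
                          "Chocolate cake", "Lemon"))
)

# Hypothetical helper: support of an itemset is the share of transactions
# that contain every one of its items
support <- function(items) {
  mean(rowSums(m[, items, drop = FALSE]) == length(items))
}

support("Chocolate cake")                    # 0.2, below a minimal support of 0.6
support(c("Cherry Coke", "Chips", "Lemon"))  # 0.6, frequent
support(c("Cherry Coke", "Lemon"))           # 0.8, a subset is at least as frequent
```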

Apriori uses exactly these two observations to generate itemsets. In short, it discards the supersets of infrequent itemsets without ever computing their support, and it counts the subsets of frequent itemsets, which are necessarily frequent themselves, among the frequent itemsets. This strategy is called support-based pruning.
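As a preview of how this looks in practice, here is a minimal sketch that mines only the frequent itemsets of our example with a minimal support of 0.6. It assumes the arules package is installed; the exact calls are one possible way to do it, not necessarily the code we will use later in the chapter:

```r
library(arules)

# The five transactions from the table, as lists of purchased items
baskets <- list(
  c("Cherry Coke", "Chips", "Lemon"),
  c("Cherry Coke", "Chicken wings", "Lemon"),
  c("Cherry Coke", "Chicken wings", "Chips", "Lemon"),
  c("Chicken wings", "Chips", "Lemon"),
  c("Cherry Coke", "Chips", "Chocolate cake", "Lemon")
)
trans <- as(baskets, "transactions")

# Mine frequent itemsets only (no rules yet); candidates whose support
# falls below supp are pruned away
itemsets <- apriori(trans,
                    parameter = list(supp = 0.6, target = "frequent itemsets"))
inspect(sort(itemsets, by = "support"))
```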

Generating rules by using confidence-based pruning

Apriori generates rules from each frequent itemset by first considering the possible association rules that have a single item as consequent. By merging pairs of such rules with high confidence, it builds candidate rules with two items as consequent, and so on, checking the confidence of each candidate in turn. If both association rules {Lemon, Cherry Coke, Chicken wings} => {Chips} and {Lemon, Chicken wings, Chips} => {Cherry Coke} have high confidence, then {Lemon, Chicken wings} => {Chips, Cherry Coke} becomes a candidate whose confidence is evaluated next; if either of the two has low confidence, the merged rule cannot have high confidence and is pruned without being evaluated. As an exercise, examine the preceding table to discover whether or not these rules have high confidence.
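To see confidence-based pruning in action, a sketch along the same lines (reusing the trans object built in the previous snippet, and again assuming arules) asks apriori for rules instead of itemsets; candidate rules whose confidence falls below the conf threshold are discarded:

```r
# Mine association rules; candidates below supp or conf are pruned.
# minlen = 2 avoids rules with an empty antecedent.
rules <- apriori(trans,
                 parameter = list(supp = 0.2, conf = 0.6, minlen = 2))
inspect(sort(rules, by = "confidence"))

# Recall that the confidence of X => Y is Support(X union Y) / Support(X),
# which is what the output reports for each surviving rule.
```

Lowering the conf threshold lets more candidate rules survive the pruning, at the cost of producing less reliable rules.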

We now know more about how apriori works. Let's start analyzing some data in R!
