Apriori

We will use the Apriori algorithm as implemented in Weka. It iteratively reduces the minimum support until it finds the required number of rules with the given minimum confidence. We'll implement the algorithm using the following steps:

  1. We'll import the required libraries using the following lines of code:
import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instances;
import weka.associations.Apriori;
  2. Next, we'll load the supermarket.arff dataset:
Instances data = new Instances(new BufferedReader(new FileReader("data/supermarket.arff")));
  3. We'll initialize an Apriori instance and call the buildAssociations(Instances) method to start frequent pattern mining, as follows:
Apriori model = new Apriori();
model.buildAssociations(data);
  4. We can output the discovered itemsets and rules, as shown in the following code:
System.out.println(model);
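To see what the frequent-itemset phase of Apriori does under the hood, here is a minimal, library-free sketch on toy transactions. It is an illustration only, not Weka's implementation: the class name and data are invented, it prunes candidates by support counting alone (full Apriori also discards candidates with an infrequent subset), and it omits rule generation and the iterative lowering of minimum support that Weka performs.

```java
import java.util.*;

// A toy sketch of the frequent-itemset phase of Apriori (illustrative only).
public class AprioriSketch {

    // Return all itemsets whose support count is at least minCount.
    static Map<Set<String>, Integer> frequentItemsets(List<Set<String>> transactions, int minCount) {
        Map<Set<String>, Integer> frequent = new LinkedHashMap<>();
        // L(1): count single items and keep the frequent ones
        Map<Set<String>, Integer> current = new HashMap<>();
        for (Set<String> t : transactions)
            for (String item : t)
                current.merge(new HashSet<>(Set.of(item)), 1, Integer::sum);
        current.values().removeIf(c -> c < minCount);
        while (!current.isEmpty()) {
            frequent.putAll(current);
            // Candidate generation: join pairs of frequent k-itemsets into (k+1)-itemsets
            Set<Set<String>> candidates = new HashSet<>();
            for (Set<String> a : current.keySet())
                for (Set<String> b : current.keySet()) {
                    Set<String> union = new HashSet<>(a);
                    union.addAll(b);
                    if (union.size() == a.size() + 1) candidates.add(union);
                }
            // Count candidate support and prune infrequent candidates
            Map<Set<String>, Integer> next = new HashMap<>();
            for (Set<String> c : candidates)
                for (Set<String> t : transactions)
                    if (t.containsAll(c)) next.merge(c, 1, Integer::sum);
            next.values().removeIf(c -> c < minCount);
            current = next;
        }
        return frequent;
    }

    public static void main(String[] args) {
        List<Set<String>> transactions = List.of(
            Set.of("bread", "milk"),
            Set.of("bread", "biscuits", "milk"),
            Set.of("biscuits", "milk"),
            Set.of("bread", "biscuits"));
        frequentItemsets(transactions, 2).forEach((itemset, count) ->
            System.out.println(itemset + " -> " + count));
    }
}
```

With a minimum count of 2, this finds the three single items plus the three pairs {bread, milk}, {bread, biscuits}, and {biscuits, milk}; the triple occurs only once and is pruned, mirroring the L(1), L(2), ... level sizes Weka reports above.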

The output is as follows:

   Apriori
    =======
    
    Minimum support: 0.15 (694 instances)
    Minimum metric <confidence>: 0.9
    Number of cycles performed: 17
    
    Generated sets of large itemsets:
    Size of set of large itemsets L(1): 44
    Size of set of large itemsets L(2): 380
    Size of set of large itemsets L(3): 910
    Size of set of large itemsets L(4): 633
    Size of set of large itemsets L(5): 105
    Size of set of large itemsets L(6): 1
    
    Best rules found:
    
     1. biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723    <conf:(0.92)> lift:(1.27) lev:(0.03) [155] conv:(3.35)
     2. baking needs=t biscuits=t fruit=t total=high 760 ==> bread and cake=t 696    <conf:(0.92)> lift:(1.27) lev:(0.03) [149] conv:(3.28)
     3. baking needs=t frozen foods=t fruit=t total=high 770 ==> bread and cake=t 705    <conf:(0.92)> lift:(1.27) lev:(0.03) [150] conv:(3.27)
    ...
  

The algorithm outputs the ten best rules ranked by confidence (ten is Weka's default number of rules). Let's look at the first rule and interpret the output, as follows:

biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723    <conf:(0.92)> lift:(1.27) lev:(0.03) [155] conv:(3.35)
  

It says that when biscuits, frozen foods, and fruit are bought together and the total purchase price is high, it is very likely that bread and cake are purchased as well. The {biscuits, frozen foods, fruit, total high} itemset appears in 788 transactions, and in 723 of those transactions bread and cake are also purchased. The confidence of this rule is therefore 723/788 ≈ 0.92, meaning that the rule holds in 92% of the transactions that contain the {biscuits, frozen foods, fruit, total high} itemset.

The output also reports additional measures such as lift, leverage, and conviction, which assess how much better the rule is than chance. For example, the conviction value of 3.35 indicates that the rule would be incorrect 3.35 times more often if the association between the antecedent and the consequent were purely random. Lift measures how many times more often X and Y occur together than expected if they were statistically independent (lift = 1 means independence). The lift of 1.27 in this rule means that transactions containing the antecedent are 1.27 times more likely to also contain bread and cake than transactions in general.
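These metrics can be recomputed directly from the counts in the output. In the sketch below, the class name is invented, and the consequent support of 0.72 for bread and cake=t is an assumption back-derived from the reported lift of 1.27, so the computed lift and conviction match Weka's rounded output only approximately.

```java
// Recomputing the rule-quality metrics for rule 1 from its transaction counts.
// supp(Y) = 0.72 is an assumed value (back-derived from the reported lift),
// so lift and conviction match Weka's rounded output only approximately.
public class RuleMetrics {
    // confidence = count(X and Y together) / count(X)
    static double confidence(int xCount, int xyCount) { return (double) xyCount / xCount; }
    // lift = confidence / supp(Y); values > 1 mean X and Y co-occur more than if independent
    static double lift(double conf, double suppY) { return conf / suppY; }
    // conviction = (1 - supp(Y)) / (1 - confidence)
    static double conviction(double conf, double suppY) { return (1 - suppY) / (1 - conf); }

    public static void main(String[] args) {
        double conf = confidence(788, 723);   // antecedent in 788 transactions, rule holds in 723
        double suppY = 0.72;                  // assumed support of bread and cake=t
        System.out.printf("conf=%.2f lift=%.2f conv=%.2f%n",
                conf, lift(conf, suppY), conviction(conf, suppY));
        // prints: conf=0.92 lift=1.27 conv=3.39
    }
}
```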
