Association rules (Intermediate)

Association analysis finds frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other data stores. A common application is market basket analysis, which large supermarket chains use to determine which products are often bought together; these products are then placed close to each other in the store to increase the chance that shoppers pick them up on impulse. This recipe walks you through setting up association rule analysis in Weka.

Getting ready

This recipe uses a different sample dataset, referred to as the bank data, which is available in comma-separated format on the Weka website. To illustrate the basic concepts of association rule mining, we will use a pre-processed version from the source code bundle, located at /dataset/bank-data.arff, which has already been filtered and converted to the ARFF format.

How to do it...

We will use the Apriori algorithm as implemented in Weka. It iteratively reduces the minimum support until it finds the required number of rules with the given minimum confidence:

import java.io.BufferedReader;
import java.io.FileReader;

import weka.core.Instances;
import weka.associations.Apriori;

public class AssociationRules{

  public static void main(String args[]) throws Exception{
    
    //load data
    Instances data = new Instances(new BufferedReader(new FileReader("dataset/bank-data.arff")));
    
    //build model
    Apriori model = new Apriori();
    model.buildAssociations(data); 
    System.out.println(model);
    
  }
}

This should output the following:

Apriori
=======

Minimum support: 0.1 (60 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 18

Generated sets of large itemsets:

Size of set of large itemsets L(1): 28

Size of set of large itemsets L(2): 232

Size of set of large itemsets L(3): 524

Size of set of large itemsets L(4): 277

Size of set of large itemsets L(5): 33

Best rules found:

 1. income=43759_max 80 ==> save_act=YES 80    <conf:(1)> lift:(1.45) lev:(0.04) [24] conv:(24.8)
 2. age=52_max income=43759_max 76 ==> save_act=YES 76    <conf:(1)> lift:(1.45) lev:(0.04) [23] conv:(23.56)
 3. income=43759_max current_act=YES 63 ==> save_act=YES 63    <conf:(1)> lift:(1.45) lev:(0.03) [19] conv:(19.53)
 4. age=52_max income=43759_max current_act=YES 61 ==> save_act=YES 61    <conf:(1)> lift:(1.45) lev:(0.03) [18] conv:(18.91)
 5. children=0 save_act=YES mortgage=NO pep=NO 74 ==> married=YES 73    <conf:(0.99)> lift:(1.49) lev:(0.04) [24] conv:(12.58)
 6. sex=FEMALE children=0 mortgage=NO pep=NO 64 ==> married=YES 63    <conf:(0.98)> lift:(1.49) lev:(0.03) [20] conv:(10.88)
 7. children=0 current_act=YES mortgage=NO pep=NO 82 ==> married=YES 80    <conf:(0.98)> lift:(1.48) lev:(0.04) [25] conv:(9.29)
 8. children=0 mortgage=NO pep=NO 107 ==> married=YES 104    <conf:(0.97)> lift:(1.47) lev:(0.06) [33] conv:(9.1)
 9. income=43759_max current_act=YES 63 ==> age=52_max 61    <conf:(0.97)> lift:(3.04) lev:(0.07) [40] conv:(14.32)
10. income=43759_max save_act=YES current_act=YES 63 ==> age=52_max 61    <conf:(0.97)> lift:(3.04) lev:(0.07) [40] conv:(14.32)

The algorithm outputs the ten best rules it found, listed in order of decreasing confidence; each rule describes a pattern in the data.
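The support-lowering strategy described at the start of this section can be illustrated without Weka. The following is a minimal sketch under stated assumptions, not Weka's actual implementation: the class name, the toy transactions, and the restriction to single-item rules are all hypothetical, and the thresholds are expressed as whole percentages to avoid floating-point drift.

```java
import java.util.List;
import java.util.Set;

public class SupportLoweringSketch {

    // Hypothetical toy transactions; this is not the bank dataset.
    static final List<Set<String>> TRANSACTIONS = List.of(
        Set.of("bread", "butter"),
        Set.of("bread", "butter", "milk"),
        Set.of("bread", "milk"),
        Set.of("butter"),
        Set.of("bread", "butter"));

    static final List<String> ITEMS = List.of("bread", "butter", "milk");

    // Number of transactions that contain every item in the given set.
    static int count(Set<String> items) {
        int n = 0;
        for (Set<String> t : TRANSACTIONS) {
            if (t.containsAll(items)) n++;
        }
        return n;
    }

    // Counts single-item rules "a ==> b" whose support (in percent) and
    // confidence both reach the given thresholds.
    static int countRules(int minSupportPct, double minConfidence) {
        int rules = 0;
        for (String a : ITEMS) {
            for (String b : ITEMS) {
                if (a.equals(b)) continue;
                int premise = count(Set.of(a));
                int both = count(Set.of(a, b));
                boolean frequent = both * 100 >= minSupportPct * TRANSACTIONS.size();
                boolean confident = premise > 0 && (double) both / premise >= minConfidence;
                if (frequent && confident) rules++;
            }
        }
        return rules;
    }

    // Starts at the upper bound and lowers the support threshold by delta
    // until enough rules are found or the lower bound is reached.
    static int findMinSupportPct(int requiredRules, double minConfidence,
                                 int upperPct, int lowerPct, int deltaPct) {
        int support = upperPct;
        while (support > lowerPct && countRules(support, minConfidence) < requiredRules) {
            support = Math.max(lowerPct, support - deltaPct);
        }
        return support;
    }

    public static void main(String[] args) {
        int support = findMinSupportPct(1, 0.9, 100, 10, 5);
        System.out.println("Minimum support reached: " + support + "%");
    }
}
```

In this toy data, only the rule milk ==> bread reaches a confidence of 0.9, and it appears in two of the five transactions, so the loop has to lower the threshold to 40 percent before it finds a rule. If no threshold yields enough rules, the sketch simply stops at the lower bound.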

How it works...

Import the weka.associations.Apriori class implementing the Apriori algorithm:

import weka.associations.Apriori;

public class AssociationRules{

  public static void main(String args[]) throws Exception{

Load the bank dataset:

    Instances data = new Instances(new BufferedReader(new FileReader("dataset/bank-data.arff")));

Initialize the Apriori algorithm:

    Apriori model = new Apriori();

Call the buildAssociations(Instances) method to trigger the association rule discovery procedure:

    model.buildAssociations(data); 

Finally, output the discovered rules:

    System.out.println(model);

The output is interpreted as follows. Consider the sixth rule:

6. sex=FEMALE children=0 mortgage=NO pep=NO 64 ==> married=YES 63    <conf:(0.98)> lift:(1.49) lev:(0.03) [20] conv:(10.88)

The first part is the discovered rule itself. The premise (the left-hand side) matches 64 instances, and the consequent married=YES also holds for 63 of them:

sex=FEMALE children=0 mortgage=NO pep=NO 64 ==> married=YES 63    
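The two counts are enough to reproduce the confidence reported for this rule. The sketch below is illustrative only: the class and method names are hypothetical, and the total of 600 instances is an assumption inferred from the "Minimum support: 0.1 (60 instances)" line in the output rather than read from the dataset.

```java
import java.util.Locale;

public class RuleSixMetrics {

    // Confidence: fraction of premise matches for which the consequent also holds.
    static double confidence(int premiseCount, int bothCount) {
        return (double) bothCount / premiseCount;
    }

    // Support: fraction of all instances matching both premise and consequent.
    static double support(int bothCount, int totalInstances) {
        return (double) bothCount / totalInstances;
    }

    // Formats a value with two decimals, independent of the system locale.
    static String twoDecimals(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    public static void main(String[] args) {
        int premiseCount = 64;    // instances matching the left-hand side
        int bothCount = 63;       // of those, instances where married=YES
        int totalInstances = 600; // assumption: inferred from 0.1 = 60 instances

        System.out.println("confidence = " + twoDecimals(confidence(premiseCount, bothCount)));
        System.out.println("support    = " + support(bothCount, totalInstances));
    }
}
```

Here 63/64 rounds to the 0.98 shown in the conf field, and under the 600-instance assumption the rule's support of 63/600 sits just above the 0.1 minimum support at which the search stopped.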