Unsupervised machine learning

Unsupervised machine learning does not use annotated data; that is, the dataset does to contain anticipated results. While there are several unsupervised learning algorithms, we will demonstrate the use of association rule learning to illustrate this learning approach.

Association rule learning

Association rule learning is a technique that identifies relationships between data items. It is part of what is called market basket analysis. When a shopper makes purchases, these purchases are likely to consist of more than one item, and when it does, there are certain items that tend to be bought together. Association rule learning is one approach for identifying these related items. When an association is found, a rule can be formulated for it.

For example, if a customer buys diapers and lotion, they are also likely to buy baby wipes. An analysis can find these associations and a rule stating the observation can be formed. The rule would be expressed as {diapers, lotion} => {wipes}. Being able to identify these purchasing patterns allows a store to offer special coupons, arrange their products to be easier to get, or effect any number of other market-related activities.

One of the problems with this technique is that there are a large number of possible associations. One efficient method that is commonly used is the apriori algorithm. This algorithm works on a collection of transactions defined by a set of items. These items can be thought of as purchases and a transaction as a set of items bought together. The collection is often referred to as a database.

Consider the following set of transactions where a 1 indicates that the item was purchased as part of a transaction and 0 means that it was not purchased:

Transaction ID

Diapers

Lotion

Wipes

Formula

1

1

1

1

0

2

1

1

1

1

3

0

1

1

0

4

1

0

0

0

5

0

1

1

1

There are several analysis terms used with the apriori model:

  • Support: This is the proportion of the items in a database that contain a subset of items. In the previous database, the item {diapers, lotion} occurs 2/5 times or 20%.
  • Confidence: This is a measure of the frequency of the rule being true. It is calculated as conf(X- > Y) = sup(X ∪ Y)/sup(X).
  • Lift: This measures the degree to which the items are dependent upon each other. It is defined as lift(X -> Y) = sup(X ∪ Y) / (sup(X) * sup(Y)).
  • Leverage: Leverage is a measure of the number of transactions that are covered by both X and Y if X and Y are independent of each other. A value above 0 is a good indicator. It is calculated as lev(X->Y) = sup(X ,Y) - sup(X) * sup(Y).
  • Conviction: A measure of how often the rule makes an incorrect decision. It is defined as conv(X -> Y) = 1 - sup(Y) / (1 - conf(X -> Y)).

These definitions and sample values can be found at https://en.wikipedia.org/wiki/Association_rule_learning.

Using association rule learning to find buying relationships

We will be using the Apriori Weka class to demonstrate Java support for the algorithm using two datasets. The first is the data discussed previously and the second deals with what a person may take on a hike.

The following is the data file, babies.arff, for baby information:

@relation TEST_ITEM_TRANS
@attribute Diapers {1, 0}
@attribute Lotion {1, 0}
@attribute Wipes {1, 0}
@attribute Formula {1, 0}
@data
1,1,1,0
1,1,1,1
0,1,1,0
1,0,0,0
0,1,1,1

We start by reading in the file using a BufferedReader instance. This object is used as the argument of the Instances class, which will hold the data:

try { 
    BufferedReader br; 
    br = new BufferedReader(new FileReader("babies.arff")); 
   Instances data = new Instances(br); 
    br.close(); 
    ... 
} catch (Exception ex) { 
    // Handle exceptions 
} 

Next, an Apriori instance is created. We set the number of rules to be generated and a minimum confidence for the rules:

Apriori apriori = new Apriori(); 
apriori.setNumRules(100); 
apriori.setMinMetric(0.5); 

The buildAssociations method generates the associations using the Instances variable. The associations are then displayed:

apriori.buildAssociations(data); 
System.out.println(apriori); 

There will be 100 rules displayed. The following is the abbreviated output. Each rule is followed by various measures of the rule:

Note

Note that rule 8 and 100 reflect the previous examples.

Apriori
=======
Minimum support: 0.3 (1 instances)
Minimum metric <confidence>: 0.5
Number of cycles performed: 14
Generated sets of large itemsets:
Size of set of large itemsets L(1): 8
Size of set of large itemsets L(2): 18
Size of set of large itemsets L(3): 16
Size of set of large itemsets L(4): 5
Best rules found:
1. Wipes=1 4 ==> Lotion=1 4 <conf:(1)> lift:(1.25) lev:(0.16) [0] conv:(0.8)
2. Lotion=1 4 ==> Wipes=1 4 <conf:(1)> lift:(1.25) lev:(0.16) [0] conv:(0.8)
3. Diapers=0 2 ==> Lotion=1 2 <conf:(1)> lift:(1.25) lev:(0.08) [0] conv:(0.4)
4. Diapers=0 2 ==> Wipes=1 2 <conf:(1)> lift:(1.25) lev:(0.08) [0] conv:(0.4)
5. Formula=1 2 ==> Lotion=1 2 <conf:(1)> lift:(1.25) lev:(0.08) [0] conv:(0.4)
6. Formula=1 2 ==> Wipes=1 2 <conf:(1)> lift:(1.25) lev:(0.08) [0] conv:(0.4)
7. Diapers=1 Wipes=1 2 ==> Lotion=1 2 <conf:(1)> lift:(1.25) lev:(0.08) [0] conv:(0.4)
8. Diapers=1 Lotion=1 2 ==> Wipes=1 2 <conf:(1)> lift:(1.25) lev:(0.08) [0] conv:(0.4)
...
62. Diapers=0 Lotion=1 Formula=1 1 ==> Wipes=1 1 <conf:(1)> lift:(1.25) lev:(0.04) [0] conv:(0.2) 
...
99. Lotion=1 Formula=1 2 ==> Diapers=1 1 <conf:(0.5)> lift:(0.83) lev:(-0.04) [0] conv:(0.4)
100. Diapers=1 Lotion=1 2 ==> Formula=1 1 <conf:(0.5)> lift:(1.25) lev:(0.04) [0] conv:(0.6)

This provides us with a list of relationships, which we can use to identify patterns in activities such as purchasing behavior.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.116.15.161