Apriori algorithm

Apriori is a classical algorithm for mining frequent itemsets, from which association rules are then derived. In retail, such rules can inform how a store is laid out and stocked, which in turn can help revenue generation.

The anti-monotonicity of the support measure is one of the central concepts behind Apriori. It implies the following:

  • All subsets of a frequent itemset must also be frequent
  • Conversely, all supersets of an infrequent itemset must also be infrequent
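
The property can be checked by brute force on a small dataset. The following is a minimal sketch, assuming the six example transactions from this section are written as Python sets; the helper name `support` and the `violations` list are illustrative choices, not part of any library.

```python
from itertools import combinations

# The six example transactions from this section, written as sets of items.
transactions = [
    {"Milk", "Butter", "Cereal"},                   # t1
    {"Butter", "Cereal", "Bread"},                  # t2
    {"Bread", "Book"},                              # t3
    {"Milk", "Butter", "Bread"},                    # t4
    {"Milk", "Butter", "Cereal", "Book"},           # t5
    {"Milk", "Butter", "Cereal", "Bread", "Book"},  # t6
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


items = sorted(set().union(*transactions))

# Anti-monotonicity: dropping one item from an itemset never lowers its support.
# Collect every (subset, itemset) pair that would contradict this.
violations = []
for size in range(2, len(items) + 1):
    for itemset in combinations(items, size):
        for subset in combinations(itemset, size - 1):
            if support(subset) < support(itemset):
                violations.append((subset, itemset))

print(support({"Milk", "Butter"}), support({"Milk", "Butter", "Bread"}))
print("violations:", violations)
```

Here {Milk, Butter} appears in 4 of the 6 transactions, while its superset {Milk, Butter, Bread} appears in only 2, and the list of violations stays empty.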

Let's look at an example and explain it:

Transaction ID   Milk   Butter   Cereal   Bread   Book
t1               1      1        1        0       0
t2               0      1        1        1       0
t3               0      0        0        1       1
t4               1      1        0        1       0
t5               1      1        1        0       1
t6               1      1        1        1       1

The table lists each transaction ID alongside the items Milk, Butter, Cereal, Bread, and Book. A 1 denotes that the item is part of the transaction, and a 0 means that it is not.

  • We build a frequency table for all the items, along with their support (the number of transactions containing the item, divided by 6):

Items    Number of transactions   Support
Milk     4                        67%
Butter   5                        83%
Cereal   4                        67%
Bread    4                        67%
Book     3                        50%

  • We set a support threshold of 60%. Items that meet it are treated as frequent itemsets in this scenario; Book, at 50%, is filtered out:

Items    Number of transactions
Milk     4
Butter   5
Cereal   4
Bread    4
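
The counting and filtering for single items can be sketched in a few lines. This is an illustrative sketch rather than a reference implementation: the names `MIN_SUPPORT`, `item_support`, and `frequent_items` are assumptions introduced here, and the transactions are the six rows of the example table written as sets.

```python
from collections import Counter

# The six example transactions, one set of items per row of the table.
transactions = [
    {"Milk", "Butter", "Cereal"},                   # t1
    {"Butter", "Cereal", "Bread"},                  # t2
    {"Bread", "Book"},                              # t3
    {"Milk", "Butter", "Bread"},                    # t4
    {"Milk", "Butter", "Cereal", "Book"},           # t5
    {"Milk", "Butter", "Cereal", "Bread", "Book"},  # t6
]

MIN_SUPPORT = 0.6  # the 60% threshold used in the text
n = len(transactions)

# Count how many transactions contain each item.
counts = Counter(item for t in transactions for item in t)

# Support = number of transactions containing the item / total transactions.
item_support = {item: count / n for item, count in counts.items()}

# Keep only the items whose support meets the threshold.
frequent_items = {
    item: counts[item]
    for item in sorted(counts)
    if item_support[item] >= MIN_SUPPORT
}

for item in sorted(counts):
    print(f"{item:<8}{counts[item]}  {item_support[item]:.0%}")
print("frequent:", frequent_items)
```

Representing each transaction as a set keeps the membership test simple; a 0/1 matrix, as in the table, would work equally well with column sums.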

  • Next, we form combinations of these items (two at a time, then three at a time, then four at a time) and count how often each one occurs. The two-item combinations are:

Items            Number of transactions
Milk, Butter     4
Milk, Cereal     3
Milk, Bread      2
Butter, Bread    3
Butter, Cereal   4
Cereal, Bread    2

Now, again, we have to calculate the support for each of these pairs and filter them by the same 60% threshold. With six transactions, only {Milk, Butter} and {Butter, Cereal} pass, each at 4/6, or 67%.
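
The pair step can be sketched the same way. The code below is a rough illustration under the same assumptions as the text: the four frequent single items are taken as given, and `pair_counts` and `frequent_pairs` are names chosen here for the example.

```python
from itertools import combinations

# The six example transactions, one set of items per row of the table.
transactions = [
    {"Milk", "Butter", "Cereal"},                   # t1
    {"Butter", "Cereal", "Bread"},                  # t2
    {"Bread", "Book"},                              # t3
    {"Milk", "Butter", "Bread"},                    # t4
    {"Milk", "Butter", "Cereal", "Book"},           # t5
    {"Milk", "Butter", "Cereal", "Bread", "Book"},  # t6
]

MIN_SUPPORT = 0.6
n = len(transactions)

# Items that survived the single-item filter (Book was dropped at 50%).
frequent_items = ["Bread", "Butter", "Cereal", "Milk"]

# Count the transactions that contain both items of each candidate pair.
pair_counts = {
    pair: sum(set(pair) <= t for t in transactions)
    for pair in combinations(frequent_items, 2)
}

# Keep the pairs whose support meets the threshold.
frequent_pairs = {
    pair: count for pair, count in pair_counts.items() if count / n >= MIN_SUPPORT
}

for pair, count in pair_counts.items():
    print(f"{', '.join(pair):<16}{count}  {count / n:.0%}")
print("frequent pairs:", frequent_pairs)
```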

Similarly, combinations have to be formed with three items at a time (for example, Milk, Butter, and Bread), their support calculated, and the results filtered by the threshold. The same process is then repeated with four items at a time. Thanks to anti-monotonicity, any combination that contains an infrequent subset can be skipped without counting it. The steps carried out so far are called frequent itemset generation.
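
Putting the steps together, frequent itemset generation can be sketched as a level-wise loop. This is a simplified sketch, not an optimized or canonical implementation: the function name `apriori_frequent_itemsets` is hypothetical, candidates are generated by a plain union of frequent itemsets from the previous level, and support is recounted by scanning every transaction.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: transaction count} for all itemsets meeting min_support."""
    n = len(transactions)

    def count(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions)

    items = sorted(set().union(*transactions))
    # Level 1: frequent single items.
    current = {
        frozenset([i]) for i in items if count(frozenset([i])) / n >= min_support
    }
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = count(itemset)
        k += 1
        # Join: merge frequent (k-1)-itemsets whose union has exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: by anti-monotonicity, every (k-1)-subset must itself be frequent.
        candidates = {
            c
            for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count the surviving candidates and apply the support threshold.
        current = {c for c in candidates if count(c) / n >= min_support}
    return frequent


transactions = [
    {"Milk", "Butter", "Cereal"},                   # t1
    {"Butter", "Cereal", "Bread"},                  # t2
    {"Bread", "Book"},                              # t3
    {"Milk", "Butter", "Bread"},                    # t4
    {"Milk", "Butter", "Cereal", "Book"},           # t5
    {"Milk", "Butter", "Cereal", "Bread", "Book"},  # t6
]

result = apriori_frequent_itemsets(transactions, 0.6)
for itemset, c in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), c)
```

On the example data, the only three-item candidate is {Milk, Butter, Cereal}, and it is pruned before counting because its subset {Milk, Cereal} has a support of only 50%. No frequent itemset of three or more items remains at the 60% threshold.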
