Association analysis

Association analysis is the process of discovering interesting relationships hidden in large datasets. These relationships can be expressed as association rules or as sets of frequent items that occur together. Spark provides the FPGrowth algorithm for finding frequent itemsets.

Frequent pattern mining (FPGrowth)

The FPGrowth algorithm builds a compact data structure called an FP-tree and then extracts frequent itemsets from that structure. Let's see an example of how to use the FPGrowth algorithm:
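The source of the program that is run later in this section is not reproduced here, so the following is only a minimal sketch of how Spark's RDD-based FPGrowth API (org.apache.spark.mllib.fpm.FPGrowth) can be called. The object name FPGrowthSketch, the four in-memory transactions, the local master setting, and the minimum support of 0.5 are all assumptions made for illustration; they are not taken from the book's code or from marketbasket.csv. The sketch also assumes that spark-core and spark-mllib are on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

// Hypothetical example object; not the book's chapter04.MarketBasketAnalysis.
object FPGrowthSketch {
  def main(args: Array[String]): Unit = {
    // Run locally for illustration (assumed setting).
    val conf = new SparkConf().setAppName("FPGrowthSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Invented transactions: one array per basket.
    // FPGrowth expects the items within a single transaction to be unique.
    val transactions: RDD[Array[String]] = sc.parallelize(Seq(
      Array("2pct. Milk", "White Bread", "Eggs"),
      Array("2pct. Milk", "Eggs"),
      Array("White Bread", "Eggs"),
      Array("Potato Chips", "White Bread")
    ))

    // Keep itemsets that appear in at least half of the transactions (assumed threshold).
    val model = new FPGrowth()
      .setMinSupport(0.5)
      .setNumPartitions(2)
      .run(transactions)

    // Print each frequent itemset followed by the number of transactions containing it.
    model.freqItemsets.collect().foreach { itemset =>
      println(itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)
    }

    sc.stop()
  }
}
```

To use a file instead of the in-memory list, each row would first have to be converted into an array of the items present in that basket; how to do that depends on the file's layout, which is not shown in this section.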

Download the dataset into the datasets folder from the following website:

https://sites.google.com/a/nu.edu.pk/tariq-mahmood/teaching-1/fall-12---dm/marketbasket.csv?attredirects=0&d=1

Now invoke the following program:

$ sbt "set fork := true" "run-main chapter04.MarketBasketAnalysis"
... OUTPUT SKIPPED ...
[info] [2pct. Milk,White Bread], 70
[info] [2pct. Milk,Eggs], 71
[info] [White Bread,Eggs], 75
[info] [Potato Chips,White Bread], 70

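The preceding output lists pairs of items along with what appears to be the number of baskets in which both items occur, and it makes intuitive sense: milk, bread, and eggs are frequently bought together, and we have the data to back this up. Note that these are co-occurrence counts only. To say how likely a milk buyer is to also buy bread or eggs, we would also need the number of baskets that contain milk.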
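To make the step from pair counts to a statement such as "people who buy milk tend to buy eggs" concrete, here is a small plain-Scala sketch (no Spark required) of the two standard association-rule measures, support and confidence. The baskets are invented for illustration and are not taken from marketbasket.csv; the object name RuleMeasures and its helper functions are likewise hypothetical.

```scala
// Hypothetical helper object illustrating support and confidence.
object RuleMeasures {
  // Invented baskets for illustration only.
  val baskets: Seq[Set[String]] = Seq(
    Set("Milk", "Bread", "Eggs"),
    Set("Milk", "Eggs"),
    Set("Milk", "Bread"),
    Set("Bread", "Eggs"),
    Set("Milk")
  )

  // Number of baskets that contain every item in `items`.
  def count(items: Set[String]): Int =
    baskets.count(basket => items.subsetOf(basket))

  // Support: fraction of all baskets that contain the whole itemset.
  def support(items: Set[String]): Double =
    count(items).toDouble / baskets.size

  // Confidence of the rule antecedent => consequent:
  // among baskets containing the antecedent, the fraction that also contain the consequent.
  def confidence(antecedent: Set[String], consequent: Set[String]): Double =
    count(antecedent ++ consequent).toDouble / count(antecedent)

  def main(args: Array[String]): Unit = {
    val pairSupport = support(Set("Milk", "Eggs"))
    val ruleConfidence = confidence(Set("Milk"), Set("Eggs"))
    println(s"support(Milk, Eggs) = $pairSupport")
    println(s"confidence(Milk => Eggs) = $ruleConfidence")
  }
}
```

In these invented baskets, two of the five contain both milk and eggs, and two of the four baskets with milk also contain eggs. A frequent-itemset count like the ones printed above corresponds to the numerator of both measures; the confidence additionally needs the count for the antecedent on its own.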
