Association analysis is the process of discovering interesting relationships hidden in large datasets. These relationships can be expressed as association rules or as sets of items that frequently occur together (frequent itemsets). Spark provides the FPGrowth algorithm for finding frequent itemsets.
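To make the idea of items that occur together concrete, here is a small sketch in plain Scala (no Spark) that counts how many baskets contain each pair of items and keeps the pairs that reach a threshold. The baskets, the object name PairSupportSketch, and the threshold of 2 are all invented for illustration; this is a brute-force count, not the FPGrowth algorithm.

```scala
object PairSupportSketch {
  // Hypothetical baskets, invented for illustration only.
  val baskets: Seq[Set[String]] = Seq(
    Set("Milk", "Bread", "Eggs"),
    Set("Milk", "Bread"),
    Set("Milk", "Eggs"),
    Set("Bread", "Chips")
  )

  // For every pair of items, count the number of baskets containing both.
  def pairCounts(data: Seq[Set[String]]): Map[Set[String], Int] =
    data
      .flatMap(basket => basket.toSeq.combinations(2).map(_.toSet))
      .groupBy(identity)
      .map { case (pair, occurrences) => pair -> occurrences.size }

  // Keep only the pairs that appear in at least minCount baskets.
  def frequentPairs(data: Seq[Set[String]], minCount: Int): Map[Set[String], Int] =
    pairCounts(data).filter { case (_, count) => count >= minCount }

  def main(args: Array[String]): Unit =
    frequentPairs(baskets, minCount = 2).foreach { case (pair, count) =>
      println(s"${pair.toSeq.sorted.mkString("[", ",", "]")}, $count")
    }
}
```

Enumerating every candidate pair (and triple, and so on) this way grows expensive as the number of distinct items increases, which is the cost that FP-growth avoids by not generating candidate itemsets explicitly.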
The FPGrowth algorithm builds a compact data structure called an FP-tree and then extracts the frequent itemsets from it. Let's see an example of how to use the FPGrowth algorithm:
Download the dataset into the datasets folder from the following website:
Now invoke the following program:
$ sbt "set fork := true" "run-main chapter04.MarketBasketAnalysis"
... OUTPUT SKIPPED ...
[info] [2pct. Milk,White Bread], 70
[info] [2pct. Milk,Eggs], 71
[info] [White Bread,Eggs], 75
[info] [Potato Chips,White Bread], 70
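Each line of the preceding output shows a frequent itemset followed by its frequency, the number of transactions in which those items appear together. The results make intuitive sense: 2pct. Milk appears together with White Bread in 70 transactions and with Eggs in 71, so shoppers who buy milk often buy bread or eggs as well, and in this case we have the data to back up that conclusion.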
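As a rough sketch of how a program of this kind might call FPGrowth, the following uses the RDD-based API in org.apache.spark.mllib.fpm. It is not the chapter's MarketBasketAnalysis program: the object name FPGrowthSketch, the in-memory baskets, and the minimum support of 0.5 are assumptions made for illustration, and it assumes spark-core and spark-mllib are on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.fpm.FPGrowth

object FPGrowthSketch {
  // Hypothetical baskets; items within one basket must be unique.
  val baskets: Seq[Array[String]] = Seq(
    Array("milk", "bread", "eggs"),
    Array("milk", "bread"),
    Array("milk", "eggs"),
    Array("bread", "chips")
  )

  // Mine itemsets that occur in at least minSupport (a fraction) of the baskets.
  // Returns each itemset (sorted for readability) with its frequency.
  def frequentItemsets(sc: SparkContext,
                       data: Seq[Array[String]],
                       minSupport: Double): Array[(List[String], Long)] = {
    val transactions = sc.parallelize(data)
    val model = new FPGrowth()
      .setMinSupport(minSupport)
      .run(transactions)
    model.freqItemsets.collect().map(fi => (fi.items.toList.sorted, fi.freq))
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM; an assumption suited to a small demo.
    val conf = new SparkConf().setAppName("FPGrowthSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      frequentItemsets(sc, baskets, minSupport = 0.5).foreach {
        case (items, freq) => println(items.mkString("[", ",", "]") + ", " + freq)
      }
    } finally {
      sc.stop()
    }
  }
}
```

The minimum support is given as a fraction of all transactions, so 0.5 over four baskets keeps only itemsets found in at least two of them. On a real dataset this value usually needs tuning: set it too low and the number of itemsets grows quickly, too high and nothing is reported.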