Let's consider the transaction data shown in the following table. Let's first represent it as a sparse matrix:
ID | Bat | Wickets | Pads | Helmet | Ball |
1 | 0 | 1 | 1 | 0 | 0 |
2 | 1 | 1 | 1 | 1 | 0 |
3 | 0 | 0 | 0 | 1 | 1 |
4 | 1 | 0 | 1 | 1 | 0 |
Let's calculate the frequency of each item and sort them in descending order by frequency:
Item | Frequency |
pads | 3 |
helmet | 3 |
bat | 2 |
wicket | 2 |
ball | 1 |
Now let's rearrange the transaction-based data based on the frequency:
ID | Original Items | Reordered Items |
t1 | Wickets, pads | Pads, wickets |
t2 | Bat, wickets, pads, helmet | Helmet, pads, wickets, bat |
t3 | Helmet, ball | Helmet, ball |
t4 | Bat, pads, helmet | Helmet, pads, bat |
To build the FP-tree, let's start with the first branch of the FP-tree. The FP-tree starts with a Null as the root. To build the tree, we can represent each item with a node, as shown in the following diagram (the tree representation of t1 is shown here). Note that the label of each node is the name of the item and its frequency is appended after the colon. Also, note that the pads item has a frequency of 1:
Using the same pattern, let's draw all four transactions, resulting in the full FP-tree. The FP-tree has four leaf nodes, each representing the itemset associated with the four transactions. Note that we need to count the frequencies of each item and need to increase it when used multiple times—for example, when adding t2 to the FP-tree, the frequency of helmet was increased to two. Similarly, while adding t4, it was increased again to three. The resulting tree is shown in the following diagram:
Note that the FP-tree generated in the preceding diagram is an ordered tree.