4.4. Association Rules
Mining association rules consists of searching for groups of objects that appear together in certain contexts. This task is performed by means of association rules analysis.18 Market basket analysis, in which you want to find out which products are most frequently sold together in one basket, is a classic example.19 Let us consider a simple case of association rules analysis based on four market baskets. Each line in Table 4.3 represents a market basket registered on a cash register receipt.20 You can use these receipts to analyze how frequently certain pairs of products occur together. Table 4.4 shows, for all possible pairs, how often the two products appear in the same basket. Table 4.4 gives the following information about sales:
Table 4.3. Example of a Set of Receipts
Source: author.
Table 4.4. Frequency of the Simultaneous Presence of Pairs of Products in the Market Basket
Source: author.
Such conclusions would be well grounded only if the sample of market baskets were large enough to be statistically significant. The essence of mining association rules in market basket research lies in formulating rules such as “if one puts product A into a basket, then it is highly plausible that he or she will put product B into the basket as well.” How reliable such a rule is, that is, how strongly we can believe that the event will occur, is described by two parameters: support and confidence:21
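In the standard formulation, for a rule "if A, then B" evaluated over a set of baskets, the two parameters can be written as below. The symbols N (total number of baskets) and n(·) (number of baskets containing the given products) are notation assumed here for illustration.

```latex
\[
\operatorname{support}(A \Rightarrow B) = \frac{n(A \cap B)}{N},
\qquad
\operatorname{confidence}(A \Rightarrow B) = \frac{n(A \cap B)}{n(A)}
\]
```

Here n(A ∩ B) counts the baskets that contain both A and B, and n(A) counts the baskets that contain A. Support is symmetric in A and B, whereas confidence generally is not.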
Let us, for instance, consider the following rule: “If there is juice in a basket, then water is also in the basket.” For this rule and the baskets in Table 4.3,
To understand things more clearly, let us also consider the converse rule: “If there is water in a basket, then juice is also in the basket.” Notice that the support for this rule is exactly the same, but the confidence is higher:
This means that (for a given sample of baskets) putting water into a basket always implies the purchase of juice.22 Figure 4.8 shows all possible combinations of rules and their parameters: support and confidence.
Figure 4.8. Example of association principles.
Source: author.
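The calculation of support and confidence for every pairwise rule can be sketched in Python as follows. This is a minimal illustration, not the author's procedure: the four baskets are hypothetical (they are not the receipts of Table 4.3), and the function names `support`, `confidence`, and `all_pair_rules` are assumptions introduced here. The baskets were chosen only so that the two juice/water rules share the same support while differing in confidence.

```python
from itertools import permutations

# Hypothetical baskets for illustration only; NOT the receipts of Table 4.3.
baskets = [
    {"juice", "water", "bread"},
    {"juice", "bread"},
    {"juice", "water"},
    {"bread", "butter"},
]


def support(antecedent, consequent, baskets):
    """Fraction of all baskets that contain both products."""
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    return both / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Fraction of baskets containing the antecedent that also contain the consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0  # the rule never applies in this sample
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)


def all_pair_rules(baskets):
    """List (antecedent, consequent, support, confidence) for every ordered
    pair of products that occurs together in at least one basket."""
    products = sorted(set().union(*baskets))
    rules = []
    for a, b in permutations(products, 2):
        s = support(a, b, baskets)
        if s > 0:
            rules.append((a, b, s, confidence(a, b, baskets)))
    return rules


if __name__ == "__main__":
    for a, b, s, c in all_pair_rules(baskets):
        print(f"if {a} then {b}: support={s:.2f}, confidence={c:.2f}")
```

With these hypothetical baskets, the rules "if juice then water" and "if water then juice" both have a support of 0.50, while their confidence is about 0.67 and 1.00 respectively, because every basket with water also contains juice but not the other way round.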
Association rules can be presented graphically, as shown in Figure 4.9. Each product is represented by a circle whose size corresponds to how often the product appears in the baskets. The vector between two circles represents an association rule, and its thickness reflects the parameters used to evaluate that rule.
Knowledge about connections between sales of products can be used for
Obviously, these applications go beyond the environment of chain stores and can be useful in other business areas.24 It is particularly worth stressing that they can support upselling: You can analyze a customer's basket over time and, on that basis, suggest rational solutions.
4.4.1. Limitations
Cluster analysis and association rules analysis share two major limitations that also affect other data-mining methods: the problem of interpretation and the problem of computational complexity.
Figure 4.9. Example of graphical representation of association rules.
4.4.1.1. The Problem of Interpretation
In general, rules derived from association rules algorithms contribute little knowledge that is both new and useful for business purposes. In most cases, the rules are trivial and do not go beyond what is already known. There is also a separate group of rules whose business interpretation is problematic. These rules are usually omitted because of their incidental and unclear character. The interpretation of classification rules is similarly dubious. Another problem is the business interpretation of segments generated automatically by cluster analysis algorithms. Properly identifying (interpreting) the important segments and rejecting those that are not useful in business requires the analyst to have a profound knowledge of the market. In fact, this problem concerns other data-mining algorithms as well and is a real limitation on their business applications.
4.4.1.2. The Problem of Computational Complexity
You can easily imagine how much data an algorithm must process when it works on receipts from a chain of stores. Even a relatively short time period yields millions of records. The number of product indexes in an average chain usually reaches a few thousand. The issue becomes even more complicated if you consider not pairs of products but combinations of three or more items. The complexity of the analysis then increases sharply, which leads to unacceptably long rule-generation times.25 This problem applies to most knowledge discovery algorithms and is one of the basic limitations of these approaches.
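The growth described above can be made concrete by counting how many distinct product combinations exist for a given assortment size. The sketch below is an illustration under stated assumptions: the figure of 5,000 products is a hypothetical stand-in for "a few thousand", the function name `candidate_itemsets` is introduced here, and the counts describe naive enumeration of all combinations rather than the behavior of any particular algorithm.

```python
from math import comb


def candidate_itemsets(n_products, max_size):
    """Number of distinct product combinations of each size from 1 to max_size."""
    return {k: comb(n_products, k) for k in range(1, max_size + 1)}


if __name__ == "__main__":
    # 5000 is a hypothetical assortment size standing in for "a few thousand".
    for size, count in candidate_itemsets(5000, 4).items():
        print(f"combinations of {size} product(s): {count:,}")
```

For 5,000 products there are already about 12.5 million possible pairs and roughly 20.8 billion possible triples, and each candidate would have to be checked against millions of receipts.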