4.4. Association Rules

Mining association rules consists of searching for groups of objects that appear together in certain contexts. This task is performed by means of association rules analysis.18 Market basket analysis, in which you want to find out which products are most frequently sold together in one basket, is a classic example.19 Let us consider a simple case of association rules analysis based on four market baskets. Each line in Table 4.3 represents a market basket recorded on a cash register receipt.20 You can use these receipts to analyze the frequency of certain pairs of products. Table 4.4 shows, for every possible pair of products, how often the two appear in the same basket. It gives the following information about sales:

  • Sales of juice are very good because this product appears in every single basket.
  • Juice is relatively often sold together with water and beer.
  • Milk is never sold together with beer or water.
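The pair counting behind a table such as Table 4.4 can be sketched in a few lines of Python. The baskets below are hypothetical: they are not the actual contents of Table 4.3, only a set of four baskets chosen to agree with the facts stated in the text (juice in every basket, water in two baskets, milk never together with beer or water). The function name pair_counts is likewise an illustrative choice.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets (assumption): NOT the real Table 4.3, only chosen
# to be consistent with the facts stated in the text.
baskets = [
    {"juice", "water"},
    {"juice", "water", "beer"},
    {"juice", "beer"},
    {"juice", "milk"},
]


def pair_counts(baskets):
    """Count in how many baskets each unordered pair of products co-occurs."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (alphabetical) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
for (a, b), n in sorted(counts.items()):
    print(f"{a} + {b}: {n}")
```

Pairs that never occur together, such as milk and beer, are simply absent from the counter, and looking them up returns 0.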

Table 4.3. Example of a Set of Receipts

Source: author.

Table 4.4. Frequency of the Simultaneous Presence of Pairs of Products in the Market Basket

Source: author.

Such conclusions would be well grounded only if the sample of market baskets were large enough to be statistically significant. The essence of mining association rules in market basket research lies in formulating rules such as “if a customer puts product A into a basket, then it is highly likely that he or she will put product B into the basket as well.” How well such a rule holds in the data is described by two parameters, support and confidence:21

  • Support describes how often products A and B appear together in a basket, as a share of all transactions.
  • Confidence describes the conditional probability that product B is in the basket given that product A is in it.

Let us, for instance, consider the following rule: “If there is juice in a basket, then water is also in the basket.” For this rule and the baskets in Table 4.3,

  • support = (number of baskets with juice and water) ÷ (number of all baskets) = 2 ÷ 4 (50%),
  • confidence = (number of baskets with juice and water) ÷ (number of all baskets with juice) = 2 ÷ 4 (50%).

To make this clearer, let us also consider the reverse rule: “If there is water in a basket, then juice is also in the basket.” Notice that support for this rule is exactly the same, but confidence is higher:

  • confidence = (number of baskets with juice and water) ÷ (number of all baskets with water) = 2 ÷ 2 (100%).

This means that, in this sample of baskets, every basket that contains water also contains juice.22 Figure 4.8 shows all possible combinations of rules and their parameters: support and confidence.
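The two parameters can be computed directly from their definitions. The following is a minimal sketch, again using hypothetical baskets that agree with the figures quoted in the text but are not the actual Table 4.3; the function names support and confidence are illustrative.

```python
from itertools import permutations

# Hypothetical baskets (assumption): consistent with the text, not Table 4.3.
baskets = [
    {"juice", "water"},
    {"juice", "water", "beer"},
    {"juice", "beer"},
    {"juice", "milk"},
]


def support(baskets, items):
    """Share of all baskets that contain every product in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Share of baskets with `antecedent` that also contain `consequent`."""
    with_antecedent = support(baskets, {antecedent})
    if with_antecedent == 0:
        return 0.0
    return support(baskets, {antecedent, consequent}) / with_antecedent


products = sorted(set().union(*baskets))
for a, b in permutations(products, 2):
    s = support(baskets, {a, b})
    if s > 0:  # skip pairs that never occur together
        c = confidence(baskets, a, b)
        print(f"if {a} then {b}: support={s:.0%}, confidence={c:.0%}")
```

For these baskets the rule "if juice then water" has support 50% and confidence 50%, while the reverse rule "if water then juice" keeps the same support but reaches confidence 100%, matching the calculation in the text.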

Figure 4.8. Example of association principles.

Source: author.

Association rules can be presented graphically, as shown in Figure 4.9. Each product is represented by a circle whose size corresponds to how often the product appears in the baskets. The vector between two circles represents an association rule, and its thickness reflects the parameters used to evaluate the rule.

Knowledge of which products sell together can be used for

  • arrangement of products on store shelves,
  • promotional packages,
  • upselling recommendations,
  • comparative studies of stores and studies that take the time factor into account (mining sequence patterns).23

Obviously, these applications go beyond chain stores and can be useful in other business areas.24 It is particularly worth stressing that they can support upselling: you can analyze a customer's basket over time and suggest additional products on that basis.
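As a rough illustration of the upselling idea, the sketch below suggests products for a customer's current basket by ranking candidates by confidence. It is only one possible approach under stated assumptions: the baskets are hypothetical, and the function recommend and its min_confidence threshold are names introduced here for illustration.

```python
# Hypothetical purchase history (assumption), used only for illustration.
baskets = [
    {"juice", "water"},
    {"juice", "water", "beer"},
    {"juice", "beer"},
    {"juice", "milk"},
]


def recommend(baskets, current, min_confidence=0.5):
    """Suggest products often bought together with the `current` items.

    Returns (product, confidence) pairs, highest confidence first.
    """
    current = set(current)
    n_with_current = sum(current <= basket for basket in baskets)
    if n_with_current == 0:
        return []  # no past basket contains these items, nothing to infer
    candidates = set().union(*baskets) - current
    scored = []
    for product in candidates:
        n_both = sum((current | {product}) <= basket for basket in baskets)
        conf = n_both / n_with_current
        if conf >= min_confidence:
            scored.append((product, conf))
    # Sort by descending confidence, then by name for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


print(recommend(baskets, {"water"}))
```

With such a small sample the suggestions carry little weight; in practice the threshold would be chosen together with a minimum support level.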

4.4.1. Limitations

There are two vital limitations of both cluster analysis and association rules analysis that are of great importance for other data-mining methods as well: the problems of interpretation and computational complexity.

Figure 4.9. Example of graphical representation of association rules.

4.4.1.1. The Problem of Interpretation

In general, rules derived by association rules algorithms add little knowledge that is both new and useful for business purposes. In most cases, the rules are trivial and do not go beyond what is already known. There is also a separate group of rules that are difficult to interpret in business terms. These rules are usually omitted because of their incidental and unclear character. The interpretation of classification rules is similarly doubtful. Another problem is the business interpretation of segments generated automatically by cluster analysis algorithms. Properly identifying (interpreting) the important segments and rejecting those that are not useful in business requires the analyst to have a profound knowledge of the market. In fact, this problem concerns other data-mining algorithms as well and is a real limitation on their business applications.

4.4.1.2. The Problem of Computational Complexity

You can easily imagine how much data an algorithm must process when it works on receipts from a chain of stores. There are millions of records even for a relatively short time period. The number of product indexes in an average chain usually reaches a few thousand. The issue becomes even more complicated if you consider not pairs of products but sets of three or more items. The complexity of the analysis increases sharply, and rule generation can take unacceptably long.25 This problem applies to most algorithms for discovering knowledge and is one of the basic limitations of these approaches.
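The growth in the number of candidate product combinations can be illustrated with a short calculation. The assortment size of 5,000 products is an assumed value standing in for "a few thousand", and the function name candidate_itemsets is illustrative.

```python
from math import comb


def candidate_itemsets(n_products, k):
    """Number of distinct product sets of size k drawn from n_products."""
    return comb(n_products, k)


n_products = 5000  # assumed assortment size ("a few thousand" in the text)
for k in (2, 3, 4):
    print(f"sets of {k} products: {candidate_itemsets(n_products, k):,}")
```

For 5,000 products there are about 12.5 million possible pairs but roughly 20.8 billion possible triples. This is an upper bound on the combinations a naive search would have to check against every receipt; it shows why moving beyond pairs is so costly.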
