Market basket analysis

Remember, a successful analyst is not guided by the methodology. Start with the business problem, and only then think about the methodology that could provide the answers. Most practitioners of analysis and modeling think of time series methodology as one where the final modeling dataset has a variable with a date or date-time stamp. However, there are instances where time-stamped data is transformed before the analysis is conducted. Transforming the data does not mean that the time element is unimportant to the methodology or to the insights generated. We will use this kind of transformation for both market basket analysis (MBA) and clustering in this chapter. Let's look at data transformation in MBA to provide some context.

The goal of MBA is to find items that are commonly purchased together across customers' baskets. It is also referred to as affinity or association analysis. This analysis is widely used in the retail sector but has found takers in other domains as well. The most widespread example used to introduce the methodology is that of a supermarket that discovered a shopping pattern: in its data analysis, it found that male customers who bought diapers were also likely to buy beer. While this is an odd combination, the retailer used the insight to manage its shelf space and ended up placing beer close to diapers. MBA also supports sequential mapping. We can ascertain not only which products are common in customers' baskets, but also the order in which they were purchased. Financial institutions can also benefit from this sort of analysis.
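To make the idea of "common purchase items" concrete, here is a minimal sketch that counts how often each pair of products appears together in a basket. The baskets and the `pair_counts` helper are hypothetical and made up purely for illustration; they are not data or code from this chapter, and a real MBA would typically go on to derive association measures from such counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented for illustration only.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "bread", "beer"},
]

def pair_counts(baskets):
    """Count the number of baskets in which each unordered product pair occurs."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
# Show the most frequent pair and how many baskets contain it.
print(counts.most_common(1))
```

In this toy data, beer and diapers share three of the four baskets, which is the kind of pattern the supermarket example describes.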

In the following example, we have the product purchase history of two customers in time series format. Both customers have been using various products and services. MBA could be conducted using the Transaction date variable. However, look at the transformed data, which has no date variable: each row represents the complete purchase history of one customer, with products listed in the order they were bought. We will use the transformed data to conduct MBA in this chapter:

| Custid | Transaction date | Product         |
|--------|------------------|-----------------|
| 1      | Jan 5, 2016      | Credit Card     |
| 1      | Feb 8, 2016      | Personal Loan   |
| 2      | Apr 1, 2016      | Current Account |
| 2      | Aug 29, 2016     | Savings Account |
| 1      | Sep 4, 2016      | Insurance       |
| 2      | Sep 30, 2016     | Mortgage        |
| 1      | Nov 5, 2016      | Mortgage        |
| 1      | Jan 13, 2017     | Savings Account |
| 1      | Apr 7, 2017      | Current Account |
| 2      | Apr 7, 2017      | Credit Card     |

The preceding data has been transformed into a wider dataset. We have also removed the Transaction date variable:

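| Custid | Product 1       | Product 2       | Product 3 | Product 4   | Product 5       | Product 6       |
|--------|-----------------|-----------------|-----------|-------------|-----------------|-----------------|
| 1      | Credit Card     | Personal Loan   | Insurance | Mortgage    | Savings Account | Current Account |
| 2      | Current Account | Savings Account | Mortgage  | Credit Card |                 |                 |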

Figure 7.1: Original and transformed data for MBA
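The transformation in Figure 7.1 can be sketched in a few lines of Python. This is only an illustration, assuming the transactions are held as plain tuples; the `to_wide` helper and the variable names are hypothetical and not taken from the book. The date is used to order each customer's purchases and is then dropped, so the sequence survives in the column position.

```python
from collections import defaultdict
from datetime import date

# Long-format rows from Figure 7.1: (custid, transaction date, product).
transactions = [
    (1, date(2016, 1, 5), "Credit Card"),
    (1, date(2016, 2, 8), "Personal Loan"),
    (2, date(2016, 4, 1), "Current Account"),
    (2, date(2016, 8, 29), "Savings Account"),
    (1, date(2016, 9, 4), "Insurance"),
    (2, date(2016, 9, 30), "Mortgage"),
    (1, date(2016, 11, 5), "Mortgage"),
    (1, date(2017, 1, 13), "Savings Account"),
    (1, date(2017, 4, 7), "Current Account"),
    (2, date(2017, 4, 7), "Credit Card"),
]

def to_wide(rows):
    """Return {custid: [products in purchase order]}, dropping the date."""
    baskets = defaultdict(list)
    # Sort by customer, then by date, so each list keeps the purchase order.
    for custid, _when, product in sorted(rows, key=lambda r: (r[0], r[1])):
        baskets[custid].append(product)
    return dict(baskets)

wide = to_wide(transactions)
for custid, products in wide.items():
    # One line per customer: Product 1, Product 2, ... in purchase order.
    print(custid, products)
```

Customers with fewer purchases simply have shorter lists, which corresponds to the empty Product 5 and Product 6 cells for customer 2.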