Remember, a successful analyst is guided by the business problem first, and only then by the methodology that could provide the answers. Most practitioners of analysis and modeling think of time series methodology as one where the final modeling dataset contains a variable with a date or date-time stamp. However, there are other cases where time-stamped data is transformed before the analysis, so that the timestamp no longer appears as a variable. Transforming the data does not mean that the time element is unimportant to the methodology or to the insights generated. We will use this kind of transformation for both market basket analysis (MBA) and clustering in this chapter. Let's look at data transformation in MBA to provide some context.
MBA's goal is to find items that are commonly purchased together across customers' baskets. It is also referred to as affinity analysis or association analysis. The analysis is widely used in the retail sector but has been adopted in other domains as well. The most widespread example used to introduce the methodology is that of a supermarket that discovered an unexpected shopping pattern: its data analysis showed that male customers who bought diapers were also likely to buy beer. Although this is an odd combination, the retailer used the insight to manage its shelf space and ended up placing beer close to diapers. This analysis also supports sequential mapping: we can ascertain not only which products are common in customers' baskets, but also the order in which they were purchased. Financial institutions can also benefit from this sort of analysis.
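To make the idea of "common purchase items" concrete, here is a minimal sketch that counts how often each pair of items appears in the same basket and expresses that as a share of all baskets, a measure usually called support. The baskets and the `pair_support` helper are hypothetical and exist only for illustration; a real analysis would normally rely on a dedicated association-rule procedure.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented purely for illustration.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "bread", "beer"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each unordered item pair."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so that each pair has one canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items()}

support = pair_support(baskets)
print(support[("beer", "diapers")])  # share of baskets containing both items
```

Pairs with high support are candidates for the kind of shelf-placement decision described above. Note that this count ignores purchase order, so it does not cover sequential mapping.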
The following example shows the product purchase history of two customers in time series format. Both customers have used various products and services. MBA could be conducted directly using the Transaction date variable. However, consider the transformed data that follows the first table, which has no date variable: each row represents the full purchase history of one customer. We will use the transformed data in this chapter to conduct MBA:
| Custid | Transaction date | Product |
| 1 | Jan 5, 2016 | Credit Card |
| 1 | Feb 8, 2016 | Personal Loan |
| 2 | Apr 1, 2016 | Current Account |
| 2 | Aug 29, 2016 | Savings Account |
| 1 | Sep 4, 2016 | Insurance |
| 2 | Sep 30, 2016 | Mortgage |
| 1 | Nov 5, 2016 | Mortgage |
| 1 | Jan 13, 2017 | Savings Account |
| 1 | Apr 7, 2017 | Current Account |
| 2 | Apr 7, 2017 | Credit Card |
The preceding data has been transformed into a wider dataset, with one row per customer and the products listed in order of purchase. We have also removed the Transaction date variable:
| Custid | Product 1 | Product 2 | Product 3 | Product 4 | Product 5 | Product 6 |
| 1 | Credit Card | Personal Loan | Insurance | Mortgage | Savings Account | Current Account |
| 2 | Current Account | Savings Account | Mortgage | Credit Card | | |
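The long-to-wide transformation shown in the two tables can be sketched in a few lines of code. The following is a minimal illustration rather than the chapter's own implementation: the `to_wide` helper and the tuple layout are assumptions made for this example, and the date format string assumes English month abbreviations.

```python
from collections import defaultdict
from datetime import datetime

# Long-format records from the first table: (Custid, Transaction date, Product).
transactions = [
    (1, "Jan 5, 2016", "Credit Card"),
    (1, "Feb 8, 2016", "Personal Loan"),
    (2, "Apr 1, 2016", "Current Account"),
    (2, "Aug 29, 2016", "Savings Account"),
    (1, "Sep 4, 2016", "Insurance"),
    (2, "Sep 30, 2016", "Mortgage"),
    (1, "Nov 5, 2016", "Mortgage"),
    (1, "Jan 13, 2017", "Savings Account"),
    (1, "Apr 7, 2017", "Current Account"),
    (2, "Apr 7, 2017", "Credit Card"),
]

def to_wide(rows):
    """Group products by customer, order them by purchase date, drop the date."""
    by_customer = defaultdict(list)
    for custid, date_text, product in rows:
        purchased = datetime.strptime(date_text, "%b %d, %Y")
        by_customer[custid].append((purchased, product))
    # Sorting the (date, product) pairs keeps the purchase sequence
    # even though the date itself is not carried into the output.
    return {
        custid: [product for _, product in sorted(purchases)]
        for custid, purchases in sorted(by_customer.items())
    }

wide = to_wide(transactions)
for custid, products in wide.items():
    print(custid, products)
```

Although the date variable is gone, the position of each product in the row still encodes the order of purchase, which is why the time element remains relevant to the analysis.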