Market basket analysis (MBA) is a technique used in data mining by retail companies to increase sales by better understanding customer buying patterns. It involves analyzing large datasets, such as customer purchase history, to uncover item groupings and products that are likely to be frequently purchased together.
This chapter explores the implementation of market basket analysis with the help of an open source e-commerce dataset. You start with exploratory data analysis (EDA) of the dataset and focus on critical insights. You then learn how to implement various MBA techniques, plot a graphical representation of the associations, and draw insights.
Implementation
Data Collection
Let’s look at an open source e-commerce dataset hosted on Kaggle. Download the dataset from www.kaggle.com/carrie1/ecommerce-data?select=data.csv.
Importing the Data as a DataFrame (pandas)
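The following is a minimal sketch of the import step. The helper name load_transactions is hypothetical, and the ISO-8859-1 encoding and the InvoiceNo and CustomerID column names are assumptions about the downloaded data.csv file, so adjust them if your copy differs.

```python
import pandas as pd


def load_transactions(path: str) -> pd.DataFrame:
    """Read the e-commerce CSV file into a DataFrame.

    Assumptions: the file is encoded as ISO-8859-1, and the InvoiceNo and
    CustomerID columns are identifiers, so they are kept as strings.
    """
    return pd.read_csv(
        path,
        encoding="ISO-8859-1",
        dtype={"InvoiceNo": str, "CustomerID": str},
    )
```

A typical call would be df = load_transactions("data.csv"), followed by df.head() and df.describe() for a first look at the columns.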
Cleaning the Data
The Quantity column contains some negative values. These are treated as incorrect data here, so let’s drop those entries.
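One way to drop those rows is a simple boolean filter. The sketch below runs on a tiny hand-made sample rather than the real file; the rows and the df_clean name are illustrative only.

```python
import pandas as pd

# Tiny illustrative sample; the real rows come from data.csv.
df = pd.DataFrame(
    {
        "InvoiceNo": ["536365", "536366", "C536379"],
        "Description": [
            "WHITE HANGING HEART T-LIGHT HOLDER",
            "HAND WARMER UNION JACK",
            "Discount",
        ],
        "Quantity": [6, 6, -1],
        "UnitPrice": [2.55, 1.85, 27.50],
    }
)

# Keep only the rows whose quantity is not negative.
df_clean = df[df["Quantity"] >= 0].reset_index(drop=True)
print(df_clean)
```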
Insights from the Dataset
Customer Insights
Who are my loyal customers?
Which customers have ordered most frequently?
Which customers contribute the most to my revenue?
Loyal Customers
Number of Orders per Customer
Let’s plot the orders by different customers.
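The numbers behind such a plot can be computed by counting distinct invoices per customer. This is a sketch on made-up rows; the InvoiceNo and CustomerID column names are assumed to match the dataset.

```python
import pandas as pd

# Illustrative rows: one row per purchased item, several rows per invoice.
df = pd.DataFrame(
    {
        "InvoiceNo": ["A1", "A1", "A2", "B1", "B1"],
        "CustomerID": ["17850", "17850", "17850", "13047", "13047"],
    }
)

# Count distinct invoices per customer, most active customers first.
orders_per_customer = (
    df.groupby("CustomerID")["InvoiceNo"]
    .nunique()
    .sort_values(ascending=False)
    .rename("n_orders")
)
print(orders_per_customer.head(10))
```

Counting with nunique rather than count matters here, because each invoice spans several rows and would otherwise be counted once per item.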
Money Spent per Customer
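Money spent can be derived as quantity times unit price, summed per customer. The sketch below assumes the Quantity, UnitPrice, and CustomerID columns; the Amount column name is introduced here for illustration.

```python
import pandas as pd

# Illustrative rows only.
df = pd.DataFrame(
    {
        "CustomerID": ["17850", "17850", "13047"],
        "Quantity": [6, 2, 10],
        "UnitPrice": [2.5, 1.0, 0.5],
    }
)

# Revenue of each row, then the total per customer (highest first).
df["Amount"] = df["Quantity"] * df["UnitPrice"]
money_per_customer = (
    df.groupby("CustomerID")["Amount"].sum().sort_values(ascending=False)
)
print(money_per_customer.head(10))
```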
Patterns Based on DateTime
In which month is the highest number of orders placed?
On which day of the week is the highest number of orders placed?
At what time of the day is the store the busiest?
Preprocessing the Data
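To answer the date and time questions, the invoice timestamp first needs to be parsed and split into parts. The sketch below assumes an InvoiceDate column in month/day/year hour:minute form; the year_month, day_of_week, and hour column names are illustrative choices.

```python
import pandas as pd

# Illustrative rows; the date format is an assumption about data.csv.
df = pd.DataFrame(
    {
        "InvoiceNo": ["536365", "581587"],
        "InvoiceDate": ["12/1/2010 8:26", "12/9/2011 12:50"],
    }
)

# Parse the text timestamps into real datetime values.
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"], format="%m/%d/%Y %H:%M")

# Derive the parts used in the monthly, daily, and hourly counts.
df["year_month"] = df["InvoiceDate"].dt.strftime("%Y-%m")
df["day_of_week"] = df["InvoiceDate"].dt.dayofweek  # Monday is 0
df["hour"] = df["InvoiceDate"].dt.hour
print(df)
```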
How Many Orders Are Placed per Month?
How Many Orders Are Placed per Day?
Provide X tick labels.
How Many Orders Are Placed per Hour?
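The monthly, daily, and hourly counts all follow the same pattern: group by the time part and count distinct invoices. The helper name orders_per and the year_month, day_of_week, and hour columns are hypothetical names used only for this sketch.

```python
import pandas as pd


def orders_per(df: pd.DataFrame, key: str) -> pd.Series:
    """Number of distinct invoices for each value of `key`, sorted by key."""
    return df.groupby(key)["InvoiceNo"].nunique().sort_index()


# Illustrative rows with the time parts already derived.
df = pd.DataFrame(
    {
        "InvoiceNo": ["A1", "A1", "A2", "A3"],
        "year_month": ["2010-12", "2010-12", "2010-12", "2011-01"],
        "day_of_week": [2, 2, 3, 0],
        "hour": [8, 8, 9, 8],
    }
)

per_month = orders_per(df, "year_month")
per_day = orders_per(df, "day_of_week")
per_hour = orders_per(df, "hour")
print(per_month, per_day, per_hour, sep="\n\n")
```

Each resulting Series is indexed by the time part, so it can be passed directly to a bar plot.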
Free Items and Sales
Since the minimum unit price = 0, there are either incorrect entries or free items.
Items with UnitPrice = 0 are not outliers. These are the “free” items.
There is at least one free item every month except June 2011.
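A count of free items per month can be obtained by filtering on a zero unit price and grouping by month. The rows below are made up, and the year_month column is an assumed helper column derived from the invoice date.

```python
import pandas as pd

# Illustrative rows only.
df = pd.DataFrame(
    {
        "InvoiceNo": ["A1", "A2", "A3", "A4"],
        "UnitPrice": [0.0, 2.55, 0.0, 0.0],
        "year_month": ["2011-05", "2011-05", "2011-05", "2011-08"],
    }
)

# Rows with a zero unit price are the "free" items.
free_items = df[df["UnitPrice"] == 0]

# Number of free-item rows in each month.
free_per_month = free_items.groupby("year_month").size()
print(free_per_month)
```

A month with no free items does not appear in the result at all, which is how a gap in the data shows up.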
Provide X tick labels.
The greatest number of free items were given out in November 2011. The greatest number of orders were also placed in November 2011.
Compared to May, sales in August declined, which suggests that the number of free items has a slight effect on sales.
Item Insights
Which item was purchased by the greatest number of customers?
Which is the most sold item based on the sum of sales?
Which is the most sold item based on the count of orders?
What are the “first choice” items for the greatest number of invoices?
Most Sold Items Based on Quantity
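A sketch of this ranking, assuming the Description and Quantity columns: sum the quantity per item and keep the largest totals. The sample rows are illustrative.

```python
import pandas as pd

# Illustrative rows only.
df = pd.DataFrame(
    {
        "Description": [
            "WHITE HANGING HEART T-LIGHT HOLDER",
            "WHITE METAL LANTERN",
            "WHITE HANGING HEART T-LIGHT HOLDER",
        ],
        "Quantity": [6, 10, 8],
    }
)

# Total units sold per item, ten largest totals first.
most_sold = df.groupby("Description")["Quantity"].sum().nlargest(10)
print(most_sold)
```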
Items Bought by the Highest Number of Customers
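One way to get these numbers is to count distinct customers per item. This is a sketch on made-up rows, assuming the Description and CustomerID columns.

```python
import pandas as pd

# Illustrative rows only; customer "1" buys the same item twice.
df = pd.DataFrame(
    {
        "Description": [
            "WHITE HANGING HEART T-LIGHT HOLDER",
            "WHITE HANGING HEART T-LIGHT HOLDER",
            "WHITE HANGING HEART T-LIGHT HOLDER",
            "WHITE METAL LANTERN",
        ],
        "CustomerID": ["1", "1", "2", "3"],
    }
)

# Distinct customers per item, so repeat purchases are counted once.
customers_per_item = (
    df.groupby("Description")["CustomerID"]
    .nunique()
    .sort_values(ascending=False)
)
print(customers_per_item.head(10))
```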
This means 856 customers ordered WHITE HANGING HEART T-LIGHT HOLDER.
Create a bar plot with the description (the item) on the y axis and the number of unique customers on the x axis.
Most Frequently Ordered Items
Top Ten First Choices
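The sketch below interprets the "first choice" of an invoice as the first item listed on it. That interpretation is an assumption, and it relies on the rows of each invoice being in the order in which the items were added.

```python
import pandas as pd

# Illustrative rows, kept in the order in which items appear on each invoice.
df = pd.DataFrame(
    {
        "InvoiceNo": ["A1", "A1", "A2", "A3", "A3"],
        "Description": [
            "WHITE HANGING HEART T-LIGHT HOLDER",
            "WHITE METAL LANTERN",
            "WHITE HANGING HEART T-LIGHT HOLDER",
            "WHITE METAL LANTERN",
            "WHITE HANGING HEART T-LIGHT HOLDER",
        ],
    }
)

# First non-missing item on each invoice, then how often each item comes first.
first_choice = df.groupby("InvoiceNo", sort=False)["Description"].first()
top_first_choices = first_choice.value_counts().head(10)
print(top_first_choices)
```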
Frequently Bought Together (MBA)
Which items are frequently bought together?
If a user buys an item X, which item are they likely to buy next?
Let’s use the groupby function to create a market basket DataFrame, which records, for every invoice and every item, whether the item is present in that invoice.
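A minimal sketch of that grouping step, assuming the InvoiceNo, Description, and Quantity columns: sum the quantity per invoice and item, then pivot the items into columns. The basket name and sample rows are illustrative.

```python
import pandas as pd

# Illustrative rows only.
df = pd.DataFrame(
    {
        "InvoiceNo": ["A1", "A1", "A2"],
        "Description": [
            "WHITE HANGING HEART T-LIGHT HOLDER",
            "WHITE METAL LANTERN",
            "WHITE HANGING HEART T-LIGHT HOLDER",
        ],
        "Quantity": [6, 2, 48],
    }
)

# One row per invoice, one column per item, cell = total quantity ordered.
# Items that are absent from an invoice are filled with 0.
basket = (
    df.groupby(["InvoiceNo", "Description"])["Quantity"]
    .sum()
    .unstack(fill_value=0)
)
print(basket)
```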
This output contains the quantity ordered (e.g., 48, 24, 126), but we only need to know whether an item was purchased or not.
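Turning quantities into purchased/not-purchased flags is a single comparison. The sketch below uses a small made-up quantity table; the basket and basket_sets names are illustrative.

```python
import pandas as pd

# Illustrative quantity basket: rows are invoices, columns are items.
basket = pd.DataFrame(
    {
        "WHITE HANGING HEART T-LIGHT HOLDER": [48, 0, 126],
        "WHITE METAL LANTERN": [0, 24, 0],
    },
    index=["A1", "A2", "A3"],
)

# True where the item appears on the invoice, False otherwise.
basket_sets = basket > 0
print(basket_sets)
```

A boolean table of this shape is the input format that frequent-itemset functions such as mlxtend's apriori expect.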
Apriori Algorithm Concepts
Refer to Chapter 1 for more information.
Suppose you are looking to build a relationship between milk and bread. If 7 out of 40 milk buyers also buy bread, then confidence = 7/40 = 17.5%.
The basic formula is lift = confidence/support, where the support is that of the consequent (here, bread).
So here, with a support of 10%, lift = 17.5/10 = 1.75.
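The arithmetic can be checked with a few lines of Python. The transaction counts below are hypothetical, chosen only so that the confidence comes out at 17.5% and the support of bread at 10%.

```python
# Hypothetical counts, chosen only to reproduce the percentages in the text.
total_transactions = 400
milk_transactions = 40
bread_transactions = 40  # gives support(bread) = 10%
milk_and_bread_transactions = 7

# Share of all transactions that contain bread.
support_bread = bread_transactions / total_transactions

# Share of milk transactions that also contain bread.
confidence_milk_to_bread = milk_and_bread_transactions / milk_transactions

# How much more often bread is bought with milk than bread is bought overall.
lift_milk_to_bread = confidence_milk_to_bread / support_bread

print(f"support(bread)           = {support_bread:.1%}")
print(f"confidence(milk -> bread) = {confidence_milk_to_bread:.1%}")
print(f"lift(milk -> bread)       = {lift_milk_to_bread:.2f}")
```

A lift above 1 means the two items appear together more often than they would if they were bought independently.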
Association Rules
Association rule mining finds interesting associations and relationships among large sets of data items. An association rule shows how frequently an item set occurs across transactions. A market basket analysis is performed based on the rules created from the dataset.
Figure 2-30 shows that out of the five transactions in which a mobile phone was purchased, three included a mobile screen guard. Thus, the screen guard should be recommended to mobile phone buyers.
Implementation Using mlxtend
If A, then B (A => B)
Use the apriori algorithm and create association rules for the sample item.
Creating a Function
Validation
There are some common items between the recommendations from the bought_together_frequently function and the invoice.
This overlap suggests that the recommender is performing well.
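The overlap check can be written as a small helper. The function name common_items and the item names below are hypothetical; in practice the first list would come from bought_together_frequently and the second from the rows of one invoice.

```python
def common_items(recommended, invoice_items):
    """Recommended items that also appear on the invoice, in recommendation order."""
    invoice_set = set(invoice_items)
    return [item for item in recommended if item in invoice_set]


# Hypothetical recommendations and invoice contents.
recommended = ["SCREEN GUARD", "CHARGER", "PHONE CASE"]
invoice_items = ["MOBILE PHONE", "PHONE CASE", "SCREEN GUARD"]

overlap = common_items(recommended, invoice_items)
hit_rate = len(overlap) / len(recommended)
print(overlap, f"{hit_rate:.0%} of the recommendations appear on the invoice")
```

A check on a single invoice is only a spot check. Repeating it over many invoices gives a more reliable picture of how well the recommendations match real purchases.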
Visualization of Association Rules
Summary
In this chapter, you learned how to build a recommendation system based on market basket analysis. You also learned how to fetch items that are frequently purchased together and offer suggestions to users. Most e-commerce sites use this method to showcase items bought together. This chapter implemented this method in Python using an e-commerce example.