To better understand consumer needs, we first need to recognize that our customers have distinct consumption patterns. Any mass of consumers of a given product or service can be divided into segments, described in terms of age, marital status, purchasing power, and so on. In this chapter, we will perform an exploratory analysis of consumer data from a grocery store and then apply clustering techniques to separate customers into segments with homogeneous consumption patterns. This knowledge will enable us to better understand their needs, create unique offers, and target them more effectively. In this chapter, we will learn about the following topics:
Let us first look at the technical requirements needed to follow along with this chapter.
To be able to follow the steps in this chapter, you will need to meet the following requirements:
Customer segmentation is the practice of classifying customers into groups based on shared traits so that businesses may effectively and appropriately market to each group. In business-to-business (B2B) marketing, a firm may divide its clientele into several groups based on a variety of criteria, such as location, industry, the number of employees, and previous purchases of the company’s goods.
Businesses frequently divide their clientele into segments based on demographics such as age, gender, marital status, location (urban, suburban, or rural), and life stage (single, married, divorced, empty nester, retired). Customer segmentation calls for a business to collect data about its customers, evaluate it, and look for trends that may be utilized to establish segments.
Job title, location, and products purchased, for example, are some of the details that can be learned from purchasing data to help businesses understand their customers. Some of this information might be discovered by looking at how the customer entered the system. For instance, an online marketer using an opt-in email list may divide marketing communications into various categories based on the opt-in offer that attracted the client. Other data, however, such as consumer demographics like age and marital status, will have to be gathered through different methods.
Other typical information-gathering methods in consumer goods include:
All organizations, regardless of size, industry, and whether they sell online or in person, can use customer segmentation. It starts with obtaining and evaluating data and concludes with taking suitable and efficient action on the information acquired.
In this chapter, we will execute unsupervised clustering on the customer records from a grocery store’s database. To maximize the value of each customer to the firm, we will segment our customer base so that products can be adapted to specific needs and consumer behaviors. Being able to address the needs of different types of clientele also benefits the firm.
The first stage in understanding customer segments is to understand the data we will be using. We therefore begin with an exploration of the data to check the variables we must work with, handle unstructured data, and adjust data types. We will structure the data for the clustering analysis and gain knowledge about its distribution.
For the analysis we will use in the next example, the following Python modules are used:
We’ll now get started with the analysis, using the following steps:
from sklearn.cluster import AgglomerativeClustering
import pandas as pd
pd.options.display.precision = 2
data.head()
This results in the following output:
Figure 8.1: User data
data.describe()
This results in the following output:
Figure 8.2: Descriptive statistical summary
data.info()
This results in the following output:
Figure 8.3: Column data types and null values
From the preceding output shown with the describe and info methods of Pandas DataFrames, we can see the following:
print("Data Shape", data.shape)
data["Dt_Customer"] = pd.to_datetime(data["Dt_Customer"])
print((str(data["Dt_Customer"].min()), str(data["Dt_Customer"].max())))
>>>> ('2012-01-08 00:00:00', '2014-12-06 00:00:00')
data["Customer_For"] = data["Customer_For"].dt.days
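The `Customer_For` column used above has to be derived first. Here is a minimal sketch, assuming (as the `.dt.days` conversion suggests) that it measures each customer's seniority relative to the most recent enrollment date in `Dt_Customer`; the toy dates are illustrative:

```python
import pandas as pd

# Toy enrollment dates standing in for the Dt_Customer column
data = pd.DataFrame({
    "Dt_Customer": pd.to_datetime(["2012-01-08", "2013-06-15", "2014-12-06"])
})

# Seniority relative to the newest customer in the records
newest = data["Dt_Customer"].max()
data["Customer_For"] = newest - data["Dt_Customer"]

# Convert the timedelta to an integer number of days, as in the chapter
data["Customer_For"] = data["Customer_For"].dt.days
print(data["Customer_For"].tolist())
```

The resulting integer feature is easier to scale and cluster on than a raw timestamp.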
Figure 8.4: Marital status
Here, we can see that there are several types of marital status, which may have been caused by free text entry during the data capturing. We will have to standardize these values.
Figure 8.5: Education values
Again, we can see the effects of free text entry as there are several values that have the same underlying meaning; thus, we will need to standardize them as well.
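The standardization itself can be done with a simple `replace` mapping. Here is a minimal sketch with hypothetical free-text values of the kind discussed above:

```python
import pandas as pd

# Hypothetical free-text entries, standing in for a raw categorical column
status = pd.Series(["Married", "Together", "Single", "Divorced",
                    "Widow", "Alone", "Absurd", "YOLO"])

# Map every raw value onto a small set of standard categories
mapping = {
    "Married": "Partner", "Together": "Partner",
    "Single": "Alone", "Divorced": "Alone",
    "Widow": "Alone", "Alone": "Alone",
    "Absurd": "Alone", "YOLO": "Alone",
}
standardized = status.replace(mapping)
print(standardized.unique())
```

After the mapping, the column contains only a handful of well-defined categories, which makes encoding and interpretation much easier.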
In the next section, we will apply feature engineering to structure the data for better understanding and treatment of the data.
To be able to properly analyze the data as well as to model the clusters, we will need to clean and structure the data—a step that is commonly referred to as feature engineering—as we need to restructure some of the variables according to our plan of analysis.
In this section, we will be performing the next steps to clean and structure some of the dataset features, with the goal of simplifying the existing variables and creating features that are easier to understand and describe the data properly:
So, let’s apply the steps mentioned here to structure the data:
data["Age"] = pd.to_datetime('today').year - data["Year_Birth"]
prod_cols = ["MntWines","MntFruits","MntMeatProducts",
data["Spent"] = data[prod_cols].sum(axis=1)
"Widow":"Alone",
data["Living_With"] = data["Marital_Status"].replace(marital_status_dict)
data["Children"] = data["Kidhome"] + data["Teenhome"]
data["Family_Size"] = data["Living_With"].replace({"Alone": 1, "Partner": 2}) + data["Children"]
data["Is_Parent"] = (data["Children"] > 0).astype(int)
edu_dict = {"Basic":"Undergraduate","2n Cycle":"Undergraduate", "Graduation":"Graduate", "Master":"Postgraduate", "PhD":"Postgraduate"}
data["Ed_level"]=data["Education"].replace(edu_dict)
"MntFruits":"Fruits",
data = data.rename(columns=col_rename_dict)
to_drop = ["Marital_Status", "Dt_Customer",
data.describe()
Figure 8.6: Age data
We can see that there are some outliers, more than 120 years old, so we will be removing those.
Figure 8.7: Income data
Again, we can see that the incomes are concentrated at the lower end of the range, with a few extreme outliers, so we will be limiting the income level as well.
prev_len = len(data)
# Remove the age and income outliers identified above
# (illustrative thresholds based on the distributions)
data = data[(data["Age"] < 120) & (data["Income"] < 600000)]
print('Removed outliers:', prev_len - len(data))
The preceding code prints the next output:
>>> Removed outliers: 11
Figure 8.8: Age with no outliers
The age distribution is centered on the 50s with a skew to the right, meaning that the average age of our customers is above 45 years.
Figure 8.9: Income with no outliers
Looking at the spend distribution, it has a normal distribution, centered on 4,000 and slightly skewed to the left.
Figure 8.10: Relationship plot
These graphics allow us to quickly observe the relationships between the different variables, as well as their distributions. One of the clearest is the relationship between spend and income: the higher the income, the higher the expenditure. We can also observe that single parents spend more than those who are not, that the consumers with higher recency are parents, and that single consumers have lower recency values. Next, let us look at the correlation among the features (excluding the categorical attributes at this point).
df_corr = data.corr(numeric_only=True)  # exclude the categorical columns
Figure 8.11: Variable correlation
The correlations allow us to explore the variable relationships in more detail. We can see, on average, a negative correlation between the number of children and expenditure, and a positive correlation between children and recency. These correlations allow us to better understand consumption patterns.
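A heatmap is a convenient way to present such a correlation matrix. Here is a minimal sketch on synthetic columns; the column names and data are illustrative, not the chapter's dataset:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs headless
import matplotlib.pyplot as plt
import seaborn as sns

# Small synthetic frame standing in for the numeric customer features
rng = np.random.default_rng(0)
income = rng.normal(50000, 10000, 200)
demo = pd.DataFrame({
    "Income": income,
    "Spent": income * 0.01 + rng.normal(0, 50, 200),  # correlated with income
    "Children": rng.integers(0, 3, 200),
})

df_corr = demo.corr()

# Mask the upper triangle so each variable pair is shown only once
mask = np.triu(np.ones_like(df_corr, dtype=bool))
f, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(df_corr, mask=mask, annot=True, cmap="vlag", center=0)
plt.show()
```

Centering the color map at zero makes positive and negative correlations immediately distinguishable.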
In the next section, we will use the concept of clustering to segment the clients into groups that share common characteristics.
Marketers can better target different audience subgroups with their marketing efforts by segmenting their audiences. Both product development and communications might be a part of those efforts. Segmentation benefits a business by allowing the following:
In this section, we will be preprocessing the data to be able to apply clustering methods for customer segmentation. The steps that we will apply to preprocess the data are set out here:
So, let’s follow the steps here:
object_cols = list(data.select_dtypes(include="object").columns)
print("Categorical variables in the dataset:", object_cols)

# Encode each categorical column as integer labels
LE = LabelEncoder()
for i in object_cols:
    data[i] = LE.fit_transform(data[i])
scaled_ds = data.copy()
cols_del = ['AcceptedCmp3', 'AcceptedCmp4', 'AcceptedCmp5', 'AcceptedCmp1','AcceptedCmp2', 'Complain', 'Response']
scaled_ds = scaled_ds.drop(cols_del, axis=1)
# Scale the features to zero mean and unit variance
scaler = StandardScaler()
scaler.fit(scaled_ds)
scaled_ds = pd.DataFrame(scaler.transform(scaled_ds), columns=scaled_ds.columns)
There are numerous attributes in this dataset that describe the data. The more features there are, the more difficult it is to correctly analyze them in a business environment. Many of these characteristics are redundant since they are connected. Therefore, before running the features through a classifier, we will conduct dimensionality reduction on the chosen features.
Dimensionality reduction is the process of reducing the number of random variables considered. To reduce the dimensionality of huge datasets, a technique known as PCA is frequently utilized. PCA works by condensing an ample collection of variables into a smaller set that still retains much of the data in the larger set.
Accuracy naturally suffers as a dataset’s variables are reduced, but the answer to dimensionality reduction is to trade a little accuracy for simplicity: ML algorithms can analyze data much more quickly and easily with smaller datasets because there are fewer unnecessary factors to process. In conclusion, the basic principle of PCA is to keep as much information as possible while reducing the number of variables in the data.
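This trade-off can be observed directly through scikit-learn's `explained_variance_ratio_`. Here is a minimal sketch on synthetic data built from only two underlying sources of variation, so most of the variance survives the reduction:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 5 columns, but only 2 underlying sources of variation
rng = np.random.default_rng(42)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + rng.normal(scale=0.1, size=(500, 5))  # small noise

pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)

# The first two components should capture almost all of the variance
print(pca.explained_variance_ratio_)
print("Total:", pca.explained_variance_ratio_.sum())
```

On real data such as ours, the retained variance is lower (around 54% with three components, as shown below in the chapter), which is the accuracy-for-simplicity trade-off in action.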
The steps that we will be applying in this section are the following:
This will allow us to have a way to visualize the segments projected into three dimensions. In an ideal setup, we will use the weights of each component to understand what each component represents and make sense of the information we are visualizing in a better way. For reasons of simplicity, we will focus on the visualization of the components. Here are the steps:
pca = PCA(n_components=3)
PCA_ds = pca.fit_transform(scaled_ds)
PCA_ds = pd.DataFrame(PCA_ds, columns=[
    "component_one", "component_two", "component_three"])
print(pca.explained_variance_ratio_)
>>>>[0.35092717 0.12336458 0.06470715]
For this project, we will reduce the dimensions to three, which explains 54% of the total variance in the observed variables:
print('Total explained variance', sum(pca.explained_variance_ratio_))
>>>> Total explained variance 0.5389989029179605
x, y, z = PCA_ds["component_one"], PCA_ds["component_two"], PCA_ds["component_three"]
plt.show()
The preceding code will show us the data projected into three dimensions:
Figure 8.12: PCA variables in 3D
Since the attributes now span only three dimensions, we will use agglomerative clustering to perform the clustering. Agglomerative clustering is a hierarchical clustering technique: examples are merged together, bottom-up, until the desired number of clusters is reached.
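A minimal sketch of this bottom-up merging on synthetic, well-separated data (the blobs here are illustrative, not the chapter's dataset):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Three well-separated groups of points
X, _ = make_blobs(n_samples=90, centers=3, cluster_std=0.5, random_state=0)

# Merge examples bottom-up until three clusters remain
model = AgglomerativeClustering(n_clusters=3)
labels = model.fit_predict(X)

print(np.bincount(labels))  # size of each recovered cluster
```

Unlike K-means, no centroids are maintained; the algorithm repeatedly fuses the two closest clusters, which is why only the stopping point (`n_clusters`) needs to be specified.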
The process of clustering involves grouping the population or data points into a number of groups so that the data points within each group are more similar to one another than to the data points in other groups. Simply put, the goal is to sort into clusters the groups of people who share similar characteristics. Finding distinct groups, or “clusters”, within a data collection is the aim of clustering, and the members of each group constructed by the ML algorithm will typically share similar traits.
Classification and clustering are two methods of pattern recognition used in ML. Although there are some parallels between the two processes, classification assigns objects to predetermined classes, whereas clustering discovers similarities between objects and groups them according to the features that set them apart from other groups of objects. These groups are known as “clusters”.
The steps involved in clustering are set out here:
In K-means clustering, the ideal number of clusters is established using the elbow approach: the cost function is plotted for various values of the number of clusters, K, and the point where its decrease flattens out (the “elbow”) indicates the appropriate K:
fig = plt.figure(figsize=(12,8))
elbow = KElbowVisualizer(KMeans(), k=(2,12), metric='distortion') # distortion: mean sum of squared distances to centers
elbow.fit(PCA_ds)
elbow.show()
This code will plot an elbow plot, which will be a good estimation of the required number of clusters:
Figure 8.13: Elbow method
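If yellowbrick is unavailable, the same curve can be computed by hand from the K-means inertia (the within-cluster sum of squared distances, which is what the distortion metric above measures). Here is a sketch on synthetic data standing in for `PCA_ds`:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with four true clusters, standing in for PCA_ds
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.8, random_state=1)

inertias = []
ks = range(2, 12)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=1).fit(X)
    inertias.append(km.inertia_)

# Inertia always decreases with k; the "elbow" is where the decrease flattens
for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 1))
```

Plotting `inertias` against `ks` reproduces the elbow plot; the sharp drop up to the true number of clusters followed by a flat tail is the pattern the visualizer automates.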
AC = AgglomerativeClustering(n_clusters=4)  # four clusters, as suggested by the elbow plot
yhat_AC = AC.fit_predict(PCA_ds)
PCA_ds["Clusters"] = yhat_AC
data["Clusters"]= yhat_AC
values = PCA_ds["Clusters"]
classes = sorted(values.unique())
ax = plt.subplot(projection='3d')
# Color each point by its cluster label
scatter = ax.scatter(x, y, z, c=values)
plt.legend(handles=scatter.legend_elements()[0], labels=classes)
The preceding code will show a three-dimensional visualization of the PCA components colored according to the clusters:
Figure 8.14: PCA variables with cluster labeling
From this, we can see that each cluster occupies a specific space in the visualization. We will now dive into a description of each cluster to better understand these segments.
To rigorously evaluate the output obtained, we need to evaluate the resulting clusters. This is because clustering is an unsupervised method, and the patterns extracted should always reflect reality; otherwise, we might just as well be analyzing noise.
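One common quantitative check, not used in the chapter itself but useful here, is the silhouette score, which compares cohesion within a cluster against separation from the nearest neighboring cluster (ranging from -1 to 1, higher is better). A sketch on synthetic data:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Well-separated synthetic data: clustering should score well
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)
labels = AgglomerativeClustering(n_clusters=4).fit_predict(X)
score = silhouette_score(X, labels)
print(round(score, 3))

# Pure noise: the same pipeline should score clearly lower
noise = np.random.default_rng(0).normal(size=(300, 2))
noise_labels = AgglomerativeClustering(n_clusters=4).fit_predict(noise)
noise_score = silhouette_score(noise, noise_labels)
print(round(noise_score, 3))
```

Comparing the score on our data against such a noise baseline is exactly the "are we analyzing noise?" check described above.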
Common traits among consumer groups can help a business choose which items or services to advertise to which segments and how to market to each one.
To do that, we will use exploratory data analysis (EDA) to look at the data in the context of clusters and make judgments. Here are the steps:
Figure 8.15: Cluster count
The clusters are fairly evenly distributed. It can be clearly seen that cluster 1 is our biggest set of customers, closely followed by cluster 0.
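The counts behind the bar chart can also be read off directly with `value_counts`; a sketch with hypothetical labels:

```python
import pandas as pd

# Hypothetical cluster labels, as produced by fit_predict
clusters = pd.Series([0, 1, 1, 2, 0, 1, 3, 0, 1, 2], name="Clusters")

# Number of customers per cluster, ordered by cluster label
counts = clusters.value_counts().sort_index()
print(counts.to_dict())
```

This is handy when the exact segment sizes are needed for a report rather than a visual comparison.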
f, ax = plt.subplots(figsize=(12, 8))
pl = sns.scatterplot(data=data, x="Spent", y="Income", hue="Clusters", palette=colors)
plt.legend()
plt.show()
Figure 8.16: Income versus spending
In the income versus spending plot, we can see the next cluster patterns:
sample = data.sample(750)
Figure 8.17: Spend distribution per cluster
From Figure 8.17, it can be seen how the spend is evenly distributed in cluster 0, cluster 1 is centered on high expenditure, and clusters 2 and 3 center on low expenditure.
f, ax = plt.subplots(figsize=(12, 6))
Figure 8.18: Spend distribution per cluster (Boxen plot)
We can visualize the patterns in a different way using a Boxen plot.
data["TotalProm"] = data["AcceptedCmp1"]+ data["AcceptedCmp2"]+ data["AcceptedCmp3"]+ data["AcceptedCmp4"]+ data["AcceptedCmp5"]
f, ax = plt.subplots(figsize=(10, 6))
pl = sns.countplot(x=data["TotalProm"], hue=data["Clusters"], palette=['red','blue','green','orange'])
pl.set_ylabel("Count")
Figure 8.19: Promotions applied per cluster
Although there is no single characteristic pattern in the promotions per cluster, we can see that clusters 0 and 2 are the ones with the highest number of applied promotions.
f, ax = plt.subplots(figsize=(12, 6))
pl = sns.boxenplot(y=data["NumDealsPurchases"],x=data["Clusters"], palette= ['red','blue','green','orange'])
pl.set_title("Purchased Deals")
Figure 8.20: Purchased deals per cluster
The promotional campaigns were not widely taken up, but the deal purchases were successful, with clusters 0 and 2 showing the best results. However, cluster 1, which contains some of our top customers, is not interested in the promotions; nothing attracts cluster 1 in a strong way.
Now that the clusters have been created and their purchasing patterns examined, let us look at who makes up each of these clusters. To determine who our star customers are and who requires further attention from the retail store’s marketing team, we will profile the clusters that have been formed.
Considering the cluster characterization, we will graph some of the elements that are indicative of the customer’s personal traits. We will draw conclusions based on the results.
Figure 8.21: Spend versus education distribution per cluster
Cluster 0 is centered on medium education but with a peak in high education. Cluster 2 is the lowest in terms of education.
Figure 8.22: Spend versus family size distribution per cluster
Cluster 1 represents small family sizes, and cluster 0 represents couples and families. Clusters 2 and 3 are evenly distributed.
Figure 8.23: Spend versus customer distribution per cluster
Cluster 3 is the group of longest-standing clients. It is also interesting to see that although cluster 0 has the highest spending, it is skewed to the left in terms of the number of days the user has been a customer.
sns.jointplot(x=data['Age'], y=data["Spent"], hue =data["Clusters"], kind="kde", palette=['red','blue','green','orange'],height=10)
Figure 8.24: Spend versus age distribution per cluster
Cluster 0 is the one with older customers, and the one with the youngest clients is cluster 2.
In this chapter, we have performed unsupervised clustering. After dimensionality reduction, agglomerative clustering was used. To better profile customers in clusters based on their family structures, income, and spending habits, we divided users into four clusters. This can be applied while creating more effective marketing plans.
In the next chapter, we will dive into the prediction of sales using time-series data to be able to determine revenue expectations given a set of historical sales, as well as understand their relationship with other variables.