Results explanation

Having passed the model-evaluation stage and selected the estimated and evaluated models as our final models, our next task is to interpret the results for the telco company and its clients.

In explaining the machine learning results, the telco company is particularly interested in understanding what drives Call Center call volume and what drives subscriber churn. Of course, they are also open to other special insights.

We will work on these tasks, with our focus on big influencing features and some special insights.

Descriptive statistics and visualizations

With R or SPSS on Spark, and with MLlib in place, one advantage is that analytical results can be obtained quickly. The following tables summarize the insights we obtained.

For subscriber churn, we have two tables that summarize the churn ratios by phone manufacturer and by market segment, of which there are six main categories. Producing customer segmentations was another task performed for the telco company, but it is outside the scope of this book, so we will not discuss it in detail. You may simply consider the segment as one of the features used. The following table shows the subscriber churn ratios per phone manufacturer:

MFTR    Churn Rate
A       0.09
H       0.11
L       0.11
M       0.12
N       0.08
R       0.10
S       0.10

The following table shows the subscriber churn ratios per market segment:

Segment    Churn Rate
DG1        0.09
DG2        0.05
HB         0.13
NS         0.10
NP         0.10
UN         0.10

For Call Center calls, we have two tables that summarize the average number of calls made by subscribers, by phone manufacturer and by market segment. The following table shows the average calls made by subscribers per phone manufacturer:

MFTR    Average Call Center Calls
A       1.26
H       1.11
L       0.89
M       1.03
N       0.88
R       1.30
S       1.00

The following table shows the average calls made by the subscribers per market segment:

Segment    Average Call Center Calls
DG1        1.13
DG2        2.86
HB         0.50
NS         2.31
NP         1.12
UN         1.52
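Per-group summaries like the ones in the preceding tables can be computed in a few lines of R. The following is only a minimal sketch on a plain in-memory data frame, not the project's Spark pipeline; the `subscribers` table, its column names, and its values are all made up for illustration.

```r
# Toy subscriber table; all column names and values are hypothetical.
subscribers <- data.frame(
  mftr     = c("A", "A", "H", "H", "N", "N", "N", "S"),
  segment  = c("DG1", "HB", "HB", "NS", "DG1", "DG2", "NP", "UN"),
  churned  = c(0, 1, 0, 1, 0, 0, 0, 1),   # 1 = subscriber left
  cc_calls = c(2, 1, 0, 3, 1, 4, 0, 2)    # Call Center calls made
)

# The mean of a 0/1 indicator is the churn rate of each group.
churn_by_mftr <- aggregate(churned ~ mftr, data = subscribers, FUN = mean)

# Average number of Call Center calls per market segment.
calls_by_segment <- aggregate(cc_calls ~ segment, data = subscribers, FUN = mean)

print(churn_by_mftr)
print(calls_by_segment)
```

The same two calls, with the grouping column swapped, would give churn per segment and average calls per manufacturer.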

Furthermore, our results also map out stores by churn rate or by Call Center calls, as in the following example:

[Figure: example map of store locations]

For this mapping task, we used R. The following is an example of the R code used to visualize the store distribution:

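library(maps)     # map()
library(mapdata)  # provides the "worldHires" database

# lon and lat are assumed to be numeric vectors holding the
# longitude and latitude of each store.
# Draw the USA as a light gray filled region, then add one red dot per store.
map("worldHires", "usa", xlim = c(-120, -70), ylim = c(25, 55),
    col = "gray95", fill = TRUE)
points(lon, lat, pch = 19, col = "red", cex = 1)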

Biggest influencers

To find the features with the largest impact on our two target features, subscriber churn and Call Center calls, we can easily use the randomForest algorithm once our Spark computing is up. As we saw in Chapter 8, Learning Analytics on Spark, the randomForest algorithm ranks all the features by their impact on the target variable and provides clear visualizations of that ranking.

However, for this project, Call Center calls is a target variable with continuous values, so the linear regression results give us the insights directly. That is, features with larger coefficients in the linear regression have a larger impact on the target feature, provided the features are on comparable scales (for example, after standardization). Another way of assessing predictors is to use the associated R squared, which we also used when we conducted feature selection. This task may therefore be performed together with the feature selection work described in the Data and feature development section.
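To illustrate why coefficient size is only comparable across features on a common scale, here is a minimal sketch using base R's `lm` on simulated data. The variable names (`qos`, `usage`, `cc_calls`) and the effect sizes are assumptions chosen for the example and are not taken from the project data.

```r
set.seed(42)
n <- 500
# Simulated data; names and effect sizes are invented for illustration.
qos      <- rnorm(n, mean = 0.9, sd = 0.05)   # quality-of-service score
usage    <- rnorm(n, mean = 300, sd = 80)     # monthly usage
cc_calls <- 5 - 4 * qos + 0.005 * usage + rnorm(n, sd = 0.3)

# Raw coefficients depend on the units of each feature.
fit_raw <- lm(cc_calls ~ qos + usage)

# Standardize the predictors so that coefficients become comparable.
d_std <- data.frame(
  cc_calls = cc_calls,
  qos      = as.numeric(scale(qos)),
  usage    = as.numeric(scale(usage))
)
fit_std <- lm(cc_calls ~ qos + usage, data = d_std)

# Rank features by the absolute size of their standardized coefficients.
ranked <- sort(abs(coef(fit_std)[-1]), decreasing = TRUE)
print(coef(fit_raw)[-1])
print(ranked)
print(summary(fit_std)$r.squared)
```

In this simulation the raw coefficient of `qos` is far larger in magnitude than that of `usage`, yet after standardization `usage` ranks first, because it varies over a much wider range.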

For the impact on subscriber churn, however, we used the randomForest results, which give the following list of the five largest predictors in order:

  • Call Center calls
  • Quality of services
  • Usage
  • Manufacturer
  • Customer segments

From the preceding results, it is easy to see, and not surprising, that Call Center calls have the biggest impact. This also tells the telco company where it needs to intervene to reduce subscriber churn.
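A ranking of this kind can be produced with the randomForest package in R. The sketch below is an assumption about how such a ranking might be obtained, not the code used in the project; it runs on simulated data in a local R session rather than on Spark, and the column names and effect sizes are invented.

```r
library(randomForest)

set.seed(1)
n <- 400
# Simulated subscribers; column names and effects are hypothetical.
cc_calls <- rpois(n, lambda = 1)
qos      <- runif(n, min = 0.7, max = 1.0)
usage    <- rnorm(n, mean = 300, sd = 80)
# Churn probability rises strongly with Call Center calls, mildly with poor QoS.
p_churn  <- plogis(-3 + 1.5 * cc_calls - 2 * (qos - 0.85))
churn    <- factor(rbinom(n, size = 1, prob = p_churn),
                   levels = c(0, 1), labels = c("stay", "churn"))

d <- data.frame(churn, cc_calls, qos, usage)

# importance = TRUE asks the forest to record permutation importance.
fit <- randomForest(churn ~ ., data = d, ntree = 200, importance = TRUE)

# Mean decrease in accuracy per feature, largest first.
imp    <- importance(fit, type = 1)
ranked <- imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]
print(ranked)

# Dot chart of the same importance measures.
varImpPlot(fit)
```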

For Call Center calls, we have the following list of the four largest predictors in order:

  • Quality of services
  • Usage
  • Manufacturer
  • Segments

Per the preceding results, the main drivers of Call Center calls are service quality and call usage. The interaction between these two still needs to be explored further.

Special insights

As we saw in the preceding section, quality of services (QoS) has a very big impact on both customer churn and Call Center calls.

Therefore, the client is very interested in learning more about the relationship between QoS and churn, which we visualized using R. We found that customer churn is highest at middle values of QoS.

This result may reflect a non-linear relationship between the two. In our opinion, exploring the relationship more deeply calls for more data on the social and economic characteristics of each location and on the competition there.
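One simple way to check for such a non-linear pattern is to bin QoS and compare the churn rate across bins. The following sketch uses base R on simulated data; the inverted-U shape is built into the simulation purely for illustration, and the variable names are hypothetical.

```r
set.seed(7)
n <- 3000
# Simulated data; churn probability peaks at middle QoS by construction.
qos     <- runif(n, min = 0, max = 1)
p_churn <- 0.05 + 0.15 * exp(-((qos - 0.5) / 0.15)^2)
churned <- rbinom(n, size = 1, prob = p_churn)

# Cut QoS into five equal-width bins and compute the churn rate in each.
qos_bin      <- cut(qos, breaks = seq(0, 1, by = 0.2), include.lowest = TRUE)
churn_by_bin <- tapply(churned, qos_bin, mean)
print(round(churn_by_bin, 3))

# A bar chart makes a peak in the middle bins easy to spot.
barplot(churn_by_bin, xlab = "QoS bin", ylab = "Churn rate",
        main = "Churn rate by QoS bin (simulated)")
```

A straight-line fit of churn on QoS would miss a pattern like this, which is one reason to inspect binned rates before modeling.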

Visualizing trends

With our Spark computing in place, and especially with R available, many visualizations can be produced. The following image is one example, in which data transfer success rates are plotted to show how service quality changed over the course of the year:

[Figure: data success rate in 2012]

The following image shows the SMS success rate in 2012:

[Figure: SMS success rate in 2012]

For this work, we used the following R code:

library(lubridate)

# day1, event_type, and event_success_mean are assumed to be vectors of
# equal length that are already in scope.
Rtime <- ymd(day1)  # parse the date strings into Date objects

# SMS success rate: draw the points, then connect them with a line.
is_sms <- event_type == "SMS"
plot(Rtime[is_sms], event_success_mean[is_sms], col = "red",
     main = "SMS Success Rate in 2012",
     xlab = "Sep to Dec 2012", ylab = "Average SMS Success Rate")
lines(Rtime[is_sms], event_success_mean[is_sms], col = "red")

# Data success rate, plotted the same way.
is_data <- event_type == "Data"
plot(Rtime[is_data], event_success_mean[is_data], col = "red",
     main = "Data Success Rate in 2012",
     xlab = "Sep to Dec 2012", ylab = "Average Data Success Rate")
lines(Rtime[is_data], event_success_mean[is_data], col = "red")
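The plotted vectors hold one average success rate per day and event type. As a minimal sketch of how such averages might be derived, assume a raw event log with one row per event and a 0/1 `success` flag. The `events` table and its `day` and `success` columns are hypothetical; only the names `event_type` and `event_success_mean` come from the preceding code.

```r
# Hypothetical raw event log: one row per event, success coded as 0/1.
events <- data.frame(
  day        = as.Date(c("2012-09-01", "2012-09-01", "2012-09-01",
                         "2012-09-02", "2012-09-02", "2012-09-02")),
  event_type = c("SMS", "SMS", "Data", "SMS", "Data", "Data"),
  success    = c(1, 0, 1, 1, 1, 0)
)

# The mean of the 0/1 flag per day and event type is the daily success rate.
daily <- aggregate(success ~ day + event_type, data = events, FUN = mean)
names(daily)[names(daily) == "success"] <- "event_success_mean"
print(daily)
```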