Summary

This chapter is an extension of what was described and discussed in the previous chapters (Chapter 3, A Holistic View on Spark, to Chapter 9, City Analytics on Spark). Here, we took an approach driven by data and analytical needs rather than by predefined projects. We also developed predictive models to score subscribers on customer churn, on Call Center calling probability, and even on purchasing propensity.

In this chapter, using a real-life project of learning from telco data, we went through a step-by-step process of utilizing big data to serve the telco company as well as its clients. We processed a large amount of data on Apache Spark and then built several models, including regression and decision tree models, to predict customer churn, Call Center calls, and purchasing. From these models, we developed rules for alerts as well as scores to help the telco company and its clients. At the same time, we completed some exploratory analytics by taking advantage of Apache Spark's fast computation.
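As a rough illustration of this modeling step, the following is a minimal PySpark sketch of fitting a logistic regression and a decision tree to predict churn. The file path, column names (usage_minutes, dropped_calls, churned, and so on), and parameters are hypothetical placeholders rather than the chapter's actual telco schema, which is worked on in MLlib and R notebooks on Databricks as well as SPSS.

```python
# A minimal sketch only: the path, column names, and parameters are
# hypothetical, not the telco schema used in the chapter.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression, DecisionTreeClassifier

spark = SparkSession.builder.appName("telco-churn-sketch").getOrCreate()

# Load preprocessed subscriber data (hypothetical path and columns).
df = spark.read.parquet("/data/telco/subscribers.parquet")

# Assemble candidate features into a single vector column.
assembler = VectorAssembler(
    inputCols=["usage_minutes", "dropped_calls", "tenure_months", "monthly_charge"],
    outputCol="features")
data = assembler.transform(df).select("features", "churned")

train, test = data.randomSplit([0.8, 0.2], seed=42)

# Two supervised approaches of the kind discussed in the chapter.
lr = LogisticRegression(labelCol="churned", featuresCol="features").fit(train)
dt = DecisionTreeClassifier(labelCol="churned", featuresCol="features").fit(train)

lr_pred = lr.transform(test)
dt_pred = dt.transform(test)
```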

Specifically, we first selected two supervised machine learning approaches after we prepared the Spark computing environment and loaded the preprocessed data. Second, we worked on data and feature preparation by merging a few datasets together and developing further features, and then selected a core set of features for model building. Third, we estimated model coefficients directly using MLlib and R notebooks on Databricks, as well as SPSS. Fourth, we evaluated these estimated models, mainly using RMSEs and error ratios. Then, we interpreted the machine learning results with a focus on special insights and the strongest predictors. Finally, we deployed the machine learning results by developing a few scores, and also used the insights to develop rules for sending alerts.
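The merging and evaluation steps (the second and fourth steps above) can be pictured with a short sketch like the following, which reuses the SparkSession from the previous sketch. The dataset names, the subscriber_id join key, and the call_count regression target are illustrative assumptions, not the chapter's actual datasets.

```python
# Sketch of the merge-and-evaluate steps; dataset names, the join key
# (subscriber_id), and the target column (call_count) are assumptions.
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
from pyspark.ml.evaluation import RegressionEvaluator

# Merge a few datasets on a common subscriber key, as in the feature-preparation step.
usage = spark.read.parquet("/data/telco/usage.parquet")
billing = spark.read.parquet("/data/telco/billing.parquet")
calls = spark.read.parquet("/data/telco/callcenter.parquet")
merged = usage.join(billing, "subscriber_id").join(calls, "subscriber_id")

# Build a regression model for Call Center call volume and evaluate it with RMSE.
assembler = VectorAssembler(
    inputCols=["usage_minutes", "monthly_charge", "tenure_months"],
    outputCol="features")
data = assembler.transform(merged).select("features", "call_count")
train, test = data.randomSplit([0.8, 0.2], seed=42)

model = LinearRegression(labelCol="call_count", featuresCol="features").fit(train)
rmse = RegressionEvaluator(labelCol="call_count",
                           metricName="rmse").evaluate(model.transform(test))
print(f"RMSE on held-out data: {rmse:.3f}")
```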

This process is similar to the ones used in previous chapters. Here, however, we took a more dynamic approach: we used descriptive statistics and visualization for exploratory data work, moved between SPSS, R, and MLlib as needed, and jumped between the 4Es whenever the work called for it.
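For the exploratory side of this dynamic approach, a couple of lines of descriptive statistics on the merged DataFrame are often enough to guide the next modeling step; the column names below (plan_type, call_count, monthly_charge) are again hypothetical.

```python
# Quick exploratory statistics on the merged data; column names are hypothetical.
merged.describe("usage_minutes", "monthly_charge").show()

# Segment-level averages, suitable for plotting in an R or Databricks notebook.
(merged.groupBy("plan_type")
       .agg({"call_count": "avg", "monthly_charge": "avg"})
       .show())
```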

After reading this chapter, you will have gained a better understanding of how Apache Spark can be used with MLlib, R, and SPSS to perform machine learning productively.

Specifically, after reading this chapter, you will have reached a new level of utilizing machine learning in a dynamic way to solve problems. That is, you are not limited to progressing linearly, step by step, to complete a project; you can go back and forth to achieve optimal results, and can jump between MLlib, SPSS, R, and other tools to arrive at the best analytical solutions.
