Summary

In this chapter, we walked step by step from big data to the rapid development of a fraud detection system: we processed data on Spark and then built several models to predict fraud. With these models, we developed rules and scores to help ABC Corporation prevent fraud.

Specifically, after preparing the Spark computing environment and loading the preprocessed data, we first selected a supervised machine learning approach, focusing on random forests and decision trees as the business needs required. Second, we worked on feature extraction and selection. Third, we estimated the model coefficients. Fourth, we evaluated the estimated models using confusion matrices and false positive ratios. Fifth, we interpreted the machine learning results. Finally, we deployed the results, focusing on scoring while also using the insights to develop rules.
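As a reminder of the evaluation step, the following is a minimal Python sketch, not the chapter's own code. It assumes binary labels where 1 means fraud, and it assumes the standard definition of the false positive rate, FP / (FP + TN). The function names and the sample labels are hypothetical and used only for illustration.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count confusion-matrix cells for binary labels (1 = fraud, 0 = legitimate)."""
    counts = Counter({"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1  # fraud correctly flagged
        elif a == 0 and p == 1:
            counts["fp"] += 1  # legitimate case wrongly flagged
        elif a == 0 and p == 0:
            counts["tn"] += 1  # legitimate case correctly passed
        else:
            counts["fn"] += 1  # fraud that was missed
    return counts


def false_positive_rate(counts):
    """Share of legitimate cases that were flagged as fraud: FP / (FP + TN)."""
    negatives = counts["fp"] + counts["tn"]
    return counts["fp"] / negatives if negatives else 0.0


# Hypothetical labels, for illustration only.
actual = [1, 0, 0, 1, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 0, 1, 1]

counts = confusion_counts(actual, predicted)
fpr = false_positive_rate(counts)
print(dict(counts), fpr)
```

In a fraud setting, a high false positive rate means many legitimate cases get flagged, which is why it is worth tracking alongside the full confusion matrix.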

The preceding process resembles the process of working with small data. With big data, however, we need parallel computing, which Apache Spark provides. Throughout the process described above, Apache Spark also made the work easy and fast, so we were able to solve a few difficult problems, such as incomplete data. In other words, we took advantage of Apache Spark's fast computing to meet ABC Corporation's special analytical needs.

After this chapter, you should have a full understanding of how Apache Spark can make supervised machine learning and the development of fraud detection systems easier and faster. You should also understand how fast computing translates into analytical capabilities.
