Results explanation

After passing the model evaluation stage and selecting the estimated and evaluated model as our final model, our next task is to explain the results to the company's executives and technicians.

Here, we will work on explaining the results, with a focus on the variables that have the largest influence.

Big influencers and their impacts

As we briefly discussed before, quality and freshness differ greatly from one dataset to another. Each dataset has its own weakness, as summarized in the following table:

Category          Weakness
Web Log           Incomplete
Account           Old
Computer device   Incomplete
User              Old
Business          Incomplete and old

Due to the preceding issues, at transaction time we often do not have enough data to score a transaction at all, or to score it with good accuracy, so the scoring can only be done later. Because of this, the company hopes to identify some special signals or insights that can be used to take action quickly and easily.

The following briefly summarizes some sample results that we produced with functions from the randomForest and decision tree packages.

With the randomForest package in R, the simple expression estimatedModel$importance returns the importance measures for each variable, from which we can rank the variables by their importance in detecting fraud.

Table for impact assessment:

Feature          Impact rank
Click speed      1
Account          2
ComputerDevice   3

However, obtaining variable importance through the randomForest functions requires a fully estimated model, which in turn requires complete data. So, it does not really solve our problem.

What the customers really need is to use the partial set of features that is available to estimate a model with limited variables, and then to assess how good this partial model is, that is, to report its fraud-catching ratio and its false positive ratio. To complete this task, we take advantage of Apache Spark's fast computing, which lets us obtain results for many partial models quickly.
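
The assessment of a partial model described above can be sketched in a few lines. The following is only a minimal illustration in plain Python, not the Spark code used in this project. The function name assess_partial_model and the toy labels are hypothetical, and it assumes that the fraud-catching ratio means TP / (TP + FN) and that the false positive ratio means FP / (FP + TN); the company may define these ratios differently.

```python
from dataclasses import dataclass


@dataclass
class PartialModelReport:
    fraud_catch_rate: float     # assumed definition: TP / (TP + FN)
    false_positive_rate: float  # assumed definition: FP / (FP + TN)


def assess_partial_model(actual, predicted):
    """Compare true fraud labels with a partial model's predictions.

    Both arguments are equal-length sequences of booleans (True = fraud).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # frauds caught
    fn = sum(1 for a, p in pairs if a and not p)      # frauds missed
    fp = sum(1 for a, p in pairs if not a and p)      # false alarms
    tn = sum(1 for a, p in pairs if not a and not p)  # correctly cleared

    # Guard against empty classes to avoid division by zero.
    catch_rate = tp / (tp + fn) if (tp + fn) else 0.0
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return PartialModelReport(catch_rate, fp_rate)


# Made-up labels for illustration only (not project data).
actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]

report = assess_partial_model(actual, predicted)
print(report)
```

Running the same function over the predictions of each candidate partial model gives one pair of ratios per feature subset, which makes the subsets easy to compare.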
