After completing the model evaluation stage and selecting the estimated and evaluated model as our final model, our next task is to interpret the results for the company's executives and technicians.
Here, we focus on explaining the results, with particular attention to the most influential variables.
As briefly discussed earlier, data quality and freshness vary considerably across datasets. Each dataset has its own weakness, as summarized in the following table:
| Category | Weakness |
|---|---|
| Web log | Incomplete |
| Account | Old |
| Computer device | Incomplete |
| User | Old |
| Business | Incomplete and old |
Because of the preceding issues, we often do not have enough data to score each transaction, cannot score it with good accuracy, or can only score it after the fact. For this reason, the company hopes to identify special signals or insights that can be acted on quickly and easily.
The following briefly summarizes some sample results produced with functions from the `randomForest` package and decision trees. With the `randomForest` package in R, a simple call such as `estimatedModel$importance` returns a ranking of variables by their importance in determining fraud.
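As an illustration of the same idea outside R, here is a minimal Python sketch using scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute. The feature names mirror those in the table below, but the data is entirely synthetic and hypothetical, constructed so that click speed dominates the ranking; it is not the book's actual dataset or code.

```python
# Hypothetical sketch: ranking features by random forest importance.
# Data is synthetic; feature names only mirror the table in the text.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-ins for the three features (hypothetical values)
click_speed = rng.normal(size=n)
account_age = rng.normal(size=n)
device_score = rng.normal(size=n)

# Make the fraud label depend mostly on click speed,
# so the importance ranking has a clear winner
fraud = (click_speed + 0.3 * account_age
         + 0.1 * rng.normal(size=n) > 1).astype(int)

X = np.column_stack([click_speed, account_age, device_score])
names = ["ClickSpeed", "Account", "ComputerDevice"]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, fraud)

# Sort features from most to least important
ranking = sorted(zip(names, model.feature_importances_),
                 key=lambda t: -t[1])
for name, imp in ranking:
    print(f"{name}: {imp:.3f}")
```

In scikit-learn the importances are normalized to sum to one, whereas `estimatedModel$importance` in R reports raw impurity-decrease (and optionally permutation) measures; the resulting ranking plays the same role in both cases.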
Table for impact assessment:

| Feature | Impact rank |
|---|---|
| Click speed | 1 |
| Account | 2 |
| ComputerDevice | 3 |
However, obtaining variable importance through the `randomForest` functions requires a fully estimated model built on complete data. So, it does not really solve our problem.
What the customer actually needs is to estimate a model from a partial set of available features and then assess how good this partial model is, that is, to report its fraud catching ratio and false positive ratio. To complete this task, we take advantage of Apache Spark's fast computation, which helps produce these results quickly.
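The evaluation logic for such a partial model can be sketched as follows. This is a hypothetical, self-contained Python example with synthetic data (the production work described above runs on Apache Spark): it trains a random forest on only the feature columns assumed to be available at scoring time, then computes the fraud catching ratio (recall) and the false positive ratio on held-out data.

```python
# Hypothetical sketch: evaluate a model trained on a PARTIAL feature set,
# reporting fraud catch rate (recall) and false positive rate.
# All data here is synthetic; column indices are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
X_full = rng.normal(size=(n, 5))          # 5 features in the full dataset
y = (X_full[:, 0] + 0.5 * X_full[:, 1] > 1).astype(int)  # synthetic fraud label

available = [0, 1]            # indices of features available early (assumption)
X_part = X_full[:, available] # the partial feature set actually used

X_tr, X_te, y_tr, y_te = train_test_split(
    X_part, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)

# Confusion-matrix counts on the held-out set
tp = np.sum((pred == 1) & (y_te == 1))
fn = np.sum((pred == 0) & (y_te == 1))
fp = np.sum((pred == 1) & (y_te == 0))
tn = np.sum((pred == 0) & (y_te == 0))

catch_rate = tp / (tp + fn)           # fraud catching ratio (recall)
false_positive_rate = fp / (fp + tn)  # false positive ratio

print(f"catch rate: {catch_rate:.2f}, "
      f"false positive rate: {false_positive_rate:.2f}")
```

Comparing these two ratios across different partial feature sets is what tells the business whether a quick, early score is good enough to act on.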