Deploying fraud detection

As discussed before, MLlib supports exporting models to the Predictive Model Markup Language (PMML). The R notebook can also run in other environments, and with the pmml R package, R models can be exported as well. In addition, it is possible to deploy models for decision making directly on Apache Spark and make the results easily available to users. Therefore, for this project, we export some of the developed models to PMML.
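
As a brief, hedged illustration, the following sketch shows how a fitted rpart model might be exported to PMML with the pmml and XML R packages; the model object, formula, and file name are hypothetical:

    library(rpart)
    library(pmml)    # converts supported R models to PMML
    library(XML)     # saveXML() writes the generated document to disk

    # model1 is a hypothetical rpart classification model trained earlier
    model1 <- rpart(fraud ~ ., data = training_data, method = "class")

    # Convert the fitted tree to a PMML document and save it to a file
    saveXML(pmml(model1), file = "fraud_tree.pmml")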

However, in practice, the users of this project are more interested in rule-based decision making, which applies some of our insights directly, and in score-based decision making to prevent fraud.

Here, we will discuss each of them only briefly, as a full deployment for decision making requires optimization work that is beyond the scope of this chapter.

Turning estimated models into rules and scores is not very challenging and could be done on non-Spark platforms as well. However, Apache Spark makes the process easy and fast: it allows us to quickly produce new rules and scores whenever the data or the customer's requirements change.

Rules

As discussed before, for R results there are several tools that help extract rules from developed predictive models.

For the decision tree model we developed, we can use the rpart.utils R package, which extracts rules and exports them in various formats, for example through RODBC.

The rpart.rules.table(model1) function returns an unpivoted table of the variable values (factor levels) associated with each branch.
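
As a minimal sketch, assuming a hypothetical training data frame and illustrative column names, rule extraction could look like this:

    library(rpart)
    library(rpart.utils)

    # Hypothetical training data: 'fraud' is the label, the predictors are illustrative
    model1 <- rpart(fraud ~ click_speed + account_age + account_type,
                    data = training_data, method = "class")

    # One row per branch/variable value combination
    rules <- rpart.rules.table(model1)
    head(rules)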

However, for this project, partly due to data incompleteness, we need to derive rules directly from the insights discussed in the last section. For example, we can do the following (see the sketch after this list):

  • If the online click speed is dramatically different from the past, contact the user by phone
  • If the bank account is not a real bank account, is only a debit card, or is very new, some action is needed
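
The sketch below shows one way such insight-based rules could be encoded as simple predicates over a transaction record; the field names and thresholds are illustrative assumptions, not values taken from the project:

    # Hypothetical rule predicates over a single transaction record
    # (a one-row data.frame or list); all fields and thresholds are assumed
    rule_click_speed <- function(tx) {
      # Flag if the current click speed deviates strongly from the user's history
      abs(tx$click_speed - tx$avg_click_speed_past) > 3 * tx$sd_click_speed_past
    }

    rule_bank_account <- function(tx) {
      # Flag unverified accounts, debit-card-only accounts, and very new accounts
      !tx$is_verified_bank_account | tx$is_debit_card_only | tx$account_age_days < 30
    }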

From an analytical perspective, we face the same issue here: minimizing false positives while catching enough fraud.

The company had a high false positive ratio with its past rules; as a result, too many alerts were sent out, which became a burden for manual inspection and caused many customer complaints.

Therefore, taking advantage of Spark's fast computing, we carefully produced rules and supplied a false positive ratio for each rule, which helped the company decide how to apply them.
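
As an illustration of how a per-rule false positive ratio might be computed on Spark from R, the following sketch uses sparklyr and dplyr on a hypothetical labeled transactions table; the connection, table, and column names are assumptions:

    library(sparklyr)
    library(dplyr)

    sc <- spark_connect(master = "local")
    tx <- copy_to(sc, transactions, "transactions")   # hypothetical labeled data

    # False positive ratio of one rule: flagged transactions that are not fraud
    rule_stats <- tx %>%
      mutate(flagged = account_age_days < 30) %>%
      summarise(
        flagged_n   = sum(as.integer(flagged), na.rm = TRUE),
        false_pos_n = sum(as.integer(flagged & is_fraud == 0), na.rm = TRUE)
      ) %>%
      mutate(false_positive_ratio = false_pos_n / flagged_n)

    collect(rule_stats)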

Scoring

From the coefficients of our predictive models, we can derive a suspicious score for fraud, but that takes some work.

In R, model$predicted returns the predicted class for each case, such as FRAUD or NOT. However, prob = predict(model, x, type="prob") produces a probability value, which can be used directly as a score.
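
For instance, assuming a fitted classification model, a new data frame x with the same predictors, and a class label named FRAUD, the probability could be rescaled into a 0 to 100 suspicious score roughly as follows:

    # Probability of the FRAUD class for each new case (class name is an assumption)
    prob <- predict(model, x, type = "prob")[, "FRAUD"]

    # Rescale the probability to a 0-100 suspicious score
    score <- round(100 * prob)
    head(score)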

However, in order to use the score, we need to select a cut-off score. For example, we can decide to take action when the suspicious score is over 80.

Different cut-off points will produce different false positive ratios as well as different rates of caught fraud; users need to decide how to balance the two.
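
Assuming vectors score and is_fraud for a labeled validation set, the trade-off at a few candidate cut-offs could be tabulated along these lines:

    # Hypothetical labeled validation data: score (0-100) and is_fraud (0/1)
    cutoffs <- c(60, 70, 80, 90)

    tradeoff <- data.frame(
      cutoff = cutoffs,
      false_positive_ratio = sapply(cutoffs, function(ct)
        sum(score >= ct & is_fraud == 0) / max(sum(score >= ct), 1)),
      fraud_catch_rate = sapply(cutoffs, function(ct)
        sum(score >= ct & is_fraud == 1) / sum(is_fraud == 1))
    )
    tradeoff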

By taking advantage of Spark's fast computing, these results can be calculated quickly, which allows the company to select cut-off points instantly and change them whenever needed.

Another way to deal with this issue is to use the OptimalCutpoints R package.
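
A sketch of its typical usage follows; the data frame, column names, and the choice of the Youden criterion are illustrative assumptions:

    library(OptimalCutpoints)

    # Hypothetical data frame with a numeric score and a 0/1 fraud label
    scored <- data.frame(score = score, is_fraud = is_fraud)

    # Search for the cut-off that maximizes the Youden index
    oc <- optimal.cutpoints(X = "score", status = "is_fraud",
                            tag.healthy = 0, methods = "Youden",
                            data = scored)
    summary(oc)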
