Deployment

As discussed before, MLlib supports exporting models to the Predictive Model Markup Language (PMML). For this project, we export some of the developed models to PMML because other departments of the university are interested in our analytical results and use other systems, such as SPSS.

For practical purposes, however, the users of this project are more interested in rule-based decision making, which puts some of our insights to use, and in score-based decision making to reduce student attrition.

Specifically, the client wants to apply our results in two ways: first, to decide which interventions, such as a combination of course adjustments and counseling services, to use with a particular student segment; and second, to decide when the university needs to start interventions, based on the student attrition score.

Therefore, we need to turn some of our results into rules and also produce a student attrition risk score for this university.

Rules

The tree-based algorithms in both MLlib and R produce trees directly, so users can derive rules from these trees.

Also, as discussed before, for R results, there are several tools to help extract rules from developed predictive models.

For the decision tree model we developed, we can use the rpart.utils R package, which can extract rules and export them, for example, to a database over ODBC through the RODBC package.

Calling rpart.rules.table(model1) returns an unpivoted table of the branch paths (sub rules) associated with each node, and rpart.subrules.table(model1) returns an unpivoted table of the variable values (factor levels) associated with each branch.

However, for this project, partly because the data is incomplete, it is better to derive rules directly from the insights discussed in the last section. For example, we can do the following:

  • If academic performance is decreasing dramatically, we can contact the teacher
  • If the student's social network score is below a certain level and academic performance is also changing dramatically (even if the scores are already low), some action is needed
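
The two rules above could be encoded as simple predicates. The following Scala sketch shows one possible shape. The record fields (gpaChange, socialScore) and both thresholds are hypothetical placeholders chosen for illustration, not values from the project.

```scala
// Hypothetical student record; the field names are illustrative assumptions.
final case class Student(id: String, gpaChange: Double, socialScore: Double)

object AttritionRules {
  // Assumed thresholds, chosen only for illustration.
  val SharpDrop = -1.0      // change in grade average between two terms
  val LowSocialScore = 0.3  // lower bound for the social network score

  // Rule 1: academic performance is decreasing dramatically.
  def contactTeacher(s: Student): Boolean = s.gpaChange <= SharpDrop

  // Rule 2: low social network score and a dramatic change in performance
  // (in either direction).
  def needsAction(s: Student): Boolean =
    s.socialScore < LowSocialScore && math.abs(s.gpaChange) >= math.abs(SharpDrop)
}
```

Keeping each rule as a separate function makes it straightforward to evaluate and report on rules one at a time.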

From an analytical perspective, one of the main issues here is to minimize false positives while still catching enough attrition cases.

The university had a high false positive ratio with its past rules. As a result, too many alerts were sent out, which added a heavy burden of manual inspection. Therefore, taking advantage of Spark's fast computing, we produced the rules carefully and supplied a false positive ratio for each one, which helped the university use the rules and give us useful feedback.
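
As an illustration of what a per-rule report involves, the following Scala sketch computes two numbers for one rule from historical outcomes. It assumes that "false positive ratio" means the share of students who stayed but were still flagged, FP / (FP + TN); the source does not give the exact definition, and the names used here are hypothetical.

```scala
object RuleEvaluation {
  // Each pair is (flaggedByRule, actuallyLeft) for one student.
  // Returns (falsePositiveRatio, captureRatio).
  def evaluate(outcomes: Seq[(Boolean, Boolean)]): (Double, Double) = {
    val fp = outcomes.count { case (flagged, left) => flagged && !left }
    val tn = outcomes.count { case (flagged, left) => !flagged && !left }
    val tp = outcomes.count { case (flagged, left) => flagged && left }
    val fn = outcomes.count { case (flagged, left) => !flagged && left }
    // Guard against an empty class to avoid dividing by zero.
    val fpRatio = if (fp + tn == 0) 0.0 else fp.toDouble / (fp + tn)
    val capture = if (tp + fn == 0) 0.0 else tp.toDouble / (tp + fn)
    (fpRatio, capture)
  }
}
```

A rule with a low false positive ratio and a reasonable capture ratio generates fewer alerts that need manual inspection.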

Scoring

From the coefficients of our predictive models, we can derive a probability score for attrition, but this takes some work.
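
To show what that work amounts to, the following Scala sketch assumes the model is a binary logistic regression, in which case the probability is the logistic function applied to the intercept plus the weighted sum of the features. The object and parameter names are hypothetical.

```scala
object LogisticScore {
  // p = 1 / (1 + exp(-(intercept + sum_i weights(i) * features(i))))
  def probability(weights: Array[Double],
                  intercept: Double,
                  features: Array[Double]): Double = {
    require(weights.length == features.length,
      "weights and features must have the same length")
    val margin = intercept + weights.zip(features).map { case (w, x) => w * x }.sum
    1.0 / (1.0 + math.exp(-margin))
  }
}
```

For other model families, the mapping from coefficients to a probability is different, which is one reason to let the library compute the score.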

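Using the following MLlib code, we can score the test set quickly:

import org.apache.spark.mllib.regression.LabeledPoint

// Compute a prediction for each point in the test set.
// With the model's default threshold, predict returns a class label (0.0 or 1.0).
val predictionAndLabels = test.map { case LabeledPoint(label, features) =>
  val prediction = model.predict(features)
  (prediction, label)
}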

The preceding code returns labels. For binary classification, however, you can call the LogisticRegressionModel.clearThreshold method, after which predict returns raw scores.

Unlike the labels mentioned before, these scores are in the [0, 1] range and can be interpreted as probabilities.
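
The following is a minimal, self-contained sketch of this step, assuming a local Spark installation with the RDD-based MLlib API. The toy training data is invented purely for illustration and has nothing to do with the project's data; the object name is hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

object RawScoreExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RawScoreExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Invented toy data: label 1.0 = attrition, with a single feature.
      val training = sc.parallelize(Seq(
        LabeledPoint(1.0, Vectors.dense(0.9)),
        LabeledPoint(1.0, Vectors.dense(0.8)),
        LabeledPoint(0.0, Vectors.dense(0.2)),
        LabeledPoint(0.0, Vectors.dense(0.1))
      ))
      val model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(training)

      // Remove the default threshold so that predict returns a score
      // instead of a 0.0/1.0 label.
      model.clearThreshold()

      val scoreAndLabels = training.map { case LabeledPoint(label, features) =>
        (model.predict(features), label)
      }
      scoreAndLabels.collect().foreach { case (score, label) =>
        println(f"score=$score%.3f label=$label%.0f")
      }
    } finally {
      sc.stop()
    }
  }
}
```

In a real run, the scores would be computed on a held-out test set rather than on the training data used here for brevity.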

In R, model$predicted returns the predicted class for each case, ATTRITION or NOT. In contrast, prob = predict(model, x, type = "prob") produces a probability value, which can be used directly as a score.

To use the score, however, we need to select a cutoff score. For example, we can choose to take action when the attrition probability score is over 0.8, that is, 80 percent.

Different cutoff points produce different false positive ratios and different rates of catching likely attrition, so the users need to decide how to balance the two.
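
The trade-off can be made concrete by sweeping over candidate cutoffs. The following Scala sketch assumes that the false positive ratio is FP / (FP + TN) and that the catch rate is TP / (TP + FN); these definitions and all names are assumptions made for illustration.

```scala
object CutoffSweep {
  // scored: (attritionProbability, actuallyLeft) for each student.
  // Returns one (cutoff, falsePositiveRatio, captureRatio) triple per cutoff.
  def sweep(scored: Seq[(Double, Boolean)],
            cutoffs: Seq[Double]): Seq[(Double, Double, Double)] = {
    val positives = scored.count(_._2)
    val negatives = scored.size - positives
    cutoffs.map { c =>
      // A student is flagged when the score is strictly above the cutoff.
      val fp = scored.count { case (p, left) => p > c && !left }
      val tp = scored.count { case (p, left) => p > c && left }
      val fpRatio = if (negatives == 0) 0.0 else fp.toDouble / negatives
      val capture = if (positives == 0) 0.0 else tp.toDouble / positives
      (c, fpRatio, capture)
    }
  }
}
```

Raising the cutoff generally lowers the false positive ratio at the cost of catching fewer students who go on to leave.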

Because Spark computes these results quickly, the university can select a cutoff point almost instantly and change it whenever needed.

Another way to deal with this issue is to use the OptimalCutpoints R package.
