After passing the model evaluation stage and selecting the estimated and evaluated model as our final model, our next task is to interpret the results for the university's leaders and technical staff.
In explaining the machine learning results, the university is particularly interested in, first, understanding how its designed interventions affect student attrition and, second, which of the common reasons (finances, academic performance, social/emotional encouragement, and personal adjustment) has the biggest impact.
In the following sections, we will focus our explanation of the results on the most influential variables.
The following briefly summarizes some sample results, which we can produce using functions from randomForest and decision trees.
With Spark 1.5, you can use the following code to obtain a vector of feature importances:
val importances: Vector = model.featureImportances
With the randomForest package in R, a simple call to estimatedModel$importance will return a ranking of variables by their importance in determining attrition.
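For readers who prefer a runnable illustration, here is a minimal sketch in Python using scikit-learn's RandomForestClassifier (an analogue of the R randomForest package); the data and feature names are synthetic, chosen only to mirror the interventions discussed here:

```python
# Illustrative sketch: rank features by importance with a random forest.
# The data is synthetic; the target is driven mostly by the first feature,
# so that feature should come out on top of the ranking.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))                                # three toy predictors
y = (X[:, 0] + 0.3 * rng.normal(size=n) > 0).astype(int)   # mainly feature 0

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Hypothetical intervention names for the three columns
names = ["Teacher interaction", "Financial aid", "Study grouping"]
ranking = sorted(zip(names, model.feature_importances_),
                 key=lambda t: t[1], reverse=True)
for name, imp in ranking:
    print(f"{name}: {imp:.3f}")
```

The importances sum to 1, so each value can be read as a relative share of the model's predictive power.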
The impact assessment table for the interventions is as follows:
Feature | Impact
---|---
Teacher interaction | 1
Financial aid | 2
Study grouping | 3
… | …
Here, to obtain variable importance through the randomForest functions, we need a full model estimated with complete data. Therefore, this alone does not solve our problem.
What learning organizations really need is to estimate a model with a partial set of available features and then assess how good this partial model is, that is, how good its attrition-catching and false-positive ratios are. Apache Spark's advantage of fast computing is what makes completing this task practical.
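The partial-model assessment can be sketched as follows in Python with scikit-learn; the synthetic data, coefficients, and feature subsets are illustrative assumptions, not the university's actual data:

```python
# Sketch: fit a model on a partial feature set and compare its attrition
# catch rate (recall) and false-positive rate against the full model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
n = 2000
X_full = rng.normal(size=(n, 4))          # e.g. four hypothetical predictors
logit = 1.2 * X_full[:, 0] + 0.8 * X_full[:, 1] + 0.5 * X_full[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)   # 1 = attrition

X_tr, X_te, y_tr, y_te = train_test_split(X_full, y, random_state=0)

def rates(cols):
    """Fit on a column subset; return (attrition catch rate, false-positive rate)."""
    m = LogisticRegression().fit(X_tr[:, cols], y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, m.predict(X_te[:, cols])).ravel()
    return tp / (tp + fn), fp / (fp + tn)

full_catch, full_fpr = rates([0, 1, 2, 3])
part_catch, part_fpr = rates([0, 1])      # only two features available
print(full_catch, full_fpr, part_catch, part_fpr)
```

Comparing the two pairs of rates shows how much predictive power is lost when only a partial feature set is available, which is exactly the question the organization needs answered.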
As we briefly discussed in the Feature preparation section, the main predictors selected can be summarized with the following table:
Category | Number of factors | Factor names
---|---|---
Academic performance | 4 | AF1, AF2, AF3, AF4
Financial status | 2 | F1, F2
Emotional encouragement 1 | 2 | EE1_1, EE1_2
Emotional encouragement 2 | 2 | EE2_1, EE2_2
Personal adjustment | 3 | PA1, PA2, PA3
Study patterns | 3 | SP1, SP2, SP3
Total | 16 |
The university leaders are interested in learning how these features drive attrition, for which we can follow the approach described in the previous section; that is, we apply the feature importance code to the preceding features to rank their importance.
As for the logistic regression results, we can also apply the equation Prob(Yi = 1) = exp(BXi) / (1 + exp(BXi)) to obtain the impact of each feature at a given point.
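As a minimal numeric sketch, with made-up coefficients B and a made-up point Xi, the probability and a feature's marginal impact at that point can be computed as:

```python
# Sketch: evaluate Prob(Y=1) = exp(B.X) / (1 + exp(B.X)) at a point, then
# measure how the probability changes when one feature increases by one unit.
# The coefficient and point values below are illustrative, not estimated.
import numpy as np

def prob(beta, x):
    """Logistic probability Prob(Y=1 | x) = exp(beta.x) / (1 + exp(beta.x))."""
    z = float(np.dot(beta, x))
    return np.exp(z) / (1.0 + np.exp(z))

beta = np.array([-1.0, 0.8, 0.5])    # intercept plus two feature coefficients
x = np.array([1.0, 0.2, -0.5])       # point of interest (leading 1 = intercept)
p0 = prob(beta, x)

# Impact of a one-unit increase in the first feature at this point:
x_shift = x.copy()
x_shift[1] += 1.0
impact = prob(beta, x_shift) - p0
print(round(p0, 4), round(impact, 4))
```

Because the logistic curve is nonlinear, the same one-unit change produces a different probability impact at different points, which is why the impact must be evaluated "at a certain point" rather than read directly off the coefficient.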