Results explanation

Once we have completed the model evaluation stage and selected the estimated and evaluated model as our final model, our next task is to interpret the results for the university leaders and technicians.

In explaining the machine learning results, the university is particularly interested in two questions: first, how its designed interventions affect student attrition, and second, which of the common causes of attrition (finances, academic performance, social/emotional encouragement, and personal adjustment) has the biggest impact.

In the following sections, we explain the results with a focus on the most influential variables.

Calculating the impact of interventions

The following briefly summarizes some sample results, which can be produced with the feature importance functions available for random forest and decision tree models.

With Spark 1.5, you can use the following code to obtain a vector of feature importances from a fitted random forest model:

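import org.apache.spark.mllib.linalg.Vector // Spark 1.5; from Spark 2.0 the type is org.apache.spark.ml.linalg.Vector
val importances: Vector = model.featureImportances // model: a fitted RandomForestClassificationModel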

With the randomForest package in R, estimatedModel$importance returns a matrix of importance measures (such as MeanDecreaseGini) for each variable; sorting it ranks the variables by their importance in determining attrition.
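Neither output is sorted: Spark's vector is indexed by feature position, and R's matrix lists variables in their original order. As a minimal sketch, assuming the Spark importances have been copied into a plain sequence (for example, with `importances.toArray`) and that the feature names are listed in the same order as the columns of the feature vector, a ranking can be built in plain Scala. `ImportanceRanking` and `rank` are hypothetical names used only for illustration.

```scala
// Minimal sketch: pair each importance score with its feature name and sort
// so that the most influential feature comes first.
// Assumption: scores(i) is the importance of the feature named names(i),
// e.g. scores = importances.toArray from Spark's featureImportances vector.
object ImportanceRanking {
  // Returns (rank, featureName, importance) triples; rank 1 = most important.
  def rank(names: Seq[String], scores: Seq[Double]): Seq[(Int, String, Double)] = {
    require(names.length == scores.length, "one importance score per feature name")
    names.zip(scores)
      .sortWith((a, b) => a._2 > b._2) // descending importance; stable, so ties keep input order
      .zipWithIndex
      .map { case ((name, score), i) => (i + 1, name, score) } // convert 0-based index to 1-based rank
  }
}
```

The same function works for the R output once the importance column has been exported together with the row names.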

The impact assessment for the interventions is summarized in the following table:

Feature                Impact rank
Teacher interaction    1
Financial aid          2
Study grouping         3

Note, however, that obtaining variable importance through the randomForest functions requires a full model estimated with complete data for all variables, so this approach does not fully solve our problem.

What learning organizations really need is to estimate a model using only the partial set of features that is actually available, and then assess how good this partial model is, that is, how well it catches attrition cases and how high its false positive ratio is. Apache Spark's fast computing makes this practical, because many such partial models can be estimated and evaluated quickly.
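Comparing partial models requires the two figures named above for each model. As a minimal sketch, assuming labels and predictions are coded as 1.0 for attrition and 0.0 for retention (as in Spark's `label` and `prediction` columns), and reading the catching ratio as the share of actual attrition cases the model flags and the false positive ratio as the share of retained students it wrongly flags, both can be computed as follows. `PartialModelMetrics` and its members are hypothetical names.

```scala
// Minimal sketch for scoring a partial model.
// Assumption: 1.0 = student left (attrition), 0.0 = student stayed,
// for both the true labels and the model's predictions.
object PartialModelMetrics {
  final case class Rates(catchRate: Double, falsePositiveRate: Double)

  def rates(labels: Seq[Double], predictions: Seq[Double]): Rates = {
    require(labels.length == predictions.length, "one prediction per label")
    val pairs = labels.zip(predictions)
    val tp = pairs.count { case (y, p) => y == 1.0 && p == 1.0 } // attrition caught
    val fn = pairs.count { case (y, p) => y == 1.0 && p == 0.0 } // attrition missed
    val fp = pairs.count { case (y, p) => y == 0.0 && p == 1.0 } // false alarms
    val tn = pairs.count { case (y, p) => y == 0.0 && p == 0.0 } // correctly left alone
    // Guard against empty classes so a tiny test split cannot divide by zero.
    def ratio(num: Int, den: Int): Double = if (den == 0) 0.0 else num.toDouble / den
    Rates(catchRate = ratio(tp, tp + fn), falsePositiveRate = ratio(fp, fp + tn))
  }
}
```

In a Spark workflow, the label and prediction pairs would come from `model.transform(testData)` for a model trained on a subset of the features (selected, for example, with `org.apache.spark.ml.feature.VectorSlicer`). Repeating the calculation for each candidate subset shows how much catching power each missing group of features costs.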

Calculating the impact of main causes

As we briefly discussed in the Feature preparation section, the selected main predictors are summarized in the following table:

Category                    Number of factors   Factor names
Academic performance        4                   AF1, AF2, AF3, AF4
Financial status            2                   F1, F2
Emotional encouragement 1   2                   EE1_1, EE1_2
Emotional encouragement 2   2                   EE2_1, EE2_2
Personal adjustment         3                   PA1, PA2, PA3
Study patterns              3                   SP1, SP2, SP3
Total                       16

The university leaders are interested in learning how these factors affect attrition. To answer this, we can follow the approach described in the previous section, that is, apply the code used to obtain feature importance to the preceding features and rank them by importance.
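Because the leaders' question is about causes rather than individual factors, one simple option is to sum the importances of the factors in each category and sort the categories. This aggregation rule is an assumption made for illustration, not a method given in the text, and `CategoryImpact` is a hypothetical name; the categories and factor names come from the predictor table.

```scala
// Minimal sketch: roll per-factor importances up to cause categories.
// Assumption: summing within a category is an illustrative choice; the text
// itself only ranks individual features.
object CategoryImpact {
  // Categories and factor names from the predictor table, in table order.
  val categories: Seq[(String, Seq[String])] = Seq(
    "Academic performance"      -> Seq("AF1", "AF2", "AF3", "AF4"),
    "Financial status"          -> Seq("F1", "F2"),
    "Emotional encouragement 1" -> Seq("EE1_1", "EE1_2"),
    "Emotional encouragement 2" -> Seq("EE2_1", "EE2_2"),
    "Personal adjustment"       -> Seq("PA1", "PA2", "PA3"),
    "Study patterns"            -> Seq("SP1", "SP2", "SP3")
  )

  // Returns categories from largest to smallest total importance.
  // Factors missing from `importance` contribute 0.0.
  def byCategory(importance: Map[String, Double]): Seq[(String, Double)] =
    categories
      .map { case (cat, factors) => cat -> factors.map(f => importance.getOrElse(f, 0.0)).sum }
      .sortWith((a, b) => a._2 > b._2)
}
```

Summing favors categories with more factors, such as academic performance with four; averaging per factor is an alternative if that bias matters for the comparison.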

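For logistic regression results, we can also apply the equation $\Pr(Y_i = 1) = \frac{\exp(B X_i)}{1 + \exp(B X_i)}$ to obtain the impact of each feature at a given point: differentiating with respect to feature $k$ gives its marginal effect, $\frac{\partial \Pr(Y_i = 1)}{\partial X_{ik}} = B_k \Pr(Y_i = 1)\,\bigl(1 - \Pr(Y_i = 1)\bigr)$.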
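The probability formula Prob(Yi=1) = exp(BXi)/(1+exp(BXi)) and the marginal effect of feature k, B_k * p * (1 - p) with p = Prob(Yi=1), can be evaluated directly. A minimal sketch, assuming the coefficient vector includes the intercept as its first element and the feature vector starts with a constant 1.0 to match; `LogitImpact` and its methods are hypothetical names.

```scala
// Minimal sketch of the logistic probability and the marginal effect of one feature.
// Assumption: coefficients(0) is the intercept and x(0) == 1.0.
object LogitImpact {
  private def dot(b: Seq[Double], x: Seq[Double]): Double = {
    require(b.length == x.length, "one coefficient per feature")
    b.zip(x).map { case (bi, xi) => bi * xi }.sum
  }

  // Prob(Yi=1) = exp(z) / (1 + exp(z)), with z = B . Xi.
  // Written as 1 / (1 + exp(-z)), which is algebraically equal and avoids
  // Infinity / Infinity = NaN when z is large and positive.
  def probability(coefficients: Seq[Double], x: Seq[Double]): Double = {
    val z = dot(coefficients, x)
    1.0 / (1.0 + math.exp(-z))
  }

  // Marginal effect of feature k at point x: B_k * p * (1 - p).
  def marginalEffect(coefficients: Seq[Double], x: Seq[Double], k: Int): Double = {
    val p = probability(coefficients, x)
    coefficients(k) * p * (1.0 - p)
  }
}
```

Because p changes with Xi, so does the marginal effect; it therefore has to be reported at a stated point, such as an average student profile or a specific student of interest.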
