After passing the model evaluation stage and selecting the estimated and evaluated model as our final model, our next task is to interpret the results for the university's leaders and technical staff.
In explaining the machine learning results, the university is particularly interested in, first, understanding how its designed interventions affect student attrition and, second, which of the common reasons (finances, academic performance, social/emotional encouragement, and personal adjustment) has the biggest impact.
In the following sections, we will focus our explanation of the results on the most influential variables.
The following briefly summarizes some sample results, which we can produce using functions from randomForest and decision trees.
With Spark 1.5, you can use the following code to obtain a vector of feature importances:
val importances: Vector = model.featureImportances
With the randomForest package in R, a simple call to estimatedModel$importance will return a ranking of variables by their importance in determining attrition.
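For readers who prefer a runnable illustration, here is a minimal sketch in Python using scikit-learn's RandomForestClassifier (an analogue of the R randomForest package); the data and feature names are synthetic, chosen only to mirror the interventions discussed here:

```python
# Illustrative sketch: rank features by importance with a random forest.
# The data is synthetic; the target is driven mostly by the first feature,
# so that feature should come out on top of the ranking.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))                                # three toy predictors
y = (X[:, 0] + 0.3 * rng.normal(size=n) > 0).astype(int)   # mainly feature 0

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Hypothetical intervention names for the three columns
names = ["Teacher interaction", "Financial aid", "Study grouping"]
ranking = sorted(zip(names, model.feature_importances_),
                 key=lambda t: t[1], reverse=True)
for name, imp in ranking:
    print(f"{name}: {imp:.3f}")
```

The importances sum to 1, so each value can be read as a relative share of the model's predictive power.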
The impact assessment table for the interventions is as follows:
Feature | Impact
---|---
Teacher interaction | 1
Financial aid | 2
Study grouping | 3
… | …
Here, to obtain variable importance through the randomForest functions, we need a full model estimated with complete data. Therefore, this alone does not solve our problem.
What learning organizations really need is to estimate a model with a partial set of available features and then assess how good this partial model is, that is, how good its attrition-catching and false-positive ratios are. Apache Spark's advantage of fast computing is what makes completing this task practical.
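The partial-model assessment can be sketched as follows in Python with scikit-learn; the synthetic data, coefficients, and feature subsets are illustrative assumptions, not the university's actual data:

```python
# Sketch: fit a model on a partial feature set and compare its attrition
# catch rate (recall) and false-positive rate against the full model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
n = 2000
X_full = rng.normal(size=(n, 4))          # e.g. four hypothetical predictors
logit = 1.2 * X_full[:, 0] + 0.8 * X_full[:, 1] + 0.5 * X_full[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)   # 1 = attrition

X_tr, X_te, y_tr, y_te = train_test_split(X_full, y, random_state=0)

def rates(cols):
    """Fit on a column subset; return (attrition catch rate, false-positive rate)."""
    m = LogisticRegression().fit(X_tr[:, cols], y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, m.predict(X_te[:, cols])).ravel()
    return tp / (tp + fn), fp / (fp + tn)

full_catch, full_fpr = rates([0, 1, 2, 3])
part_catch, part_fpr = rates([0, 1])      # only two features available
print(full_catch, full_fpr, part_catch, part_fpr)
```

Comparing the two pairs of rates shows how much predictive power is lost when only a partial feature set is available, which is exactly the question the organization needs answered.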
As we briefly discussed in the Feature preparation section, the main predictors selected can be summarized with the following table:
Category | Number of factors | Factor names
---|---|---
Academic performance | 4 | AF1, AF2, AF3, AF4
Financial status | 2 | F1, F2
Emotional encouragement 1 | 2 | EE1_1, EE1_2
Emotional encouragement 2 | 2 | EE2_1, EE2_2
Personal adjustment | 3 | PA1, PA2, PA3
Study patterns | 3 | SP1, SP2, SP3
Total | 16 |
The university leaders are interested in learning how these features drive attrition, for which we can follow the approach described in the previous section; that is, we apply the feature importance code to the preceding features to rank their importance.
As for the logistic regression results, we can also apply the equation Prob(Yi = 1) = exp(BXi) / (1 + exp(BXi)) to obtain the impact of each feature at a given point.
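As a minimal numeric sketch, with made-up coefficients B and a made-up point Xi, the probability and a feature's marginal impact at that point can be computed as:

```python
# Sketch: evaluate Prob(Y=1) = exp(B.X) / (1 + exp(B.X)) at a point, then
# measure how the probability changes when one feature increases by one unit.
# The coefficient and point values below are illustrative, not estimated.
import numpy as np

def prob(beta, x):
    """Logistic probability Prob(Y=1 | x) = exp(beta.x) / (1 + exp(beta.x))."""
    z = float(np.dot(beta, x))
    return np.exp(z) / (1.0 + np.exp(z))

beta = np.array([-1.0, 0.8, 0.5])    # intercept plus two feature coefficients
x = np.array([1.0, 0.2, -0.5])       # point of interest (leading 1 = intercept)
p0 = prob(beta, x)

# Impact of a one-unit increase in the first feature at this point:
x_shift = x.copy()
x_shift[1] += 1.0
impact = prob(beta, x_shift) - p0
print(round(p0, 4), round(impact, 4))
```

Because the logistic curve is nonlinear, the same one-unit change produces a different probability impact at different points, which is why the impact must be evaluated "at a certain point" rather than read directly off the coefficient.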