Results explanation

Under the 4Es framework used in this book, once model evaluation is complete and we have selected our final models from those we estimated and evaluated, the next task for this project is to interpret the results for our clients.

The users of this project are particularly interested in understanding what influences the well-known, widely used rankings. They are also interested in how the new rankings differ from existing ones and how the new rankings can be used.

We will work on these requests but will not cover all of them, as the purpose here is mainly to demonstrate the technologies. Because of confidentiality and space limitations, we will also not go into much detail, and will focus instead on how our technologies support better explanations.

Overall, the interpretation here is straightforward and includes the following three tasks:

  • Present a list of top-ranked schools and school districts
  • Compare various lists
  • Explain the impact of factors such as parent involvement and economy on the rankings

One of the main achievements of this project is a better, more accurate ranking, obtained with our ensemble methods and good analytics. However, it is very challenging to explain this to the users, and doing so is beyond the scope of this book.

Another big improvement is the ability to quickly produce rankings for various requirements, such as ranking by academic performance, future employment, or graduation rate. Users find this interesting, but it still seems to need time for adoption. They do, however, understand the benefits of producing rankings quickly, as made possible by Apache Spark.

As a result, we delivered a few lists and reported on the ranking comparison and on the factors influencing rankings.

Comparing ranks

R has some packages that help us analyze and compare rankings, such as pmr and Rmallow. For this project, however, the users preferred a simple comparison, such as a direct comparison of the top 10 schools and the top 10 school districts, which made our explanation a little easier.
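A direct top-10 comparison of this kind can be sketched in a few lines. The following is only an illustration in Python, not the code used in the project: the function name `top_k_overlap`, the school names, and the choice of Jaccard similarity as a summary number are all assumptions made for the example.

```python
def top_k_overlap(ranking_a, ranking_b, k=10):
    """Compare the top k entries of two rankings.

    Returns the names that appear in both top-k lists (in ranking_a's order)
    and the Jaccard similarity of the two top-k sets.
    Assumes each ranking lists every name at most once.
    """
    top_a, top_b = ranking_a[:k], ranking_b[:k]
    set_b = set(top_b)
    shared = [name for name in top_a if name in set_b]
    union = set(top_a) | set_b
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


# Hypothetical rankings; the school names are placeholders, not real results.
ours = ["School A", "School B", "School C", "School D", "School E"]
other = ["School B", "School A", "School F", "School C", "School G"]

shared, jaccard = top_k_overlap(ours, other, k=5)
print(shared)             # schools present in both top-5 lists
print(round(jaccard, 2))  # share of the combined top-5 names that both lists agree on
```

The overlap list is what users usually want to see side by side. The single similarity number is only a convenience for comparing several lists at once.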

Another part of the explanatory work is to compare our list to others, such as the one at http://www.school-ratings.com/schoolRatings.php?zipOrCity=91030, the one provided by the LA Times at http://schools.latimes.com/, and the one by SchoolIE at http://www.schoolie.com/, which claims to use big data to evaluate schools from many perspectives rather than from a single angle.

As a result, we found our list to be closer to the one created by SchoolIE than to the others.

R has some algorithms for computing the similarity or distance between rankings. We explored them but did not use them for the clients, because the simple comparison that our clients preferred is still very effective.
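To give a sense of what a distance between rankings looks like, the following Python sketch computes a normalized Kendall tau distance, one common choice for this purpose. It is an illustration only: the text does not say which measures were explored, and the function name and sample lists are hypothetical.

```python
from itertools import combinations


def kendall_tau_distance(ranking_a, ranking_b):
    """Fraction of item pairs that the two rankings place in opposite order.

    0.0 means the rankings are identical, 1.0 means one is the reverse of
    the other. Both rankings must contain exactly the same items, each once.
    """
    if set(ranking_a) != set(ranking_b):
        raise ValueError("both rankings must contain the same items")
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    pairs = list(combinations(ranking_a, 2))
    if not pairs:
        return 0.0
    # A pair is discordant when the two rankings disagree on which item is higher.
    discordant = sum(
        1 for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    return discordant / len(pairs)


# Hypothetical rankings of the same four schools.
ours = ["School A", "School B", "School C", "School D"]
list_x = ["School A", "School B", "School D", "School C"]  # one swapped pair
list_y = ["School D", "School C", "School B", "School A"]  # fully reversed

dist_x = kendall_tau_distance(ours, list_x)
dist_y = kendall_tau_distance(ours, list_y)
print(dist_x, dist_y)
```

A measure like this only applies when the lists rank the same set of schools. Published lists usually cover different schools, which is one practical reason a plain top-10 comparison can be easier to present.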

Biggest influencers

As people are interested in why some schools are at the top and others are not, our results about the biggest predictors are of great interest.

For this part, we use the results from our estimated regression models, for which the target variables were our own rankings and some well-known rankings, such as those provided by U.S. News & World Report and by some state organizations.

For this task, we simply used the coefficients of our linear regression models to tell us which features have a bigger impact. We also used the randomForest function to rank features by their impact on moving schools into the top 100. In other words, we split the list into "top 100" and "the rest," ran decision tree and random forest models on it, and then used the random forest's feature importance function to obtain a list of features ordered by their impact on the target variable, that is, whether a school is in the top 100. In R, this is the importance function in the randomForest package.
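The "top 100 versus the rest" approach can be sketched as follows. This is a Python analogue using scikit-learn's RandomForestClassifier and its impurity-based feature_importances_ attribute, not the R randomForest code used in the project. The data is synthetic, and the feature names, the score formula, and the model settings are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_schools = 500
feature_names = ["economic_status", "parent_involvement", "technology_use"]

# Synthetic, made-up features: one row per school, one column per feature.
X = rng.normal(size=(n_schools, len(feature_names)))

# Made-up ranking score: driven mostly by the first feature, partly by the
# second, and not at all by the third. Real data would replace this.
score = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.3, size=n_schools)

# Binary target: 1 if the school is among the 100 highest scores, else 0.
in_top_100 = np.zeros(n_schools, dtype=int)
in_top_100[np.argsort(-score)[:100]] = 1

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, in_top_100)

# Order features by impurity-based importance, largest first.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Importances of this kind describe how much each feature helps the forest separate the two groups. They do not show the direction of an effect, so they complement the regression coefficients rather than replace them.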

According to our results, the economic status of the community, parental involvement, and college connections are among the factors with the biggest impact for some coastal schools. However, technology use did not have as much impact as expected.
