Model estimation

Once the feature sets have been finalized, as described in the last section, the next step is to estimate the parameters of the selected models, for which we can use MLlib on the Zeppelin notebook.

As before, for the best results we need to arrange distributed computing, which is especially important here because models must be estimated separately for various student segments and various study subjects. Readers may refer to previous chapters for the distributed computing setup, as we will not repeat it here.

Spark implementation with the Zeppelin notebook

Using MLlib's Scala API, we can train a random forest with the following code:

import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.configuration.Strategy

// Train a RandomForest model with MLlib's RDD-based API.
val treeStrategy = Strategy.defaultStrategy("Classification")
val numTrees = 300
val featureSubsetStrategy = "auto" // Let the algorithm choose.
val model = RandomForest.trainClassifier(trainingData,
  treeStrategy, numTrees, featureSubsetStrategy, seed = 12345)
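The trained forest can then be checked against held-out data. The following is a sketch only, not part of the original text: it assumes an RDD of LabeledPoint named data has already been prepared, and that trainingData and testData come from a random split of it.

```scala
// Hypothetical evaluation sketch; assumes `data: RDD[LabeledPoint]`.
val Array(trainingData, testData) =
  data.randomSplit(Array(0.7, 0.3), seed = 12345)

// Fraction of misclassified test points.
val testErr = testData.map { point =>
  val prediction = model.predict(point.features)
  if (point.label == prediction) 0.0 else 1.0
}.mean()
println(s"Test error = $testErr")
```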

For a decision tree, we will execute the following code:

import org.apache.spark.mllib.tree.DecisionTree

val model = DecisionTree.trainClassifier(trainingData, numClasses,
  categoricalFeaturesInfo, impurity, maxDepth, maxBins)
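The parameters passed to trainClassifier above must be defined beforehand. A plausible setup is shown below; the specific values are illustrative assumptions, not taken from the original text.

```scala
// Illustrative parameter values for DecisionTree.trainClassifier.
val numClasses = 2                            // assumed binary outcome
val categoricalFeaturesInfo = Map[Int, Int]() // empty map: treat all features as continuous
val impurity = "gini"                         // or "entropy" for classification
val maxDepth = 5
val maxBins = 32
```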

In MLlib, for linear regression, we will run the following code:

import org.apache.spark.mllib.regression.LinearRegressionWithSGD

val numIterations = 90
val model = LinearRegressionWithSGD.train(trainingData, numIterations)
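For regression it is common to report the mean squared error of the fitted model. The following evaluation sketch is an addition, assuming trainingData is an RDD of LabeledPoint as above.

```scala
// Compute MSE over the training data; assumes `trainingData: RDD[LabeledPoint]`.
val valuesAndPreds = trainingData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val MSE = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }.mean()
println(s"Training Mean Squared Error = $MSE")
```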

For logistic regression, we will use the following code:

import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS

// Note: setNumClasses is provided by the LBFGS-based trainer;
// the SGD-based trainer supports binary classification only.
val model = new LogisticRegressionWithLBFGS()
  .setNumClasses(2)
  .run(trainingData)
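A binary classifier like this one is often summarized by the area under its ROC curve. The sketch below is an added illustration; it assumes a held-out testData split that the original text does not define.

```scala
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// Score the held-out set and compute area under the ROC curve.
val scoreAndLabels = testData.map { point =>
  (model.predict(point.features), point.label)
}
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
println(s"Area under ROC = ${metrics.areaUnderROC()}")
```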

To implement all of these models, we need to input the preceding code into our Zeppelin notebook and run the computation there, as follows:

[Figure: Spark implementation with the Zeppelin notebook]

Then we can press Shift + Enter to run these commands and obtain results similar to the following screenshot:

[Figure: Spark implementation with the Zeppelin notebook]