Interest Rate Model

The second model predicts the interest rate of accepted loans. Here, we will use only the part of the training data that corresponds to good loans, since only those have a properly assigned interest rate. Keep in mind, however, that the bad loans we leave out could still carry information that is useful for predicting the interest rate.

As before, we will start by preparing the training data. We will take the initial data, filter out bad loans, drop string columns, and convert the interest rate to a numeric value:

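val intRateDfSplits = loanStatusDfSplits.map(df => {
  df
    // Keep only good loans
    .where("loan_status == 'good loan'")
    // Drop string columns and the now-constant loan status
    .drop("emp_title", "desc", "loan_status")
    // Convert the interest rate to a numeric column
    .withColumn("int_rate", toNumericRateUdf(col("int_rate")))
})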
val trainIRHf = toHf(intRateDfSplits(0), "trainIRHf")(h2oContext)
val validIRHf = toHf(intRateDfSplits(1), "validIRHf")(h2oContext)
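The toNumericRateUdf helper is defined outside this section, so its exact logic is not shown here. As a minimal sketch, assuming the raw int_rate values are strings such as " 13.56%", the conversion it wraps might look like the following. The names RateParsing and toNumericRate are hypothetical and only serve as an illustration:

```scala
object RateParsing {
  // Hypothetical stand-in for the logic wrapped by toNumericRateUdf.
  // Assumption: rates arrive as strings such as " 13.56%".
  // Strips surrounding whitespace and a trailing '%', then parses the rest.
  def toNumericRate(rate: String): Double = {
    val cleaned = if (rate == null) "" else rate.trim.stripSuffix("%").trim
    // Missing values are mapped to NaN instead of throwing
    if (cleaned.isEmpty) Double.NaN else cleaned.toDouble
  }
}
```

In Spark, a function of this kind could be wrapped with udf(...) to obtain a column expression; whether the original helper handles missing values the same way is an assumption.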

In the next step, we will use H2O's random hyperspace search to find the best GBM model in a defined hyperspace of parameters. We will also constrain the search with additional stopping criteria based on the requested model precision and the overall search time.

The first step is to define the common GBM model builder parameters, such as the training and validation datasets and the response column:

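import _root_.hex.tree.gbm.GBMModel.GBMParameters
val intRateModelParam = let(new GBMParameters()) { p =>
  // Training and validation frames prepared above
  p._train = trainIRHf._key
  p._valid = validIRHf._key
  // Column to predict
  p._response_column = "int_rate"
  // Score the model after every 20 trees
  p._score_tree_interval = 20
}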
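The let helper used above is not defined in this section. A minimal sketch, assuming it simply applies a configuration block to an object and returns that same object, could look like this. LetSketch and DemoParams are hypothetical names introduced only for the illustration:

```scala
object LetSketch {
  // Assumed shape of `let`: run a side-effecting configuration block
  // against a value, then return the same (now configured) value.
  def let[A](in: A)(body: A => Unit): A = {
    body(in)
    in
  }

  // Hypothetical stand-in for a parameter object with mutable fields
  class DemoParams {
    var responseColumn: String = ""
    var scoreTreeInterval: Int = 0
  }

  // Usage: configure and obtain the object in a single expression
  val demo: DemoParams = let(new DemoParams) { p =>
    p.responseColumn = "int_rate"
    p.scoreTreeInterval = 20
  }
}
```

The pattern keeps the mutable setup of a Java-style parameter object local to one block, so the resulting val can be treated as fully configured.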

The next step is to define the hyperspace of parameters to explore. We can encode any values we find interesting, but keep in mind that the search could use any combination of parameters, even those that are useless:

import _root_.hex.grid.GridSearch
import water.Key
import scala.collection.JavaConversions._
val intRateHyperSpace: java.util.Map[String, Array[Object]] = Map[String, Array[AnyRef]](
  "_ntrees" -> (1 to 10).map(v => Int.box(100 * v)).toArray,
  "_max_depth" -> (2 to 7).map(Int.box).toArray,
  "_learn_rate" -> Array(0.1, 0.01).map(Double.box),
  "_col_sample_rate" -> Array(0.3, 0.7, 1.0).map(Double.box),
  "_learn_rate_annealing" -> Array(0.8, 0.9, 0.95, 1.0).map(Double.box)
)
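To get a feeling for how large this hyperspace is, we can count its combinations. The following sketch mirrors the value lists above as plain Scala collections (no H2O types involved) and multiplies their sizes. HyperSpaceSize is a name introduced only for this illustration:

```scala
object HyperSpaceSize {
  // The same value lists as in the hyperspace above, as plain collections
  val space: Map[String, Seq[Any]] = Map(
    "_ntrees" -> (1 to 10).map(_ * 100),
    "_max_depth" -> (2 to 7),
    "_learn_rate" -> Seq(0.1, 0.01),
    "_col_sample_rate" -> Seq(0.3, 0.7, 1.0),
    "_learn_rate_annealing" -> Seq(0.8, 0.9, 0.95, 1.0)
  )

  // Number of distinct parameter combinations: the product of the list sizes
  val combinations: Long = space.values.map(_.size.toLong).product
}
```

The product is 10 * 6 * 2 * 3 * 4 = 1440 candidate models, which explains why building a model for every combination quickly becomes impractical.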

Now, we will define how to traverse the defined hyperspace of parameters. H2O provides two strategies: a simple cartesian search, which builds a model for each parameter combination step by step, and a random search, which randomly picks parameters from the defined hyperspace. Perhaps surprisingly, the random search performs quite well, especially when it is used to explore a huge parameter space:

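import _root_.hex.grid.HyperSpaceSearchCriteria.RandomDiscreteValueSearchCriteria
// Needed for StoppingMetric unless it is already imported
import _root_.hex.ScoreKeeper.StoppingMetric
val intRateHyperSpaceCriteria = let(new RandomDiscreteValueSearchCriteria) { c =>
  // Stop the search based on the RMSE of the models
  c.set_stopping_metric(StoppingMetric.RMSE)
  c.set_stopping_tolerance(0.1)
  c.set_stopping_rounds(1)
  // Limit the runtime of the whole search
  c.set_max_runtime_secs(4 * 60 /* seconds */)
}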
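To make the idea of a random discrete search more concrete, here is a minimal sketch in plain Scala. It is not H2O's implementation: it only draws distinct random combinations from a hyperspace until a budget is reached. The maxModels budget, the seed, and all names in RandomDiscreteSketch are assumptions made for the illustration (the search configured above is bounded by a stopping metric and runtime instead):

```scala
import scala.collection.mutable
import scala.util.Random

object RandomDiscreteSketch {
  type Combo = Map[String, Any]

  // Draw one value per parameter, uniformly at random
  def sample(space: Map[String, Seq[Any]], rnd: Random): Combo =
    space.map { case (name, values) => name -> values(rnd.nextInt(values.size)) }

  // Draw up to `maxModels` distinct combinations (hypothetical budget);
  // the target is capped by the total number of combinations so the loop ends
  def randomSearch(space: Map[String, Seq[Any]], maxModels: Int, seed: Long): Seq[Combo] = {
    val rnd = new Random(seed)
    val total = space.values.map(_.size.toLong).product
    val target = math.min(maxModels.toLong, total).toInt
    val seen = mutable.LinkedHashSet.empty[Combo]
    while (seen.size < target) seen += sample(space, rnd)
    seen.toSeq
  }
}
```

A cartesian search would instead enumerate every combination in order; the random variant trades completeness for the ability to probe distant corners of a large space within a fixed budget.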

In this case, we also limit the search by two stopping conditions: the model performance based on RMSE and the maximum runtime of the whole grid search. At this point, we have defined all the necessary inputs, and it is time to launch the hyperspace search:

val intRateGrid = GridSearch.startGridSearch(Key.make("intRateGridModel"),
  intRateModelParam,
  intRateHyperSpace,
  new GridSearch.SimpleParametersBuilderFactory[GBMParameters],
  intRateHyperSpaceCriteria).get()

The result of the search is a set of models called a grid. Let's find the one with the lowest RMSE:

val intRateModel = intRateGrid.getModels.minBy(_._output._validation_metrics.rmse())
println(intRateModel._output._validation_metrics)

The output is as follows:

At this point, we could define our own evaluation criteria and select a model based not only on the chosen model metric, but also on the loan term and the difference between the predicted and actual values, in order to optimize profit. Instead, we will trust that our search strategy found the best possible model and jump directly to deploying our solution.
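For reference, the RMSE used to rank the grid models is the square root of the mean squared difference between actual and predicted values. The following sketch computes it by hand in plain Scala and picks the best of several candidates with minBy, mirroring the selection step. RmseSketch and its methods are hypothetical names, and any input values passed to them are illustrative only:

```scala
object RmseSketch {
  // Root mean squared error between actual and predicted values
  def rmse(actual: Seq[Double], predicted: Seq[Double]): Double = {
    require(actual.nonEmpty && actual.size == predicted.size,
      "inputs must be non-empty and of equal length")
    val squaredErrors = actual.zip(predicted).map { case (a, p) => (a - p) * (a - p) }
    math.sqrt(squaredErrors.sum / squaredErrors.size)
  }

  // Return the name of the candidate whose predictions have the lowest RMSE
  def best(actual: Seq[Double], candidates: Map[String, Seq[Double]]): String =
    candidates.minBy { case (_, predicted) => rmse(actual, predicted) }._1
}
```

A custom criterion of the kind described above could replace rmse inside minBy with any other scoring function, for example one that weights errors by loan term.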
