Random forest R examples

Again, this is where R shines (and is also dangerous): it can do much of the tuning for you. The following example shows what a few lines of R code can do, but the author recommends careful testing and data exploration before settling on parameters and methods.

This code takes the imputed values from the mice section and predicts whether the temperature is warmer than average (which we label hot) or not (cold), based on the values for Wind, Ozone, and Solar radiation:

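#Add classes to the imputed data based on Temperature.
#This is the target variable for the random forest model.
#Days warmer than the mean Temp are "hot", the rest "cold" (if the mice section
#centered Temp, this mean is 0). caret and confusionMatrix() need a factor here.
cs_example_data$tempClass <- factor(ifelse(cs_example_data$Temp > mean(cs_example_data$Temp), "hot", "cold"),
                                    levels = c("cold", "hot"))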

#make sure needed packages are installed, then load them
for (pkg in c("randomForest", "caret")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}

library(randomForest)
library(caret)

#define how we are going to train the model: 10-fold cross-validation, repeated 5 times
ctrlCV <- trainControl(method = "repeatedcv", number = 10, repeats = 5, returnResamp = "none")

#define the target variable
target <- "tempClass"

#define the predictor features
predictors <- c("Ozone","Solar.R","Wind")

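#split data into training (70%) and test (30%) sets, stratified by class.
#set.seed makes the random split and the CV folds reproducible (the seed value is arbitrary)
set.seed(42)
training <- createDataPartition(cs_example_data$tempClass, p = 0.7, list = FALSE)
trainData <- cs_example_data[training, ]
testData <- cs_example_data[-training, ]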

#train the random forest model with 500 trees, using caret to control cross-validation
#(extra arguments such as ntree are passed through to randomForest())
rfModel <- train(trainData[, predictors],
                 trainData[[target]],
                 method = "rf",
                 ntree = 500,
                 trControl = ctrlCV)

#run prediction on test data to get class probabilities
testPredRFProb <- predict(rfModel, testData, type = "prob")
#run prediction again to get the predicted class
testData$RFclass <- predict(rfModel, testData)

#keep the probability of the positive class (hot)
testData$RFProb <- testPredRFProb[, "hot"]

#Show confusion matrix for results
confusionMatrix(data = testData$RFclass, reference = testData$tempClass, positive = "hot")
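The confusion matrix step can be unpacked with a few lines of base R. The sketch below uses made-up probabilities and observed classes (not output from the model above) and an assumed 0.5 cutoff to turn hot-class probabilities, like RFProb, into classes. It then computes accuracy, sensitivity and specificity with hot as the positive class, which is how confusionMatrix() defines those figures when positive = "hot".

```r
# Hypothetical hot-class probabilities and observed classes (illustrative only,
# not produced by the model above)
probHot  <- c(0.81, 0.45, 0.62, 0.10, 0.55, 0.30, 0.90, 0.20, 0.66, 0.40)
observed <- factor(c("hot", "hot", "hot", "cold", "cold",
                     "cold", "hot", "cold", "cold", "cold"),
                   levels = c("cold", "hot"))

# Convert probabilities to classes with an assumed 0.5 cutoff
predicted <- factor(ifelse(probHot > 0.5, "hot", "cold"), levels = c("cold", "hot"))

# Rows are predictions, columns are observed classes (the layout caret prints)
cm <- table(Prediction = predicted, Reference = observed)

accuracy    <- sum(diag(cm)) / sum(cm)
sensitivity <- cm["hot", "hot"] / sum(cm[, "hot"])    # share of hot days flagged hot
specificity <- cm["cold", "cold"] / sum(cm[, "cold"]) # share of cold days flagged cold

print(cm)
cat(sprintf("Accuracy %.3f  Sensitivity %.3f  Specificity %.3f\n",
            accuracy, sensitivity, specificity))
```

Raising or lowering the cutoff trades sensitivity against specificity, which is one reason to keep the class probabilities rather than only the predicted classes.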

The resulting confusion matrix is shown next. The test set is much smaller than we would like, and the model's performance is modest, but it does have some predictive value. It is not half bad for a simple example:

Resulting confusion matrix summary from random forest modeling.
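The small test set matters because confusionMatrix() also reports a 95% confidence interval for accuracy, which caret computes with an exact binomial test. The base-R sketch below applies binom.test() to hypothetical counts (21 of 30 versus 210 of 300 correct, both 70% accuracy) to show how much wider that interval is for the smaller set. The counts are illustrative assumptions, not results from this model.

```r
# Hypothetical results: the same 70% accuracy on a small and a larger test set
smallTest <- binom.test(x = 21, n = 30)    # 21 of 30 test cases correct
largeTest <- binom.test(x = 210, n = 300)  # 210 of 300 test cases correct

# Exact (Clopper-Pearson) 95% confidence intervals for the accuracy
smallCI <- smallTest$conf.int
largeCI <- largeTest$conf.int

cat(sprintf("n = 30:  %.3f to %.3f\n", smallCI[1], smallCI[2]))
cat(sprintf("n = 300: %.3f to %.3f\n", largeCI[1], largeCI[2]))

# The interval width shrinks roughly in proportion to 1 / sqrt(n)
cat(sprintf("Widths: %.3f vs %.3f\n", diff(smallCI), diff(largeCI)))
```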