How to do it...

To identify the most important variables in a dataset with a random forest, follow these steps:

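  1. Prepare the training and test data:
library(randomForest)

# Randomly pick 80% of the row indices for training, without replacement
train_rows <- sample(nrow(iris), 0.8 * nrow(iris), replace = FALSE)
train_set <- iris[train_rows, ]
# The remaining 20% of rows form the test set
test_set <- iris[-train_rows, ]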
  2. Train the model and create the importance plot:
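# importance = TRUE tells randomForest to compute permutation-based importance as well
model <- randomForest(Species ~ ., data = train_set, mtry = 2, importance = TRUE)
# Plot the importance measures for each predictor
varImpPlot(model)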
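
The plot from step 2 can also be read as numbers. The following is a minimal sketch, not part of the recipe itself, that repeats the steps above and then uses the `importance()` function from randomForest to rank the predictors. The `set.seed(42)` call and the names `imp`, `ranked`, `predicted`, and `test_accuracy` are assumptions added for illustration; the accuracy check at the end is one plausible use of the test set prepared in step 1.

```r
library(randomForest)

set.seed(42)  # assumption: fixed seed so the split and the forest are repeatable

# Steps 1 and 2 from the recipe
train_rows <- sample(nrow(iris), 0.8 * nrow(iris), replace = FALSE)
train_set <- iris[train_rows, ]
test_set <- iris[-train_rows, ]
model <- randomForest(Species ~ ., data = train_set, mtry = 2, importance = TRUE)

# Numeric scores behind varImpPlot(): one row per predictor
imp <- importance(model)

# Rank predictors by mean decrease in accuracy (permutation importance)
ranked <- imp[order(imp[, "MeanDecreaseAccuracy"], decreasing = TRUE), ]
print(round(ranked[, c("MeanDecreaseAccuracy", "MeanDecreaseGini")], 2))

# Hypothetical follow-up: check the model on the held-out rows
predicted <- predict(model, newdata = test_set)
test_accuracy <- mean(predicted == test_set$Species)
print(test_accuracy)
```

Predictors near the top of `ranked` are the ones whose permutation hurts accuracy the most. The exact scores vary with the seed and the random split, so treat the ordering as indicative rather than fixed.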