How it works...

In Step 1, we perform a dataset split similar to the ones in several previous recipes. Using the sample() function, we draw 80% of the row numbers of the original iris data, and then, using subsetting and negative subsetting, we extract the training and test rows.
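As a minimal sketch of this split (the object names train_rows, train_set, and test_set are illustrative, not necessarily those used in the recipe):

```r
set.seed(42)  # for a reproducible split

# Draw 80% of the row numbers of iris (150 rows, so 120 of them)
train_rows <- sample(nrow(iris), 0.8 * nrow(iris))

train_set <- iris[train_rows, ]   # subsetting: the sampled rows
test_set  <- iris[-train_rows, ]  # negative subsetting: everything else
```

Because sample() draws without replacement, the two sets are guaranteed to be disjoint and together cover every row of iris.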

In Step 2, we train the model using the randomForest() function. The first argument here is a formula; we're specifying that Species is the value we wish to predict based on all other variables, which come from the data argument, our train_set object. The key in this recipe is to make sure we set the importance argument to TRUE, so that the model measures, for each variable, how much the prediction accuracy decreases when that variable's values are permuted, which tells us how much the model depends on it. Once the model is built, we can visualize the importance of each variable with the varImpPlot() function. In doing so, we get the following diagram:
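A sketch of the training and plotting calls, assuming a train_set built as in Step 1 (the split is repeated here so the snippet runs on its own; it requires the randomForest package):

```r
library(randomForest)

set.seed(42)
train_rows <- sample(nrow(iris), 0.8 * nrow(iris))
train_set  <- iris[train_rows, ]

# Species ~ . means "predict Species from all other columns";
# importance = TRUE records permutation-based importance scores
model <- randomForest(Species ~ .,
                      data = train_set,
                      importance = TRUE)

# Dot chart of each variable's mean decrease in accuracy and in Gini impurity
varImpPlot(model)
```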

We can see that it is the Petal.Width and Petal.Length variables that, when permuted, cause the greatest decrease in model accuracy, so they are, by this measure, the most important.
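The ranking shown in the plot can also be read numerically with the importance() function; a self-contained sketch (again repeating the setup, and requiring the randomForest package):

```r
library(randomForest)

set.seed(42)
train_rows <- sample(nrow(iris), 0.8 * nrow(iris))
train_set  <- iris[train_rows, ]
model <- randomForest(Species ~ ., data = train_set, importance = TRUE)

# type = 1 selects the mean decrease in accuracy measure
imp <- importance(model, type = 1)

# Sort so the most important variables come first; the Petal.* variables
# should sit at the top of this table
imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]
```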
