How it works...

Step 1 prepares the training and test sets. We use the sample() function to select from the vector of integers from 1 to the number of rows in iris; we select 80% of the row numbers without replacement, so train_rows is a vector of integers giving the rows of iris that we will use in our training set. In the rest of this step, we use subsetting and negative subsetting to build the two subsets of iris we need: the training set from the sampled rows and the test set from the remaining rows.
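The split described above can be sketched in base R as follows. This is a minimal illustration rather than the recipe's exact code: the set.seed() call and the floor() rounding are assumptions added here so that the example is reproducible, and the names train_set and test_set follow the ones used in this section.

```r
# Assumption: a fixed seed, added only so the sketch is reproducible
set.seed(42)

# Sample 80% of the row numbers of iris, without replacement
train_rows <- sample(1:nrow(iris), size = floor(0.8 * nrow(iris)), replace = FALSE)

# Positive subsetting keeps the sampled rows for training
train_set <- iris[train_rows, ]

# Negative subsetting drops the sampled rows, leaving the rest for testing
test_set <- iris[-train_rows, ]

nrow(train_set)  # 120
nrow(test_set)   # 30
```

Because the sampling is done without replacement, no row number can appear twice in train_rows, so the training and test sets never overlap.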

In Step 2, we proceed directly to building the model that we will make predictions with. The randomForest() function takes, as its first argument, an R formula naming the column to be predicted (Species, the response variable) and the dataframe columns to use as predictors; here, we use all the other columns, which we express with the . character. The data argument is the source dataframe, and the mtry argument is a tunable parameter that tells the algorithm how many variables to sample at random as candidates at each split. For classification, a good starting value is usually around the square root of the number of predictor columns, but tuning it can be helpful. The resulting model is saved in a variable called model, which can be printed for inspection.
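A minimal sketch of the model-building call is shown next. It assumes that the randomForest package is installed; the seed and the value mtry = 2 are assumptions chosen for illustration (iris has four predictor columns, and the square root of 4 is 2), not values confirmed by the recipe.

```r
library(randomForest)

# Assumption: a fixed seed, since both the split and the forest are random
set.seed(42)
train_rows <- sample(1:nrow(iris), size = floor(0.8 * nrow(iris)), replace = FALSE)
train_set <- iris[train_rows, ]

# Species ~ . means "predict Species from every other column"
# mtry = 2 asks for two candidate variables to be sampled at each split
model <- randomForest(Species ~ ., data = train_set, mtry = 2)

# Printing the model shows the call, the forest type, and the out-of-bag error
print(model)
```

Trying a few values of mtry and comparing the out-of-bag error reported by print(model) is one simple way to tune this parameter.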

In Step 3, we use the predict() function with model, the test_set data, and the type argument set to "class" to predict the classes of the test set. We then compare the predictions with the true classes using caret::confusionMatrix(), which gives the following result:

##             Reference
## Prediction   setosa versicolor virginica
##   setosa         13          0         0
##   versicolor      0          8         0
##   virginica       0          0         9
##

All 30 test observations fall on the diagonal of the matrix, which indicates that the test set was classified perfectly.
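The prediction and assessment in Step 3 can be sketched as follows. This assumes that both the randomForest and caret packages are installed; the seed and mtry = 2 are illustrative assumptions, so the counts in the matrix depend on the random split and may differ from those shown above.

```r
library(randomForest)

# Assumption: a fixed seed, so the split and the forest are reproducible
set.seed(42)
train_rows <- sample(1:nrow(iris), size = floor(0.8 * nrow(iris)), replace = FALSE)
train_set <- iris[train_rows, ]
test_set <- iris[-train_rows, ]

# Assumption: mtry = 2, as a typical starting value for four predictors
model <- randomForest(Species ~ ., data = train_set, mtry = 2)

# Predict a class label for every row of the test set
predictions <- predict(model, test_set, type = "class")

# Cross-tabulate the predicted classes against the true classes
cm <- caret::confusionMatrix(predictions, test_set$Species)
print(cm$table)
```

In the matrix, rows are the predicted classes and columns are the reference (true) classes, so any off-diagonal count is a misclassification.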
