How it works...

In Step 1, we carry out the train and test set generation step that should be familiar from the previous recipes. Briefly, we create a vector of row numbers to use as a training set, then use subsetting and negative subsetting to extract the two new sub-datasets.
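
The split can be sketched as follows on the built-in iris data; the 80/20 proportion and the seed value are illustrative assumptions, while the object names follow the recipe:

```r
# Reproducible train/test split of the built-in iris data.
# The 80/20 ratio and seed are assumptions for illustration.
set.seed(123)
train_rows <- sample(nrow(iris), 0.8 * nrow(iris))

train_set <- iris[train_rows, ]   # subsetting: keep the sampled rows
test_set  <- iris[-train_rows, ]  # negative subsetting: keep the rest
```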

In Step 2, we proceed to create the model using the svm() function. The first argument is an R formula that specifies the column to use as the classes (the response variable, Species); after ~, the . character means that all other columns are to be used as the data from which to build the model. We set the data argument to the train_set dataframe and choose appropriate values for the type, kernel, and gamma arguments. type may be classification- or regression-based; kernel is one of a variety of functions designed for different data and problems; and gamma is a parameter for the kernel. You may wish to check the function documentation for details. These values can also be optimized empirically.
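
A minimal sketch of the call, assuming the e1071 package; the radial kernel and the gamma value here are placeholder choices rather than tuned values, and the split is recreated so the block stands alone:

```r
library(e1071)  # provides svm()

# Illustrative split, as in Step 1 (seed and ratio are assumptions)
set.seed(123)
train_rows <- sample(nrow(iris), 0.8 * nrow(iris))
train_set <- iris[train_rows, ]

# Species is the response; the . after ~ means all remaining columns
model <- svm(Species ~ ., data = train_set,
             type = "C-classification",  # classification rather than regression
             kernel = "radial",          # one of several kernel functions
             gamma = 0.25)               # kernel parameter; can be tuned
```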

In Step 3, we create some objects that we can use to render the four-dimensional decision boundary in two dimensions. First, we select the columns we don't want to plot (those to hold constant), then we use the lapply() function to iterate over a character vector of those column names, applying a function that calculates the mean of each named column. We then name the elements of the resulting list with the column names held in the cols_to_hold variable. Finally, we use the generic plot() function, passing the model, the training data to plot, the two dimensions to plot as a formula (Petal.Width ~ Petal.Length), and a slice argument that takes our means of the other columns from the held_constant list.
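
The slicing step can be sketched like this, again assuming the e1071 package and rebuilding the model so the block is self-contained; which columns are held constant mirrors the recipe, while the model settings are placeholders:

```r
library(e1071)

# Illustrative split and model, as in Steps 1 and 2
set.seed(123)
train_rows <- sample(nrow(iris), 0.8 * nrow(iris))
train_set <- iris[train_rows, ]
model <- svm(Species ~ ., data = train_set, kernel = "radial")

# Hold the sepal columns constant at their training-set means
cols_to_hold <- c("Sepal.Length", "Sepal.Width")
held_constant <- lapply(cols_to_hold, function(x) mean(train_set[[x]]))
names(held_constant) <- cols_to_hold

# Plot the boundary in the two petal dimensions, slicing the others
plot(model, train_set, Petal.Width ~ Petal.Length, slice = held_constant)
```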

The result looks like this, showing the margins in colors for each class:

In Step 4, we generate predictions on the test set using predict() and build the confusion matrix with caret::confusionMatrix() to see the accuracy.
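
This last step can be sketched as follows, assuming the e1071 and caret packages are installed; the split and model settings are the same illustrative assumptions as above:

```r
library(e1071)
library(caret)

# Illustrative split and model, as in the earlier steps
set.seed(123)
train_rows <- sample(nrow(iris), 0.8 * nrow(iris))
train_set <- iris[train_rows, ]
test_set  <- iris[-train_rows, ]
model <- svm(Species ~ ., data = train_set, kernel = "radial")

preds <- predict(model, test_set)         # predicted classes for the test set
confusionMatrix(preds, test_set$Species)  # accuracy and per-class statistics
```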
