With R we include the packages we are going to use:
install.packages("randomForest", repos="http://cran.r-project.org")
library(randomForest)
Load the data:
filename = "http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data"
housing <- read.table(filename)
colnames(housing) <- c("CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                       "DIS", "RAD", "TAX", "PRATIO", "B", "LSTAT", "MDEV")
Split it up:
housing <- housing[order(housing$MDEV),]
#install.packages("caret")
library(caret)
set.seed(5557)
indices <- createDataPartition(housing$MDEV, p=0.75, list=FALSE)
training <- housing[indices,]
testing <- housing[-indices,]
nrow(training)
nrow(testing)
Calculate our model:
forestFit <- randomForest(MDEV ~ CRIM + ZN + INDUS + CHAS + NOX + RM + AGE
                          + DIS + RAD + TAX + PRATIO + B + LSTAT,
                          data=training)
forestFit

Call:
 randomForest(formula = MDEV ~ CRIM + ZN + INDUS + CHAS + NOX + RM + AGE + DIS + RAD + TAX + PRATIO + B + LSTAT, data = training)
               Type of random forest: regression
                     Number of trees: 500
No. of variables tried at each split: 4

          Mean of squared residuals: 11.16163
                    % Var explained: 87.28
This is one of the more informative model summaries we have seen: it reports that the model explains 87% of the variance in MDEV.
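Beyond the printed summary, the randomForest package can also report how much each predictor contributes to the fit via its importance() and varImpPlot() functions. A minimal sketch follows; it fits a small forest on the built-in mtcars data so it runs without the housing download (with the model above, you would call importance(forestFit) directly):

```r
# Sketch: inspecting variable importance from a randomForest fit.
# Uses the built-in mtcars data so the example is self-contained;
# the variable names and seed here are illustrative only.
library(randomForest)

set.seed(5557)
fit <- randomForest(mpg ~ ., data=mtcars)

# IncNodePurity: total decrease in node impurity (residual sum of squares,
# for regression) from splits on each variable, summed over all trees
importance(fit)

# The same information as a dot chart, most important variables at the top
#varImpPlot(fit)
```

High-importance variables are the ones the forest relies on most; this is a quick way to see which of the thirteen housing predictors are actually doing the work.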
Make our prediction:
forestPredict <- predict(forestFit, newdata=testing)

See how well the model worked:

diff <- forestPredict - testing$MDEV
sum( (diff - mean(diff) )^2 ) #sum of squares

1391.95553131418
This is one of the lowest sums of squares among the models we produced in this chapter.
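Note that the statistic above is a centered sum of squares: it measures how the residuals spread around their own mean, so a constant bias in the predictions would not be penalized. A short sketch contrasting it with the plain sum of squared errors and RMSE, using small toy vectors in place of forestPredict and testing$MDEV (which require the downloaded data):

```r
# Sketch: comparing regression error measures on illustrative toy vectors;
# the values here are made up and stand in for forestPredict and testing$MDEV.
predicted <- c(22.1, 19.8, 30.5, 15.2)
actual    <- c(21.0, 20.5, 31.0, 14.0)

diff <- predicted - actual

# Centered sum of squares, as computed above: spread of the residuals
# around their own mean
centered_ss <- sum( (diff - mean(diff))^2 )

# Plain sum of squared errors and RMSE, which also penalize any
# constant bias in the predictions
sse  <- sum(diff^2)
rmse <- sqrt(mean(diff^2))

centered_ss
sse
rmse
```

When comparing models on the same test set, it is worth computing both: two models with the same centered sum of squares can still differ in how far their predictions sit from the truth on average.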