Random forests in R

In R, we first install and load the packages we are going to use:

install.packages("randomForest", repos="http://cran.r-project.org") 
library(randomForest) 

Load the data:

filename = "http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data" 
housing <- read.table(filename) 
colnames(housing) <- c("CRIM", "ZN", "INDUS", "CHAS", "NOX",  
                       "RM", "AGE", "DIS", "RAD", "TAX", "PRATIO", 
                       "B", "LSTAT", "MDEV") 

Split it into training and testing sets:

housing <- housing[order(housing$MDEV),] 
#install.packages("caret") 
library(caret) 
set.seed(5557) 
indices <- createDataPartition(housing$MDEV, p=0.75, list=FALSE) 
training <- housing[indices,] 
testing <- housing[-indices,] 
nrow(training) 
nrow(testing) 
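
Because createDataPartition samples within quantiles of the outcome, the training and testing sets should have similar MDEV distributions; a quick optional check:

summary(training$MDEV) 
summary(testing$MDEV) 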

Fit our model:

forestFit <- randomForest(MDEV ~ CRIM + ZN + INDUS + CHAS + NOX  
                  + RM + AGE + DIS + RAD + TAX + PRATIO  
                  + B + LSTAT, data=training) 
forestFit 
Call: 
 randomForest(formula = MDEV ~ CRIM + ZN + INDUS + CHAS + NOX + 
     RM + AGE + DIS + RAD + TAX + PRATIO + B + LSTAT, data = training) 
               Type of random forest: regression 
                     Number of trees: 500 
No. of variables tried at each split: 4 
 
          Mean of squared residuals: 11.16163 
                    % Var explained: 87.28 

This is one of the more informative displays about a model—we see the model explains 87% of the variance.
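
To see which predictors the forest leaned on most, the randomForest package also provides the importance and varImpPlot functions; this is an optional extra look at the model:

importance(forestFit)  # contribution of each predictor to node purity 
varImpPlot(forestFit)  # the same information as a plot 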

Make our prediction:

forestPredict <- predict(forestFit, newdata=testing) 

See how well the model worked:

diff <- forestPredict - testing$MDEV 
sum( (diff - mean(diff) )^2 ) # sum of squares 
[1] 1391.956

This is one of the lowest sums of squares among the models we produced in this chapter.
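
For a number that is easier to interpret on the scale of MDEV, we could also compute the root mean squared error and plot predicted against actual values; a small sketch using only base R:

sqrt(mean(diff^2))  # RMSE on the test set 
plot(testing$MDEV, forestPredict, 
     xlab="Actual MDEV", ylab="Predicted MDEV") 
abline(0, 1)  # points on this line would be perfect predictions 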
