Nearest neighbor using R

For this example, we use the housing data from the UCI Machine Learning Repository (archive.ics.uci.edu). First, we load the data and assign column names:

housing <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data") 
colnames(housing) <- c("CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS", "RAD", "TAX", "PRATIO", "B", "LSTAT", "MDEV") 
summary(housing) 

We reorder the data so the key (the housing price MDEV) is in ascending order:

housing <- housing[order(housing$MDEV),] 

Now, we can split the data into a training set and a test set:

#install.packages("caret") 
library(caret) 
set.seed(5557) 
indices <- createDataPartition(housing$MDEV, p=0.75, list=FALSE) 
training <- housing[indices,] 
testing <- housing[-indices,] 
nrow(training) 
nrow(testing) 
381 
125 

We build our nearest neighbor model from the training set and predict on the test set, passing the training prices as the class labels:

library(class) 
knnModel <- knn(train=training, test=testing, cl=training$MDEV) 
knnModel 
10.5 9.7 7 6.3 13.1 16.3 16.1 13.3 13.3... 
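
To make the idea concrete: knn() from the class package uses k = 1 by default, so each test row receives the label of its single closest training row. The following is a minimal base R sketch of that idea, not the package's implementation. The toy values and the helper name nearestNeighbor are made up for illustration, and unlike the call above, the sketch keeps the target out of the feature matrix.

```r
# Toy data (hypothetical values, not taken from the housing set)
trainX <- matrix(c(1, 1,
                   2, 2,
                   8, 9,
                   9, 8), ncol = 2, byrow = TRUE)
trainY <- c(10, 12, 30, 32)
testX <- matrix(c(1.5, 1.2,
                  8.4, 8.9), ncol = 2, byrow = TRUE)

# nearestNeighbor() is a hypothetical helper: for each test row it returns
# the target of the closest training row by Euclidean distance (k = 1).
nearestNeighbor <- function(trainX, trainY, testX) {
  apply(testX, 1, function(row) {
    # Subtract the test row from every training row, then take row-wise norms
    dists <- sqrt(rowSums(sweep(trainX, 2, row)^2))
    trainY[which.min(dists)]
  })
}

nnPredictions <- nearestNeighbor(trainX, trainY, testX)
nnPredictions
```

Because the result is copied from one neighbor rather than averaged, every prediction is a value that already occurs in the training targets.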

Let us look at the results:

plot(knnModel) 

The plot is slightly right-skewed, resembling a Poisson distribution, with the higher counts near the left side. I think this makes sense for natural data. The tails at both ends run off the page.

What about the accuracy of this model? I did not find a clean way to translate the predicted factors in the knnModel to numeric values, so I extracted them to a flat file, and then loaded them in separately:

predicted <- read.table("housing-knn-predicted.csv") 
colnames(predicted) <- c("predicted") 
predicted
predicted
10.5
9.7
7.0
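
One option that avoids the flat file is to convert the factor through its labels, using as.character() and then as.numeric(). The following is a minimal sketch that uses a small hand-built factor as a stand-in for knnModel (the values are the first four predictions shown earlier); the names predictedFactor, predictedNumeric, and levelCodes are introduced here for illustration.

```r
# Stand-in for the factor returned by knn(); hypothetical, built by hand
predictedFactor <- factor(c("10.5", "9.7", "7", "6.3"))

# Convert the labels to text first, then to numbers
predictedNumeric <- as.numeric(as.character(predictedFactor))
predictedNumeric

# Pitfall: as.numeric() applied directly to a factor returns the internal
# level codes, not the labels
levelCodes <- as.numeric(predictedFactor)
levelCodes
```

Applied to the real model, the same pattern would read as.numeric(as.character(knnModel)).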

 

Then we can build up a results data frame:

results <- data.frame(testing$MDEV, predicted) 

And compute our accuracy:

results["accuracy"] <- results['testing.MDEV'] / results['predicted'] 
head(results) 
mean(results$accuracy) 
testing.MDEV predicted  accuracy
         5.6      10.5 0.5333333
         7.2       9.7 0.7422680
         8.1       7.0 1.1571429
         8.5       6.3 1.3492063
        10.5      13.1 0.8015267
        10.8      16.3 0.6625767
1.01794816307793

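So, on average, the ratio of actual to predicted price is about 1.02, which puts our estimates within 2% of the testing data in aggregate. Individual predictions vary much more: in the six rows shown, the ratio ranges from 0.53 to 1.35.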
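
Because ratios above and below 1 cancel each other out in a mean, a per-row error measure can be a useful complement. The following is a minimal sketch, not part of the original workflow, that uses only the six rows printed above. The names actual, predictedValues, ratio, mape, and rmse are introduced here for illustration.

```r
# The six rows shown by head(results)
actual <- c(5.6, 7.2, 8.1, 8.5, 10.5, 10.8)
predictedValues <- c(10.5, 9.7, 7.0, 6.3, 13.1, 16.3)

# Same ratio as the "accuracy" column
ratio <- actual / predictedValues

# Mean absolute percentage error: average size of the miss relative to the
# actual price, ignoring direction
mape <- mean(abs(actual - predictedValues) / actual)

# Root mean squared error, in the same units as MDEV
rmse <- sqrt(mean((actual - predictedValues)^2))

mape
rmse
```

On the full test set, the same expressions could be applied to results$testing.MDEV and results$predicted.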
