For this example, we are using the Boston housing data from the UCI Machine Learning Repository. First, we load the data and assign column names:
housing <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data")
colnames(housing) <- c("CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV")
summary(housing)
We reorder the data so that the key (the median housing price, MEDV) is in ascending order:
housing <- housing[order(housing$MEDV),]
Now, we can split the data into a training set and a test set:
#install.packages("caret")
library(caret)
set.seed(5557)
indices <- createDataPartition(housing$MEDV, p=0.75, list=FALSE)
training <- housing[indices,]
testing <- housing[-indices,]
nrow(training)
[1] 381
nrow(testing)
[1] 125
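Note that createDataPartition draws a stratified random sample based on the outcome, so the two sets should have similar price distributions. A quick sanity check (not part of the original run, and the exact numbers will depend on the seed):

# the outcome distributions of the two sets should look alike
summary(training$MEDV)
summary(testing$MEDV)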
We build our nearest neighbor model using both sets, excluding the target column MEDV from the predictors and passing it separately as the class labels:
library(class)
# exclude the target (MEDV, column 14) from the predictors
knnModel <- knn(train=training[,-14], test=testing[,-14], cl=training$MEDV)
knnModel
10.5 9.7 7 6.3 13.1 16.3 16.1 13.3 13.3...
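Note that knn() defaults to k = 1, so each test row simply takes the price of its single nearest training row. A minimal sketch of trying a larger neighborhood (k = 5 here is an arbitrary choice, not from the original run):

# with k = 5, knn() assigns the most frequent label among the
# five nearest training rows (ties are broken at random)
knnModel5 <- knn(train=training[,-14], test=testing[,-14], cl=training$MEDV, k=5)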
Let us look at the results:
plot(knnModel)
There is a slight Poisson-like skew to the distribution, with the higher counts near the left side. I think this makes sense for natural data such as housing prices. The tails at both ends drop off sharply, running off the edges of the plot.
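Because knnModel is a factor, plot() is showing the count of predictions at each price level. A minimal sketch of inspecting the same distribution numerically (standard R calls, not part of the original walkthrough):

# tabulate how many test rows were assigned each predicted price level
head(sort(table(knnModel), decreasing=TRUE))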
What about the accuracy of this model? The predictions in knnModel are factors rather than numeric values, so here I extracted them to a flat file and then loaded them back in separately (a direct in-R conversion is sketched after the output below):
predicted <- read.table("housing-knn-predicted.csv")
colnames(predicted) <- c("predicted")
predicted
  predicted
1      10.5
2       9.7
3       7.0
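As an aside, the factor levels can be converted to numeric values directly in R, avoiding the flat-file round trip. This is a standard R idiom rather than part of the original walkthrough:

# convert the factor levels to their numeric values;
# as.numeric(knnModel) alone would return level indices, not prices
predicted <- data.frame(predicted = as.numeric(as.character(knnModel)))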
Then we can build up a results data frame:
results <- data.frame(testing$MEDV, predicted)
And compute our accuracy:
results["accuracy"] <- results['testing.MDEV'] / results['predicted'] head(results) mean(results$accuracy) 1.01794816307793
  testing.MEDV predicted  accuracy
1          5.6      10.5 0.5333333
2          7.2       9.7 0.7422680
3          8.1       7.0 1.1571429
4          8.5       6.3 1.3492063
5         10.5      13.1 0.8015267
6         10.8      16.3 0.6625767
So, the mean ratio of actual to predicted prices is about 1.02; on average, we are estimating within roughly 2% of our testing data, although the rows above show that individual predictions can be off by much more.
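A ratio of means can hide large individual errors, so it may be worth computing a standard error metric as well. A minimal sketch using root mean squared error and mean absolute error (standard definitions, not part of the original example):

# RMSE and MAE between actual and predicted prices, in $1000s
rmse <- sqrt(mean((results$testing.MEDV - results$predicted)^2))
mae  <- mean(abs(results$testing.MEDV - results$predicted))
rmse
mae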