Using the k-nearest neighbors algorithm

A simple technique that can be used to classify a set of observed values is the k-nearest neighbors (abbreviated as k-NN) algorithm. This algorithm is a form of lazy learning, in which all computation is deferred until classification. During the classification phase, the k-NN algorithm approximates the class of the observed values using only a small number of values from the training data, and defers reading the remaining values until they are actually needed.

Although we explore the k-NN algorithm here in the context of classification, it can be applied to regression as well by simply taking the predicted value to be the average of the dependent variable over the training samples nearest to a given set of observed feature values. Interestingly, this technique of modeling regression is, in fact, a generalization of linear interpolation (for more information, refer to An introduction to kernel and nearest-neighbor nonparametric regression).
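To make the regression variant concrete, the following is a minimal sketch of k-NN regression in plain Clojure, written independently of the clj-ml library; the euclidean-distance and knn-regress names, and the representation of a training sample as a [features value] pair, are illustrative assumptions of our own:

(defn euclidean-distance
  "Computes the Euclidean distance between two feature vectors."
  [xs ys]
  (Math/sqrt (reduce + (map (fn [x y] (let [d (- x y)] (* d d)))
                            xs ys))))

(defn knn-regress
  "Predicts a value for the observed feature vector as the average of
  the dependent variable over the k nearest training samples."
  [k training observed]
  (let [nearest (take k (sort-by #(euclidean-distance (first %) observed)
                                 training))]
    (/ (reduce + (map second nearest))
       (count nearest))))

For example, (knn-regress 2 [[[1.0] 10.0] [[2.0] 20.0] [[5.0] 50.0]] [1.5]) averages the two nearest values, 10.0 and 20.0, and returns 15.0.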

The k-NN algorithm reads some training data and analyzes this data lazily, that is, only when needed. Apart from the training data, the algorithm requires a set of observed values and a constant k as parameters to classify the set of observed values. To classify these observed values, the algorithm predicts the class that is most frequent among the k training samples nearest to the set of observed values. By nearest, we mean the point with the least Euclidean distance from the point represented by the set of observed values in the Euclidean space of the training data.
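The classification step described in the preceding paragraph can be sketched in a few lines of plain Clojure, reusing the euclidean-distance helper from the previous sketch; again, the knn-classify name and the [features class] representation of a training sample are illustrative assumptions of our own, not part of the clj-ml API:

(defn knn-classify
  "Predicts the class that is most frequent among the k training
  samples nearest to the observed feature vector."
  [k training observed]
  (->> training
       (sort-by #(euclidean-distance (first %) observed))
       (take k)
       (map second)
       frequencies
       (apply max-key val)
       key))

Note that calling (knn-classify 1 training observed) yields the special case described next.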

An obvious corollary is that when k = 1, the predicted class is the class of the single neighbor nearest to the set of observed values. This special case of the k-NN algorithm is called the nearest neighbor algorithm.

We can create a classifier that uses the k-NN algorithm with the clj-ml library's make-classifier function. Such a classifier is specified by passing the keywords :lazy and :ibk as arguments to the make-classifier function. We will now use such a classifier to model our previous example of a fish packaging plant, as follows:

(def K1-classifier (make-classifier :lazy :ibk))

(defn train-K1-classifier []
  (dataset-set-class fish-dataset 0)
  (classifier-train K1-classifier fish-dataset))

The preceding code defines a k-NN classifier as K1-classifier and a train-K1-classifier function that trains this classifier with the fish-dataset training data, which we defined earlier.

Note that the make-classifier function defaults the constant k, that is, the number of neighbors, to 1, which implies a single nearest neighbor. We can optionally specify the constant k as a key-value pair with the :num-neighbors key to the make-classifier function, as shown in the following code:

(def K10-classifier (make-classifier
                     :lazy :ibk {:num-neighbors 10}))

We can now call the train-K1-classifier function to train the classifier as follows:

user> (train-K1-classifier)
#<IBk IB1 instance-based classifier
using 1 nearest neighbour(s) for classification
>

We can now use the classifier-classify function to classify the fish represented by sample-fish, which we had defined earlier, using the classifier represented by the K1-classifier variable:

user> (classifier-classify K1-classifier sample-fish)
:salmon

As shown in the preceding code, the k-NN classifier predicts the class of the fish as salmon, which agrees with our earlier prediction made using a Bayes classifier. In conclusion, the clj-ml library provides a concise implementation of a classifier that uses the k-NN algorithm to predict the class of a set of observed values.

By default, the k-NN classifier provided by the clj-ml library normalizes the features of the classification model using the mean and standard deviation of the values of these features. We can skip this normalization phase by passing a map entry with the :no-normalization key in the map of options passed to the make-classifier function.
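For instance, a classifier that uses 10 neighbors and skips normalization could be defined as follows; note that passing this option as a boolean map entry, analogous to the :num-neighbors entry shown earlier, is an assumption on our part:

(def K10-unnormalized-classifier
  (make-classifier :lazy :ibk {:num-neighbors 10
                               :no-normalization true}))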
