Implementing a simple grid search

Returning to our kNN classifier, we have only one hyperparameter to tune: k. Typically, you would have many more open parameters to deal with, but the kNN algorithm is simple enough for us to implement a grid search manually.

Before we get started, we need to split the dataset into training and test sets, as we have done before:

  1. Here we choose a 75-25 split, which is what train_test_split produces by default when no test size is given:
In [1]: from sklearn.datasets import load_iris
... import numpy as np
... iris = load_iris()
... X = iris.data.astype(np.float32)
... y = iris.target
In [2]: from sklearn.model_selection import train_test_split
... X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=37
... )
  2. Then, the goal is to loop over a range of candidate values for k (here, 1 through 19). As we do this, we want to keep track of the best accuracy we observed, as well as the value of k that gave rise to this result:
In [3]: best_acc = 0.0
... best_k = 0
  3. The grid search then looks like an outer loop around the entire train and test procedure:
In [4]: import cv2
... from sklearn.metrics import accuracy_score
... for k in range(1, 20):
...     knn = cv2.ml.KNearest_create()
...     knn.setDefaultK(k)
...     knn.train(X_train, cv2.ml.ROW_SAMPLE, y_train)
...     _, y_test_hat = knn.predict(X_test)

After calculating the accuracy on the test set (acc), we compare it to the best accuracy found so far (best_acc).

  4. If the new value is better, we update our bookkeeping variables and move on to the next iteration:
...     acc = accuracy_score(y_test, y_test_hat)
...     if acc > best_acc:
...         best_acc = acc
...         best_k = k
  5. When we are done, we can have a look at the best accuracy and the value of k that produced it:
In [5]: best_acc, best_k
Out[5]: (0.97368421052631582, 1)

It turns out that we can get 97.4% accuracy using k = 1.
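One detail of the bookkeeping is worth spelling out. Because the comparison is strict (acc > best_acc), a later value of k that merely ties the best accuracy does not replace the earlier one, so if several values tie, the loop reports the smallest of them. The following minimal sketch illustrates this with made-up accuracy values; the numbers in scores are assumptions for illustration only and are not results from the iris experiment:

```python
# Hypothetical accuracies per k, invented purely for illustration;
# these are NOT the scores from the iris experiment above.
scores = {1: 0.90, 2: 0.95, 3: 0.95, 4: 0.93}

best_acc = 0.0
best_k = 0
for k, acc in scores.items():
    # Strict ">" means a tie never overwrites the earlier winner.
    if acc > best_acc:
        best_acc = acc
        best_k = k

print(best_acc, best_k)
```

Here k = 2 and k = 3 share the top score, and the loop keeps k = 2. Using >= instead would keep the largest tied k, which may or may not be what you want.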

With more variables, we would naturally extend this procedure by wrapping our code in a nested for loop. However, as you can imagine, this can quickly become computationally expensive.
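As a rough sketch of what this extension could look like, the nested loops can be flattened with itertools.product from the Python standard library, which yields every combination of the candidate values. The evaluate function and the second hyperparameter weight below are hypothetical placeholders; in practice, evaluate would train a model with the given settings and return its accuracy on held-out data:

```python
from itertools import product


def evaluate(k, weight):
    """Hypothetical stand-in for 'train a model and return its accuracy'."""
    # Toy score that peaks at k=3, weight=0.5; a real version would fit
    # a classifier with these settings and score it on held-out data.
    return 1.0 - 0.01 * abs(k - 3) - 0.1 * abs(weight - 0.5)


# Candidate values for each hyperparameter (assumed for illustration).
param_grid = {
    "k": range(1, 6),
    "weight": [0.25, 0.5, 1.0],
}

best_score = float("-inf")
best_params = None
n_evaluated = 0
names = list(param_grid)
# product(...) visits every combination, like one nested loop per parameter.
for combo in product(*param_grid.values()):
    params = dict(zip(names, combo))
    score = evaluate(**params)
    n_evaluated += 1
    if score > best_score:
        best_score = score
        best_params = params

print(best_params, best_score, n_evaluated)
```

The number of evaluations is the product of the grid sizes (here 5 × 3 = 15), which is why the cost grows so quickly as more hyperparameters or candidate values are added.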