A further example – breast cancer classification using SVM with TensorFlow

So far, we have been using scikit-learn to implement SVMs. Let's now look at how to do so with TensorFlow. Note that, at the time of writing (the end of 2018), the only SVM API that TensorFlow provides supports a linear kernel for binary classification. The code in this section relies on the tf.contrib module, which is available only in TensorFlow 1.x.

We are using the breast cancer dataset (https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)) as an example. Its feature space is 30-dimensional, and its target variable is binary. Let's see how it's done by performing the following steps:

  1. First, import the requisite modules, load the dataset, and check its class distribution:
>>> from collections import Counter
>>> import numpy as np
>>> import tensorflow as tf
>>> from sklearn import datasets
>>> cancer_data = datasets.load_breast_cancer()
>>> X = cancer_data.data
>>> Y = cancer_data.target
>>> print(Counter(Y))
Counter({1: 357, 0: 212})
  2. Split the data into training and testing sets as follows:
>>> np.random.seed(42)
>>> train_indices = np.random.choice(len(Y), round(len(Y) * 0.8), replace=False)
>>> test_indices = np.array(list(set(range(len(Y))) - set(train_indices)))
>>> X_train = X[train_indices]
>>> X_test = X[test_indices]
>>> Y_train = Y[train_indices]
>>> Y_test = Y[test_indices]
  3. Now, initialize the SVM classifier as follows:
>>> svm_tf = tf.contrib.learn.SVM(
...     feature_columns=(tf.contrib.layers.real_valued_column(column_name='x'),),
...     example_id_column='example_id')
  4. Then, we construct the input function for training data, before calling the fit method:
>>> input_fn_train = tf.estimator.inputs.numpy_input_fn(
...     x={'x': X_train,
...        'example_id': np.array(['%d' % i for i in range(len(Y_train))])},
...     y=Y_train,
...     num_epochs=None,
...     batch_size=100,
...     shuffle=True)

The example_id column has no counterpart in scikit-learn. It is basically a placeholder for the ID of each sample, which here is the sample's index formatted as a string.
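To make the role of example_id concrete, the following minimal sketch builds the ID arrays for both sets in the same way as the input functions in this section and checks that no ID is shared between them. The sizes 455 and 114 are assumptions derived from an 80/20 split of the 569 samples, and the variable names are illustrative only:

```python
import numpy as np

# Assumed sizes: round(569 * 0.8) = 455 training samples, 114 testing samples
n_train, n_test = 455, 114

# Training IDs are '0' ... '454'
train_ids = np.array(['%d' % i for i in range(n_train)])
# Testing IDs are offset by n_train so that they continue at '455'
test_ids = np.array(['%d' % (i + n_train) for i in range(n_test)])

# Every sample across both sets should carry a distinct ID
all_ids = np.concatenate([train_ids, test_ids])
ids_are_unique = len(set(all_ids)) == len(all_ids)
print(ids_are_unique)
```

Offsetting the testing IDs by the number of training samples is what keeps the two sets of IDs from colliding.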

  5. Fit the model on the training set as follows:
>>> svm_tf.fit(input_fn=input_fn_train, max_steps=100)
  6. Evaluate the classification accuracy on the training set as follows:
>>> metrics = svm_tf.evaluate(input_fn=input_fn_train, steps=1)
>>> print('The training accuracy is: {0:.1f}%'.format(metrics['accuracy']*100))
The training accuracy is: 94.0%
  7. To predict on the testing set, we construct the input function for testing data in a similar way:
>>> input_fn_test = tf.estimator.inputs.numpy_input_fn(
...     x={'x': X_test,
...        'example_id': np.array(
...            ['%d' % (i + len(Y_train)) for i in range(len(X_test))])},
...     y=Y_test,
...     num_epochs=None,
...     shuffle=False)
  8. Finally, evaluate its classification accuracy as follows:
>>> metrics = svm_tf.evaluate(input_fn=input_fn_test, steps=1)
>>> print('The testing accuracy is: {0:.1f}%'.format(metrics['accuracy']*100))
The testing accuracy is: 90.6%
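As a quick sanity check on the random split used above, the following sketch reproduces the index arithmetic without loading the dataset. It assumes only the dataset size of 569 (357 + 212 from the class distribution). It uses a local RandomState and np.setdiff1d instead of the global seed and set difference, so the selected indices are not claimed to match those of the original code:

```python
import numpy as np

n_samples = 569  # 357 + 212, from the class distribution shown earlier

# Draw 80% of the indices without replacement for training
rng = np.random.RandomState(42)
train_indices = rng.choice(n_samples, round(n_samples * 0.8), replace=False)

# The remaining indices form the testing set
test_indices = np.setdiff1d(np.arange(n_samples), train_indices)

n_train, n_test = len(train_indices), len(test_indices)
overlap = np.intersect1d(train_indices, test_indices)
print(n_train, n_test, len(overlap))
```

The two index sets should be disjoint and together cover every sample, so that no testing sample is seen during training.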

Note that you will get different results every time you run the code. This is because the tf.contrib.learn.SVM module uses the Stochastic Dual Coordinate Ascent (SDCA) method for its underlying optimization, and this method is inherently random.
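To illustrate where this randomness comes from, here is a minimal NumPy sketch of SDCA for a linear SVM with hinge loss. It is not TensorFlow's implementation. The function name sdca_linear_svm, the regularization value lam, the absence of a bias term, the per-epoch random permutation, and the toy data are all assumptions made for illustration:

```python
import numpy as np

def sdca_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Bias-free linear SVM (hinge loss) trained with SDCA; y must be in {-1, +1}."""
    rng = np.random.RandomState(seed)
    n, d = X.shape
    a = np.zeros(n)  # dual variables, each kept in [0, 1]
    w = np.zeros(d)  # primal weights: w = sum_i a_i * y_i * x_i / (lam * n)
    sq_norms = np.sum(X ** 2, axis=1)
    for _ in range(epochs):
        # The random visiting order is the source of run-to-run variation
        for i in rng.permutation(n):
            if sq_norms[i] == 0.0:
                continue
            margin = y[i] * X[i].dot(w)
            # Closed-form maximization of the dual over coordinate i, clipped to [0, 1]
            a_new = np.clip(a[i] + (1.0 - margin) * lam * n / sq_norms[i], 0.0, 1.0)
            w += (a_new - a[i]) * y[i] * X[i] / (lam * n)
            a[i] = a_new
    return w, a

# Toy data: two well-separated Gaussian blobs (not the breast cancer dataset)
data_rng = np.random.RandomState(0)
X_toy = np.vstack([data_rng.normal(2.0, 0.5, size=(50, 2)),
                   data_rng.normal(-2.0, 0.5, size=(50, 2))])
y_toy = np.array([1] * 50 + [-1] * 50)

w_a, a_a = sdca_linear_svm(X_toy, y_toy, seed=1)
w_b, _ = sdca_linear_svm(X_toy, y_toy, seed=2)
acc = np.mean(np.sign(X_toy.dot(w_a)) == y_toy)
print(w_a, w_b, acc)
```

With a different seed, the samples are visited in a different order, so the learned weights will generally differ a little after a fixed number of passes. Fixing the seed makes a run reproducible.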
