Attribute information

For each entry in the dataset, the following is provided:

  • Triaxial acceleration from the accelerometer (total acceleration) and the estimated body acceleration
  • Triaxial angular velocity from the gyroscope
  • A 561-feature vector with time and frequency domain variables
  • Its activity label
  • An identifier of the subject who was observed

By following these steps, you will learn how to build a multi-class classifier using SVMs:

  1. Let's quickly import all the necessary libraries that you will need to implement an SVM with multi-class classification:
In [1]: import numpy as np
... import pandas as pd
... import matplotlib.pyplot as plt
... %matplotlib inline
... from sklearn.utils import shuffle
... from sklearn.svm import SVC
... from sklearn.model_selection import cross_val_score, GridSearchCV
  2. Next, you will load the dataset. Since this code is meant to be run from a Jupyter Notebook in the notebooks/ directory, the relative path to the data directory is simply data/:
In [2]: datadir = "data"
... dataset = "multiclass"
... train = shuffle(pd.read_csv(f"{datadir}/{dataset}/train.csv"))
... test = shuffle(pd.read_csv(f"{datadir}/{dataset}/test.csv"))
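A quick sanity check that both files loaded as expected; the two splits should share the same columns, namely the 561 features plus the Activity and subject columns described earlier:

print(train.shape, test.shape)  # each split should have 563 columns: 561 features + Activity + subject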
  3. Let's check whether there are any missing values in the training and testing datasets; if there are any, we will simply drop them:
In [3]: train.isnull().values.any()
Out[3]: False
In [4]: test.isnull().values.any()
Out[4]: False
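No missing values were found here, but if there had been any, dropping the affected rows is a one-liner with pandas; a minimal sketch:

train = train.dropna()  # drop any rows containing missing values
test = test.dropna()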
  4. Next, we will find the frequency distribution of the classes in the data, that is, how many samples belong to each of the six classes:
In [5]: train_outcome = pd.crosstab(index=train["Activity"], # Make a crosstab
... columns="count") # Name the count column
... train_outcome

From the resulting table, you can observe that the LAYING class has the most samples, but overall the data is approximately equally distributed and there are no major signs of class imbalance.
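If you prefer a picture to a table, the same distribution can be plotted using the matplotlib import from step 1; a minimal sketch:

train["Activity"].value_counts().plot(kind="bar")  # one bar per class; the tallest should be LAYING
plt.ylabel("count")
plt.show()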

  5. Next, we will separate the predictors (input values) and the outcome values (class labels) in the train and test datasets:
In [6]: X_train = pd.DataFrame(train.drop(['Activity','subject'],axis=1))
... Y_train_label = train.Activity.values.astype(object)
... X_test = pd.DataFrame(test.drop(['Activity','subject'],axis=1))
... Y_test_label = test.Activity.values.astype(object)
  6. Since the SVM expects numerical inputs and labels, you will now transform the non-numerical labels into numerical ones. First, import the preprocessing module from the sklearn library:
In [7]: from sklearn import preprocessing
... encoder = preprocessing.LabelEncoder()
  7. Now, we will encode the train and test labels into numerical values:
In [8]: encoder.fit(Y_train_label)
... Y_train = encoder.transform(Y_train_label)
... Y_test = encoder.transform(Y_test_label) # Reuse the encoder fitted on the training labels
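Should you later need the original activity names, for example for plots or reports, the encoder fitted on the training labels can map the integers back; a small sketch:

print(encoder.classes_)  # the six activity names, in the order of their numeric codes
print(encoder.inverse_transform(Y_train[:5]))  # the first five encoded labels, back as strings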
  8. Next, we will scale (normalise) the train and test feature sets; for this, you will import StandardScaler from sklearn:
In [9]: from sklearn.preprocessing import StandardScaler
... scaler = StandardScaler()
... X_train_scaled = scaler.fit_transform(X_train)
... X_test_scaled = scaler.transform(X_test)
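Note that the scaler is fitted on the training set only and merely applied to the test set, which keeps test statistics from leaking into training. One way to make this discipline automatic, shown here as a sketch rather than as part of the recipe, is to bundle the scaler and the SVM into a scikit-learn Pipeline:

from sklearn.pipeline import Pipeline

pipe = Pipeline([("scaler", StandardScaler()),  # fitted only on the data passed to fit()
                 ("svc", SVC())])
pipe.fit(X_train, Y_train)
print(pipe.score(X_test, Y_test))

When such a pipeline is passed to GridSearchCV, the parameter names gain a step prefix, for example svc__C instead of C.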
  9. Once the data is scaled and the labels are in the correct format, it is time to fit the data. Before that, we will define a dictionary with the different parameter settings for the SVM to try during training; searching over such a grid is called grid search, and scikit-learn implements it as GridSearchCV. The parameter grid here is based on the results of a prior random search:
In [10]: params_grid = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
... 'C': [1, 10, 100, 1000]},
... {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]
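The random search that produced this grid is not shown in the recipe; a minimal sketch of how one could be run with scikit-learn's RandomizedSearchCV (the search space below is an illustrative assumption, and loguniform needs SciPy 1.4 or newer):

from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import loguniform

# Illustrative search space, not the authors' actual one
random_params = {"C": loguniform(1e-1, 1e4),
                 "gamma": loguniform(1e-5, 1e-1),
                 "kernel": ["rbf", "linear"]}
rnd_search = RandomizedSearchCV(SVC(), random_params, n_iter=20,
                                cv=5, random_state=0)
rnd_search.fit(X_train_scaled, Y_train)
print(rnd_search.best_params_)  # use these to narrow down a grid like the one above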
  10. Finally, we will run GridSearchCV on the data with the preceding parameters to find the best SVM fit:
In [11]: svm_model = GridSearchCV(SVC(), params_grid, cv=5)
... svm_model.fit(X_train_scaled, Y_train)
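GridSearchCV records the cross-validated score of every parameter combination in its cv_results_ attribute, which is handy for seeing how close the runners-up were; a quick way to inspect it:

results = pd.DataFrame(svm_model.cv_results_)  # one row per parameter combination
print(results[["params", "mean_test_score", "std_test_score"]]
      .sort_values("mean_test_score", ascending=False)
      .head())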
  11. It's time to check how well the SVM model was trained on the data; in short, we will find the accuracy. We will also check the parameter settings for which the SVM performed best:
In [12]: print('Best score for training data:', svm_model.best_score_)
... print('Best C:', svm_model.best_estimator_.C)
... print('Best Kernel:', svm_model.best_estimator_.kernel)
... print('Best Gamma:', svm_model.best_estimator_.gamma)
Out[12]: Best score for training data: 0.986
... Best C: 100
... Best Kernel: rbf
... Best Gamma: 0.001

Voila! As we can see, the SVM achieved 98.6% mean cross-validated accuracy on the training data for this multi-class classification problem. But hold your horses until we find the accuracy on the test data. So, let's quickly check that:

In [13]: final_model = svm_model.best_estimator_
... print("Training set score for SVM: %f" % final_model.score(X_train_scaled , Y_train))
... print("Testing set score for SVM: %f" % final_model.score(X_test_scaled , Y_test ))
Out[13]: Training set score for SVM: 1.00
... Testing set score for SVM: 0.9586

Wow! Isn't that amazing? We were able to achieve 95.86% accuracy on the testing set; that's the power of SVMs.
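Accuracy is a reasonable headline number here because the classes are roughly balanced, but it can still hide weak classes. A per-class breakdown is cheap to produce with scikit-learn's built-in metrics; a minimal sketch:

from sklearn.metrics import classification_report, confusion_matrix

Y_pred = final_model.predict(X_test_scaled)
# Per-class precision, recall, and F1, labelled with the original activity names
print(classification_report(Y_test, Y_pred, target_names=encoder.classes_))
print(confusion_matrix(Y_test, Y_pred))  # rows: true classes, columns: predicted classes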
