Building a classifier

Once the sensor samples are represented as feature vectors with an assigned class, it is possible to apply standard techniques for supervised classification, including feature selection, feature discretization, model learning, k-fold cross validation, and so on. This chapter will not delve into the details of the machine learning algorithms. Any algorithm that supports numerical features can be applied, including SVMs, random forest, AdaBoost, decision trees, neural networks, multi-layer perceptrons, and others.
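For instance, once the data is loaded into a Weka Instances object (as we do in the next snippet), another classifier can be swapped in through the same interface; a minimal sketch with RandomForest, assuming data holds the loaded dataset with its class index already set:

import weka.classifiers.trees.RandomForest;

// Drop-in alternative to the decision tree built below
RandomForest forest = new RandomForest();
forest.buildClassifier(data);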

Let's start with a basic one, decision trees: load the dataset, set the class attribute, build a decision tree model, and output the model:

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.trees.J48;
import weka.core.Instances;

String databasePath = "/Users/bostjan/Dropbox/ML Java Book/book/datasets/chap9/features.arff";

// Load the data in ARFF format
Instances data = new Instances(new BufferedReader(new FileReader(databasePath)));

// Set the last attribute as the class attribute
data.setClassIndex(data.numAttributes() - 1);

// Build a basic decision tree model with default options
String[] options = new String[]{};
J48 model = new J48();
model.setOptions(options);
model.buildClassifier(data);

// Output the decision tree
System.out.println("Decision tree model:\n" + model);

The algorithm first outputs the model, as follows:

Decision tree model:
J48 pruned tree
------------------

max <= 10.353474
|   fft_coef_0000 <= 38.193106: standing (46.0)
|   fft_coef_0000 > 38.193106
|   |   fft_coef_0012 <= 1.817792: walking (77.0/1.0)
|   |   fft_coef_0012 > 1.817792
|   |   |   max <= 4.573082: running (4.0/1.0)
|   |   |   max > 4.573082: walking (24.0/2.0)
max > 10.353474: running (93.0)

Number of Leaves  : 5

Size of the tree : 9

The tree is quite simplistic and seemingly accurate, as the majority class distributions in the terminal nodes are quite high: the first number in parentheses at each leaf counts the training instances that reached it, and the second, if present, counts those that were misclassified. Let's run a basic classifier evaluation to validate the results:

import java.util.Random;
import weka.classifiers.Evaluation;

// Check the accuracy of the model using 10-fold cross validation
Evaluation eval = new Evaluation(data);
eval.crossValidateModel(model, data, 10, new Random(1), new String[]{});
System.out.println("Model performance:\n" + eval.toSummaryString());

This outputs the following model performance:

Correctly Classified Instances         226               92.623  %
Incorrectly Classified Instances        18                7.377  %
Kappa statistic                          0.8839
Mean absolute error                      0.0421
Root mean squared error                  0.1897
Relative absolute error                 13.1828 %
Root relative squared error             47.519  %
Coverage of cases (0.95 level)          93.0328 %
Mean rel. region size (0.95 level)      27.8689 %
Total Number of Instances              244     

The classification accuracy is very high, 92.62%, which seems like an amazing result. However, one important reason why the result is so good lies in our evaluation design: sequential instances are very similar to each other, so if we split them randomly during 10-fold cross validation, there is a high chance that almost identical instances end up in both the training and the test sets; hence, straightforward k-fold cross validation produces an optimistic estimate of model performance.

A better approach is to use folds that correspond to different sets of measurements, or even different people. For example, we can use the application to collect learning data from five people. Then, it makes sense to run k-person cross validation, where the model is trained on four people and tested on the fifth. The procedure is repeated for each person and the results are averaged. This gives us a much more realistic estimate of the model performance.
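To make this concrete, here is a minimal sketch of k-person cross validation, assuming each person's measurements have been extracted into a separate ARFF file (the file names are hypothetical):

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

// Hypothetical per-person feature files produced by the same extraction step
String[] personFiles = new String[]{"person1.arff", "person2.arff",
  "person3.arff", "person4.arff", "person5.arff"};
Instances[] persons = new Instances[personFiles.length];
for (int i = 0; i < personFiles.length; i++) {
  persons[i] = new Instances(new BufferedReader(new FileReader(personFiles[i])));
  persons[i].setClassIndex(persons[i].numAttributes() - 1);
}

double accuracySum = 0;
for (int held = 0; held < persons.length; held++) {
  // Train on all persons except the held-out one
  Instances train = new Instances(persons[held], 0); // empty copy of the header
  for (int i = 0; i < persons.length; i++) {
    if (i == held) continue;
    for (int j = 0; j < persons[i].numInstances(); j++) {
      train.add(persons[i].instance(j));
    }
  }
  J48 model = new J48();
  model.buildClassifier(train);

  // Test on the held-out person only
  Evaluation eval = new Evaluation(train);
  eval.evaluateModel(model, persons[held]);
  accuracySum += eval.pctCorrect();
}
System.out.println("Average person-wise accuracy: "
  + accuracySum / persons.length + " %");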

Leaving the evaluation comments aside, let's look at how to deal with classifier errors.

Reducing spurious transitions

At the end of the activity recognition pipeline, we want to make sure that the classifications are not too volatile, that is, we don't want activities to change every millisecond. A basic approach is to design a filter that ignores quick changes in the activity sequence.

We build a filter that remembers the last window activities, where window is a parameter, and returns the most frequent one. If there are multiple activities with the same score, it returns the most recent one.

First, we create a new SpuriousActivityRemoval class that will hold a list of activities and the window parameter:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

class SpuriousActivityRemoval {

  List<Object> last; // the most recent observations
  int window;        // how many observations to remember

  public SpuriousActivityRemoval(int window) {
    this.last = new ArrayList<Object>();
    this.window = window;
  }

Next, we create the Object filter(Object) method that will take an activity and return a filtered activity. The method first checks whether we have enough observations. If not, it simply stores the observation and returns the same value, as shown in the following code:

  public Object filter(Object obj) {
    if (last.size() < window) {
      last.add(obj);
      return obj;
    }

If we have already collected window observations, we return the most frequent observation in the window, then remove the oldest observation and insert the new one:

    Object o = getMostFrequentElement(last);
    last.add(obj);
    last.remove(0);
    return o;
  }

What is missing here is a function that returns the most frequent element from a list of objects. We implement this with a hash map, as follows:

  private Object getMostFrequentElement(List<Object> list) {

    HashMap<String, Integer> objectCounts = new HashMap<String, Integer>();
    Integer frequentCount = 0;
    Object frequentObject = null;

Now, we iterate over all the elements in the list and either insert each unique element into the hash map or update its counter if it is already there. Within the loop, we also keep track of the most frequent element found so far, as follows:

    for (Object obj : list) {
      String key = obj.toString();
      Integer count = objectCounts.get(key);
      if (count == null) {
        count = 0;
      }
      objectCounts.put(key, ++count);

      // >= ensures that, on a tie, the most recently seen element wins
      if (count >= frequentCount) {
        frequentCount = count;
        frequentObject = obj;
      }
    }

    return frequentObject;
  }

}

Let's run a simple example:

String[] activities = new String[]{"Walk", "Walk", "Walk", "Run", "Walk", "Run", "Run", "Sit", "Sit", "Sit"};
SpuriousActivityRemoval dlpFilter = new SpuriousActivityRemoval(3);
for(String str : activities){
  System.out.println(str +" -> "+ dlpFilter.filter(str));
}

The example outputs the following activities:

Walk -> Walk
Walk -> Walk
Walk -> Walk
Run -> Walk
Walk -> Walk
Run -> Walk
Run -> Run
Sit -> Run
Sit -> Run
Sit -> Sit

The result is a continuous sequence of activities, that is, there are no quick changes. The filter adds some delay, since a change is reported only once the new activity dominates the window, but unless this is absolutely critical for the application, it is acceptable.

Activity recognition may be enhanced by appending the n previous activities, as recognized by the classifier, to the feature vector. The danger of appending previous activities is that the machine learning algorithm may learn that the current activity is always the same as the previous one, as this will often be the case. The problem can be solved by using two classifiers, A and B: classifier B's attribute vector contains the n previous activities as recognized by classifier A, while classifier A's attribute vector does not contain any previous activities. This way, even if B gives a lot of weight to the previous activities, the previous activities as recognized by A will keep changing, as A is not burdened with B's inertia.
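A minimal sketch of how the two classifiers could be wired together at prediction time; classifyA, classifyB, extractFeatures, and windows are hypothetical stand-ins for the two trained models, the feature extraction step, and the stream of sensor windows:

import java.util.ArrayDeque;
import java.util.Deque;

int n = 3;                                  // how many previous activities B sees
Deque<Double> history = new ArrayDeque<Double>();

for (double[] window : windows) {           // windows: the sensor windows (assumed)
  Object[] base = extractFeatures(window);  // base feature vector, no history

  // Classifier A predicts from the base features only
  double aLabel = classifyA(base);

  // Classifier B sees the base features plus A's n previous predictions
  Object[] extended = new Object[base.length + n];
  System.arraycopy(base, 0, extended, 0, base.length);
  int k = base.length;
  for (Double prev : history) {
    extended[k++] = prev;                   // most recent predictions first
  }
  while (k < extended.length) {
    extended[k++] = 0.0;                    // pad until n predictions exist
  }
  double activity = classifyB(extended);    // the final, reported activity

  // Update the history with A's prediction, not B's, to avoid inertia
  history.addFirst(aLabel);
  if (history.size() > n) {
    history.removeLast();
  }
}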

All that remains to do is to embed the classifier and filter into our mobile application.

Plugging the classifier into a mobile app

There are two ways to incorporate a classifier into a mobile application. The first one involves exporting a model in the Weka format, using the Weka library as a dependency in our mobile application, loading the model, and so on. The procedure is identical to the example we saw in Chapter 3, Basic Algorithms – Classification, Regression, and Clustering. The second approach is more lightweight: we export the model as source code, for example, we create a class implementing the decision tree classifier. Then we can simply copy and paste the source code into our mobile app, without even importing any Weka dependencies.

Fortunately, some Weka models can be easily exported to source code by the toSource(String) function:

// Output source code implementing the decision tree
System.out.println("Source code:\n"
  + model.toSource("ActivityRecognitionEngine"));

This outputs an ActivityRecognitionEngine class that corresponds to our model. Now, let's take a closer look at the outputted code:

class ActivityRecognitionEngine {

  public static double classify(Object[] i)
    throws Exception {

    double p = Double.NaN;
    p = ActivityRecognitionEngine.N17a7cec20(i);
    return p;
  }
  static double N17a7cec20(Object []i) {
    double p = Double.NaN;
    if (i[64] == null) {
      p = 1;
    } else if (((Double) i[64]).doubleValue() <= 10.353474) {
      p = ActivityRecognitionEngine.N65b3120a1(i);
    } else if (((Double) i[64]).doubleValue() > 10.353474) {
      p = 2;
    } 
    return p;
  }
...

The outputted ActivityRecognitionEngine class implements the decision tree that we discussed earlier. The machine-generated function names, such as N17a7cec20(Object[]), correspond to decision tree nodes. The classifier can be called via the classify(Object[]) method, where we pass a feature vector obtained by the same procedure as discussed in the previous sections. As usual, it returns a double indicating the class label index.
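For illustration, a sketch of calling the generated class; the placeholder feature values and the mapping of the returned index back to an activity name through data.classAttribute() are assumptions based on the setup above:

// Build a feature vector in the same attribute order as the training data
// (numeric features boxed as Double; values here are placeholders)
Object[] features = new Object[data.numAttributes()];
for (int i = 0; i < data.numAttributes() - 1; i++) {
  features[i] = 0.0; // replace with the extracted feature values
}

double label = ActivityRecognitionEngine.classify(features);

// Map the label index back to the activity name using the training header
String activity = data.classAttribute().value((int) label);
System.out.println("Recognized activity: " + activity);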
