Using a stacking ensemble with Shark-ML

To demonstrate another ensemble learning technique, we can implement the stacking approach manually. This is not hard with the Shark-ML library, or indeed with any other library.

First, we need to define the weak (or elementary) algorithms that we are going to use for stacking. To unify access to the weak algorithms, we define a base class, as follows:

struct WeakModel {
  virtual ~WeakModel() {}
  virtual void Train(const ClassificationDataset& data_set) = 0;
  virtual LinearClassifier<RealVector>& GetClassifier() = 0;
};

We use it to create three weak models: logistic regression, linear discriminant analysis (LDA), and a linear SVM, as illustrated in the following code block:

struct LogisticRegressionModel : public WeakModel {
    LinearClassifier<RealVector> classifier;
    LogisticRegression<RealVector> trainer;
    void Train(const ClassificationDataset& data_set) override {
        trainer.train(classifier, data_set);
    }
    LinearClassifier<RealVector>& GetClassifier() override {
        return classifier;
    }
};

struct LDAModel : public WeakModel {
    LinearClassifier<RealVector> classifier;
    LDA trainer;
    void Train(const ClassificationDataset& data_set) override {
        trainer.train(classifier, data_set);
    }
    LinearClassifier<RealVector>& GetClassifier() override {
        return classifier;
    }
};

struct LinearSVMModel : public WeakModel {
    LinearClassifier<RealVector> classifier;
    LinearCSvmTrainer<RealVector> trainer{SVM_C, false};
    void Train(const ClassificationDataset& data_set) override {
        trainer.train(classifier, data_set);
    }
    LinearClassifier<RealVector>& GetClassifier() override {
        return classifier;
    }
};

These classes hide the usage of different types of trainer classes, but expose the standard interface for the LinearClassifier<RealVector> type through the GetClassifier() method. Furthermore, they implement the general Train method, which takes the object of the ClassificationDataset class.

A crucial step in the stacking approach is combining (stacking) the results of the weak algorithms into one set, which is used for training or evaluating the meta-algorithm. Our implementation does this in the MakeMetaSet function. It takes the vector of predictions from the weak algorithms and the corresponding labels from the original dataset, and combines them into a new object of the ClassificationDataset class, as illustrated in the following code block:

 ClassificationDataset MakeMetaSet(
                        const std::vector<Data<unsigned int>>& inputs,
                        const Data<unsigned int>& labels) {
     auto num_elements = labels.numberOfElements();
     std::vector<RealVector> vinputs(num_elements);
     std::vector<unsigned int> vlabels(num_elements);
     std::vector<RealVector::value_type> vals(inputs.size());
     for (size_t i = 0; i < num_elements; ++i) {
         for (size_t j = 0; j < inputs.size(); ++j) {
             vals[j] = inputs[j].element(i);
         }
         vinputs[i] = RealVector(vals.begin(), vals.end());
         vlabels[i] = labels.element(i);
     }
     return createLabeledDataFromRange(vinputs, vlabels);
 }

This function builds two vectors, one of inputs and one of labels, and uses the Shark-ML function createLabeledDataFromRange to create a new dataset object. Notice that the new inputs are objects of the RealVector type with a dimension of 3, because we use three weak algorithms to predict the meta-features. The vals object collects the outputs of the weak algorithms for one sample (they arrive in the inputs parameter). The weak algorithms themselves also take RealVector objects as input. In the original dataset, these vectors have 30 features; in our implementation, however, they have only five after the PCA dimensionality reduction.
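To make the layout of the meta-features easier to see, the following is a minimal sketch in plain C++ without Shark-ML types. The StackPredictions name and the use of nested std::vector objects are our assumptions for illustration only; they are not part of the library or of the implementation above. Each row of the result corresponds to one sample, and each column holds the label predicted by one weak model:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain-C++ stand-in for the stacking step of MakeMetaSet.
// predictions[j][i] is the label that weak model j predicted for sample i.
// The result has one row per sample and one column per weak model.
std::vector<std::vector<double>> StackPredictions(
    const std::vector<std::vector<unsigned int>>& predictions,
    std::size_t num_samples) {
  std::vector<std::vector<double>> meta_features(
      num_samples, std::vector<double>(predictions.size()));
  for (std::size_t j = 0; j < predictions.size(); ++j) {
    // Every weak model must provide exactly one prediction per sample.
    assert(predictions[j].size() == num_samples);
    for (std::size_t i = 0; i < num_samples; ++i) {
      meta_features[i][j] = static_cast<double>(predictions[j][i]);
    }
  }
  return meta_features;
}
```

With three weak models and four samples, the result is a 4 x 3 table, which mirrors the three-dimensional RealVector objects described above.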

Because of the nature of the selected algorithms, we need to normalize our data. Let's assume we have two datasets for training and testing, as follows:

 void StackingEnsemble(const ClassificationDataset& train,
                       const ClassificationDataset& test) {
     ...
 }

To normalize the training dataset, we need to copy the original dataset because the normalization step modifies the objects with which it works. The makeIndependent call makes sure that the copy does not share its underlying data with the original, as illustrated in the following code block:

 ClassificationDataset train_data_set = train;
 train_data_set.makeIndependent();

When we have a copy of the dataset, we can normalize it with an instance of the Normalizer class trained by a NormalizeComponentsUnitVariance trainer object. As with all algorithms in the Shark-ML library, we have to train the normalizer first, and only then can we apply it with the transformInputs function. This function transforms only the input features, because we don't need to normalize the binary labels, as can be seen in the following code block:

 bool removeMean = true;
 Normalizer<RealVector> normalizer;
 NormalizeComponentsUnitVariance<RealVector> 
     normalizing_trainer(removeMean);
 normalizing_trainer.train(normalizer, train_data_set.inputs());
 train_data_set = transformInputs(train_data_set, normalizer);
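The fit-then-apply pattern used here can be sketched in plain C++ as well. The Standardizer structure and the FitStandardizer and ApplyStandardizer functions below are hypothetical names introduced only for this illustration. We also assume the population variance (division by the number of samples) and leave near-constant features unscaled; the Shark-ML trainer may treat these details differently:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Per-feature statistics learned from the training rows only.
struct Standardizer {
  std::vector<double> mean;
  std::vector<double> inv_std;  // 1 / standard deviation per feature
};

// "Training" step: estimate the mean and the scale of every feature.
Standardizer FitStandardizer(const Matrix& rows) {
  assert(!rows.empty());
  const std::size_t dim = rows.front().size();
  const double n = static_cast<double>(rows.size());
  Standardizer s{std::vector<double>(dim, 0.0), std::vector<double>(dim, 1.0)};
  for (std::size_t k = 0; k < dim; ++k) {
    double sum = 0.0;
    for (const auto& row : rows) sum += row[k];
    s.mean[k] = sum / n;
    double sq_sum = 0.0;
    for (const auto& row : rows) {
      const double d = row[k] - s.mean[k];
      sq_sum += d * d;
    }
    const double variance = sq_sum / n;  // population variance (assumption)
    // Leave (near-)constant features unscaled to avoid dividing by zero.
    if (variance > 1e-12) s.inv_std[k] = 1.0 / std::sqrt(variance);
  }
  return s;
}

// "Transform" step: remove the mean and scale to unit variance.
// The rows are taken by value, so the caller's data stays untouched.
Matrix ApplyStandardizer(const Standardizer& s, Matrix rows) {
  for (auto& row : rows) {
    assert(row.size() == s.mean.size());
    for (std::size_t k = 0; k < row.size(); ++k) {
      row[k] = (row[k] - s.mean[k]) * s.inv_std[k];
    }
  }
  return rows;
}
```

The important design point is that the statistics are estimated once on the training data and then reused unchanged for any other data, so no information from the test set leaks into the preprocessing.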

To speed up and generalize the models we used, we also reduced the dimensionality of the training features with the PCA algorithm. Note that the PCA class doesn't use the train method, but rather has the encoder method for obtaining the object of the LinearModel class, which is then used for dimensionality reduction, as illustrated in the following code block:

 PCA pca(train_data_set.inputs());
 LinearModel<> pca_encoder;
 pca.encoder(pca_encoder, 5);
 train_data_set = transformInputs(train_data_set, pca_encoder);

Now, after preprocessing our training dataset, we can define and train the weak models that we are going to use for evaluation, as follows:

 // weak models
 std::vector<std::shared_ptr<WeakModel>> weak_models;
 weak_models.push_back(std::make_shared<LogisticRegressionModel>());
 weak_models.push_back(std::make_shared<LDAModel>());
 weak_models.push_back(std::make_shared<LinearSVMModel>());
 
 // train weak models for predictions
 for (auto weak_model : weak_models) {
     weak_model->Train(train_data_set);
 }

To train, the meta-algorithm needs meta-features. According to the stacking approach, we split our training dataset into several folds (10, in our case) and train a fresh set of weak models on the training part of each fold. The validation part of the fold is used for weak model evaluation, and the resulting predictions are added to a meta-training dataset that is used to train the meta-algorithm.

There is the createCVSameSizeBalanced function in the Shark-ML library, which can be used for fold creation. It creates equal-size folds, where each consists of two parts: the training part and the validation part. We will iterate over created folds to train weak models and create meta-features. Note in the following code block that we will create new models on each iteration of the loop:

     size_t num_partitions = 10;
     ClassificationDataset meta_data_train;
     auto folds = createCVSameSizeBalanced(train_data_set, num_partitions);
     for (std::size_t i = 0; i != folds.size(); ++i) {
         // access the fold
         ClassificationDataset training = folds.training(i);
         ClassificationDataset validation = folds.validation(i);
         
         // train local weak models - new ones on each of folds
         std::vector<std::shared_ptr<WeakModel>> local_weak_models;
         local_weak_models.push_back(
             std::make_shared<LogisticRegressionModel>());
         local_weak_models.push_back(std::make_shared<LDAModel>());
         local_weak_models.push_back(std::make_shared<LinearSVMModel>());
         
         std::vector<Data<unsigned int>> meta_predictions;
         for (auto weak_model : local_weak_models) {
             weak_model->Train(training);
             auto predictions = 
                 weak_model->GetClassifier()(validation.inputs());
             meta_predictions.push_back(predictions);
         }
         
         // combine meta features
         meta_data_train.append(MakeMetaSet(meta_predictions, 
             validation.labels()));
     }
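The key property of this loop is that every training sample receives its meta-features from weak models that never saw it during training. The following plain C++ sketch shows one simple way to build such folds over sample indices. The Fold structure and the MakeFolds function are hypothetical helpers for this illustration; unlike createCVSameSizeBalanced, which as its name suggests also balances the folds, this sketch ignores the class labels entirely:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sample indices that belong to the two parts of one fold.
struct Fold {
  std::vector<std::size_t> training;
  std::vector<std::size_t> validation;
};

// Splits the indices 0..num_samples-1 into num_folds folds. Sample i lands
// in the validation part of fold (i % num_folds) and in the training part
// of every other fold, so each sample is validated exactly once.
std::vector<Fold> MakeFolds(std::size_t num_samples, std::size_t num_folds) {
  assert(num_folds >= 2 && num_folds <= num_samples);
  std::vector<Fold> folds(num_folds);
  for (std::size_t i = 0; i < num_samples; ++i) {
    for (std::size_t f = 0; f < num_folds; ++f) {
      if (i % num_folds == f) {
        folds[f].validation.push_back(i);
      } else {
        folds[f].training.push_back(i);
      }
    }
  }
  return folds;
}
```

Because the validation parts are disjoint and together cover the whole dataset, appending the per-fold meta-features yields a meta-training set with exactly one row per original training sample.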

The meta_data_train object contains the meta-features and is used to train the meta-model, which is the regular linear SVM model in our case, as follows:

     LinearClassifier<RealVector> meta_model;
     LinearCSvmTrainer<RealVector> trainer(SVM_C, true);
     trainer.train(meta_model, meta_data_train);

Having trained the ensemble, we can try it on the test dataset. Since we used data preprocessing, we should also transform our test data in the same way that we transformed our training data. This can be easily done with the normalizer and the pca_encoder objects, which are already trained and hold the required transformation parameters. In practice, such objects (as well as the models) are usually saved to persistent storage so that they can be reused later. The code can be seen in the following snippet:

     ClassificationDataset test_data_set = test;
     test_data_set.makeIndependent();
     test_data_set = transformInputs(test_data_set, normalizer);
     test_data_set = transformInputs(test_data_set, pca_encoder);

The ensemble evaluation starts by predicting meta-features, using the weak models we trained before. We will make the meta_test dataset object in the same way as we made the training meta-dataset. We will store predictions from every weak model in the meta_predictions vector and will use our helper function to combine them in the object of the ClassificationDataset type, as follows:

     std::vector<Data<unsigned int>> meta_predictions;
     for (auto weak_model : weak_models) {
         auto predictions =
             weak_model->GetClassifier()(test_data_set.inputs());
         meta_predictions.push_back(predictions);
     }
     ClassificationDataset meta_test =
         MakeMetaSet(meta_predictions, test_data_set.labels());
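For a single sample, the two-stage prediction of a stacking ensemble can be sketched as follows. This is a simplified illustration in plain C++, where the Sample and Classifier aliases and the PredictStacked function are our own hypothetical names, and any callable that maps a feature vector to a label stands in for a trained Shark-ML classifier:

```cpp
#include <cassert>
#include <functional>
#include <vector>

using Sample = std::vector<double>;
// Any callable that maps a feature vector to a class label.
using Classifier = std::function<unsigned int(const Sample&)>;

// Stage 1: every weak model predicts a label for the sample.
// Stage 2: the predicted labels form the meta-features, which are
// passed to the meta-model to obtain the final prediction.
unsigned int PredictStacked(const std::vector<Classifier>& weak_models,
                            const Classifier& meta_model,
                            const Sample& sample) {
  assert(!weak_models.empty());
  Sample meta_features;
  meta_features.reserve(weak_models.size());
  for (const auto& weak_model : weak_models) {
    meta_features.push_back(static_cast<double>(weak_model(sample)));
  }
  return meta_model(meta_features);
}
```

Note that the order of the weak models must be the same at training and at prediction time, because each position in the meta-feature vector is tied to one specific weak model.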

After we have created the meta-features, we can pass them as input to the meta_model object to generate the real predictions. We can also calculate the accuracy, like this:

     Data<unsigned int> predictions = meta_model(meta_test.inputs());
     
     ZeroOneLoss<unsigned int> loss;
     double accuracy = 1. - loss.eval(meta_test.labels(), predictions);
     std::cout << "Stacking ensemble accuracy = " << accuracy << std::endl;
 }

The output of this code is Stacking ensemble accuracy = 0.985507. You can see that this ensemble performs better than the random forest implementation, even with default settings. With some additional tuning, it could give even better results.
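The accuracy computation relies on the fact that accuracy is the complement of the mean zero-one loss. The following plain C++ sketch makes this explicit. The MeanZeroOneLoss and Accuracy functions are hypothetical helpers for this illustration, and we assume that the loss is reported as the fraction of misclassified samples, which is consistent with how the result of loss.eval is used above:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction of samples whose predicted label differs from the true label.
double MeanZeroOneLoss(const std::vector<unsigned int>& labels,
                       const std::vector<unsigned int>& predictions) {
  assert(!labels.empty() && labels.size() == predictions.size());
  std::size_t errors = 0;
  for (std::size_t i = 0; i < labels.size(); ++i) {
    if (labels[i] != predictions[i]) ++errors;
  }
  return static_cast<double>(errors) / static_cast<double>(labels.size());
}

// Accuracy is the share of correctly classified samples.
double Accuracy(const std::vector<unsigned int>& labels,
                const std::vector<unsigned int>& predictions) {
  return 1.0 - MeanZeroOneLoss(labels, predictions);
}
```

For example, one wrong prediction out of four samples gives a loss of 0.25 and therefore an accuracy of 0.75.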
