Understanding different voting schemes

Two different voting schemes are common among voting classifiers:

  • In hard voting (also known as majority voting), every individual classifier votes for a class, and the majority wins. In statistical terms, the predicted target label of the ensemble is the mode of the distribution of individually predicted labels.
  • In soft voting, every individual classifier provides a probability that a specific data point belongs to a particular target class. These probabilities are weighted by each classifier's importance and summed. The target label with the greatest sum of weighted probabilities then wins the vote.
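
Both schemes are available in scikit-learn through the VotingClassifier estimator, selected with its voting parameter. Here is a minimal sketch; the three base estimators and the toy dataset are illustrative assumptions, not prescribed choices:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    # Toy data and three arbitrary base classifiers (assumptions)
    X, y = make_classification(n_samples=200, random_state=42)
    estimators = [('lr', LogisticRegression(max_iter=1000)),
                  ('dt', DecisionTreeClassifier(random_state=42)),
                  ('nb', GaussianNB())]

    # Hard voting: each classifier casts one vote; the mode wins
    hard = VotingClassifier(estimators, voting='hard').fit(X, y)

    # Soft voting: weighted class probabilities are summed; the label
    # with the largest sum wins (equal weights mirror w1 = w2 = w3 = 1)
    soft = VotingClassifier(estimators, voting='soft',
                            weights=[1, 1, 1]).fit(X, y)

    print(hard.predict(X[:1]), soft.predict(X[:1]))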

For example, let's assume we have three different classifiers in the ensemble that perform a binary classification task. Under the hard voting scheme, every classifier would predict a target label for a particular data point:

Classifier      Predicted target label
-------------   ----------------------
Classifier #1   Class 1
Classifier #2   Class 1
Classifier #3   Class 0

The voting classifier would then tally up the votes and go with the majority. In this case, the ensemble classifier would predict class 1 (one).
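
As a quick sketch, the tally amounts to taking the mode of the predicted labels; the votes list below simply mirrors the table above:

    from collections import Counter

    # Predicted labels from classifiers #1-#3 in the table above
    votes = [1, 1, 0]

    # The most common label (the mode) wins the hard vote
    winner = Counter(votes).most_common(1)[0][0]
    print(winner)  # 1 -> the ensemble predicts class 1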

In a soft voting scheme, the math is slightly more involved. Every classifier is assigned a weight coefficient, which reflects the importance of that classifier in the voting procedure. For the sake of simplicity, let's assume all three classifiers have the same weight: w1 = w2 = w3 = 1.

The three classifiers would then go on to predict the probability of a particular data point belonging to each of the available class labels:

Classifier      Class 0     Class 1
-------------   ---------   ---------
Classifier #1   0.3 · w1    0.7 · w1
Classifier #2   0.5 · w2    0.5 · w2
Classifier #3   0.4 · w3    0.6 · w3

In this example, classifier #1 is 70% sure we're looking at an example of class 1, classifier #2 is split 50-50, and classifier #3 tends to agree with classifier #1. Every probability score is then multiplied by the weight coefficient of its classifier.

The voting classifier would then compute the weighted average for each class label:

  • For class 0, we get a weighted average of (0.3 w1 + 0.5 w2 + 0.4 w3) / (w1 + w2 + w3) = 1.2 / 3 = 0.4.
  • For class 1, we get a weighted average of (0.7 w1 + 0.5 w2 + 0.6 w3) / (w1 + w2 + w3) = 1.8 / 3 = 0.6.

Because the weighted average for class 1 is higher than for class 0, the ensemble classifier would go on to predict class 1 (one).
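
The same computation as a short NumPy sketch; the array layout (one row per classifier, one column per class) is an assumption made for illustration:

    import numpy as np

    # Class probabilities from the table above: one row per
    # classifier, columns ordered as [P(class 0), P(class 1)]
    probas = np.array([[0.3, 0.7],
                       [0.5, 0.5],
                       [0.4, 0.6]])
    weights = np.array([1.0, 1.0, 1.0])  # w1 = w2 = w3 = 1

    # Weighted average per class: sum_i(w_i * p_i) / sum_i(w_i)
    avg = weights @ probas / weights.sum()
    print(avg)           # [0.4 0.6]
    print(avg.argmax())  # 1 -> the ensemble predicts class 1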
