Chapter 4

Naïve Bayesian Classification

4.1 Introduction

Naïve Bayesian classifiers [1] are simple probabilistic classifiers based on the application of Bayes' theorem together with a strong (naïve) assumption of independence among the features. Bayes' theorem [2] is stated mathematically as follows:

P(A|B) = P(A) P(B|A) / P(B)

where:

A and B are events

P(A) and P(B) are the prior probabilities of A and B without regard to each other

P(A|B), also called posterior probability, is the probability of observing event A given that B is true

P(B|A), also called likelihood, is the probability of observing event B given that A is true

Suppose that vector X = (x1, x2, …, xn) is an instance (with n independent features) to be classified and cj denotes one of K classes. Using Bayes' theorem, we can calculate the posterior probability P(cj|X) from P(cj), P(X), and P(X|cj). The naïve Bayesian classifier makes a simplifying (naïve) assumption, called class conditional independence, that the effect of the value of a predictor xi on a given class cj is independent of the values of the other predictors.

Without going into the full mathematical details, P(cj|X) is calculated for each of the K classes, j = 1 to K. The instance X will be assigned to class ck if and only if

P(ck|X) > P(cj|X)  for  1 ≤ j ≤ K, j ≠ k
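
Combining the independence assumption with Bayes' theorem, the quantity compared for each class is simply the prior multiplied by the product of the per-feature conditional probabilities:

P(cj|X) ∝ P(cj) × P(x1|cj) × P(x2|cj) × … × P(xn|cj)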

The idea will become clearer when an example classification using the naïve Bayesian classifier is discussed below, along with its implementation in MATLAB®.

4.2 Example

To demonstrate the concept of the naïve Bayesian classifier, we will again use the weather ("play tennis") dataset, whose 14 records are reproduced below:

Outlook    Temperature  Humidity  Windy  Play
sunny      hot          high      false  No
sunny      hot          high      true   No
overcast   hot          high      false  Yes
rainy      mild         high      false  Yes
rainy      cool         normal    false  Yes
rainy      cool         normal    true   No
overcast   cool         normal    true   Yes
sunny      mild         high      false  No
sunny      cool         normal    false  Yes
rainy      mild         normal    false  Yes
sunny      mild         normal    true   Yes
overcast   mild         high      true   Yes
overcast   hot          normal    false  Yes
rainy      mild         high      true   No

4.3 Prior Probability

Our task is to predict, using the different features, whether tennis will be played or not. Since there are almost twice as many examples of “Play = Yes” (9 examples) as of “Play = No” (5 examples), it is reasonable to believe that a new, unobserved case is almost twice as likely to belong to class “Yes” as to class “No.” In the Bayesian paradigm, this belief, based on previous experience, is known as the prior probability.

Since there are 14 available examples, 9 of which are Yes and 5 are No, our prior probabilities for class membership are as follows:

Prior Probability P(Play = Yes) = 9 / 14
Prior Probability P(Play = No) = 5 / 14
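
Expressed as decimals, these priors are approximately 0.643 for “Yes” and 0.357 for “No.”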

4.4 Likelihood

Let X be a new example for which we want to predict whether tennis is going to be played or not. Intuitively, the more the (Play = Yes) (or No) examples resemble X, the more likely it is that the new case belongs to (Play = Yes) (or No).

Let X = (Outlook = Overcast, Temperature = Mild, Humidity = Normal, Windy = False). We then have to compute the conditional probabilities listed in the following table (computed from the counts in the dataset above); the entries needed for X are those for Overcast, Mild, Normal, and False:

Attribute value        P(value | Play = Yes)   P(value | Play = No)
Outlook = Sunny        2/9                     3/5
Outlook = Overcast     4/9                     0/5
Outlook = Rainy        3/9                     2/5
Temperature = Hot      2/9                     2/5
Temperature = Mild     4/9                     2/5
Temperature = Cool     3/9                     1/5
Humidity = High        3/9                     4/5
Humidity = Normal      6/9                     1/5
Windy = False          6/9                     2/5
Windy = True           3/9                     3/5

Using the above probabilities, we can obtain the likelihood of X under each of the two classes:

  1. P(X | Play = Yes)

  2. P(X | Play = No)

Under the class conditional independence assumption, the two likelihoods are obtained by the following calculations:

P(X | Play = Yes) = P(Outlook = Overcast | Play = Yes) × P(Temperature = Mild | Play = Yes) × P(Humidity = Normal | Play = Yes) × P(Windy = False | Play = Yes)

P(X | Play = No) = P(Outlook = Overcast | Play = No) × P(Temperature = Mild | Play = No) × P(Humidity = Normal | Play = No) × P(Windy = False | Play = No)
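
Substituting the values from the table gives the following numbers (note the zero for Overcast under Play = No):

P(X | Play = Yes) = 4/9 × 4/9 × 6/9 × 6/9 ≈ 0.0878

P(X | Play = No) = 0/5 × 2/5 × 1/5 × 2/5 = 0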

4.5 Laplace Estimator

One evident problem in calculating P(X | Play = No) is the value of zero for the conditional probability P(Outlook = Overcast | Play = No). Because the likelihood is a product, this single zero makes the whole probability zero. To handle this problem, we use the Laplace estimator: one is added to every count, and each denominator is increased by the number of distinct values the corresponding variable can take, so that the estimated probabilities still sum to one.
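
As a small illustration, the corrected estimate of P(Outlook = Overcast | Play = No) could be computed as follows (the variable names here are only for illustration):

% Laplace-corrected conditional probability estimate
count_value_and_class = 0;   % records with Outlook = overcast and Play = No
count_class = 5;             % records with Play = No
num_values = 3;              % distinct values of Outlook (sunny, overcast, rainy)
p_laplace = (count_value_and_class + 1) / (count_class + num_values)   % = 1/8 = 0.125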

The new prior probabilities are as follows:

Prior Probability P(Play = Yes) = (9 + 1) / (14 + 2) = 10/16
Prior Probability P(Play = No) = (5 + 1) / (14 + 2) = 6/16

The following table gives the conditional probabilities after the Laplace correction (each count is increased by one, and each denominator by the number of distinct values of the attribute):

Attribute value        P(value | Play = Yes)   P(value | Play = No)
Outlook = Sunny        3/12                    4/8
Outlook = Overcast     5/12                    1/8
Outlook = Rainy        4/12                    3/8
Temperature = Hot      3/12                    3/8
Temperature = Mild     5/12                    3/8
Temperature = Cool     4/12                    2/8
Humidity = High        4/11                    5/7
Humidity = Normal      7/11                    2/7
Windy = False          7/11                    3/7
Windy = True           4/11                    4/7

The two likelihoods can now be calculated easily as follows:

P(X | Play = Yes) = P(Outlook = Overcast | Play = Yes) × P(Temperature = Mild | Play = Yes) × P(Humidity = Normal | Play = Yes) × P(Windy = False | Play = Yes)

P(X | Play = Yes) = 5/12 × 5/12 × 7/11 × 7/11 = 0.070305

P(X | Play = No) = P(Outlook = Overcast | Play = No) × P(Temperature = Mild | Play = No) × P(Humidity = Normal | Play = No) × P(Windy = False | Play = No)

P(X | Play = No) = 1/8 × 3/8 × 2/7 × 3/7 = 0.00574

4.6 Posterior Probability

In order to calculate the posterior probability, we need three things.

  1. Prior probability

  2. Likelihood

  3. Evidence

The following formula shows the relationship among the three variables to calculate posterior probability:

Posterior = (Prior × Likelihood) / Evidence

For classification purposes, we are only interested in calculating and comparing the numerator of the above fraction, because the evidence in the denominator is the same for both classes. In other words, the posterior is proportional to the likelihood times the prior.

Posterior ∝ Prior × Likelihood

The numerator (prior × likelihood) for each of the two classes can be calculated by simply multiplying the respective prior probability by the corresponding likelihood:

P(Play = Yes | X) ∝ P(Play = Yes) × P(X | Play = Yes) = 10/16 × 0.070305 = 0.043941

P(Play = No | X) ∝ P(Play = No) × P(X | Play = No) = 6/16 × 0.00574 = 0.002152

Since P(Play = Yes) × P(X | Play = Yes) > P(Play = No) × P(X | Play = No), we assign the class “Yes” to the new case X.
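
If actual posterior probabilities are desired, the two numerators can be normalized by their sum (the evidence):

P(Play = Yes | X) = 0.043941 / (0.043941 + 0.002152) ≈ 0.953

P(Play = No | X) = 0.002152 / (0.043941 + 0.002152) ≈ 0.047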

4.7 MATLAB Implementation

In MATLAB, one can perform calculations related to the naïve Bayesian classifier easily.

We will first load into the MATLAB environment the same dataset discussed as an example in this chapter, and then calculate the different quantities related to the naïve Bayesian classifier.

The following code snippet loads the data from “data.csv” into the MATLAB environment.

fid = fopen('C:\Naive Bayesian\data.csv');
out = textscan(fid, '%s%s%s%s%s', 'delimiter', ',');   % five string columns
fclose(fid);
num_featureswithclass = size(out,2);                    % 4 features + 1 class column
tot_rec = size(out{size(out,2)},1) - 1;                 % number of records (first row is the header)
% class label of each record (skip the header row)
for i = 1:tot_rec
    yy{i} = out{num_featureswithclass}{i+1};
end
% keep each column (including its header) in a cell array
for i = 1:num_featureswithclass
    xx{i} = out{i};
end

For calculating the prior probabilities of the class variable, the following code snippet will perform the job.
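
A minimal sketch of such a snippet, written so that it produces the variables used by the later snippets (yu, nc, num_of_rec_for_each_class, and fy), is given below:

% unique class labels and their count
yu = unique(yy);
nc = length(yu);                              % number of classes
num_of_rec_for_each_class = zeros(1, nc);
for i = 1:tot_rec
    for j = 1:nc
        if strcmp(yy{i}, yu{j}) == 1
            num_of_rec_for_each_class(j) = num_of_rec_for_each_class(j) + 1;
        end
    end
end
% prior probability of each class
fy = num_of_rec_for_each_class ./ tot_rec;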

In order to calculate the likelihood table, the following code snippet works:

% prob_table(attribute, value, class) holds the conditional probabilities
% P(value | class); 10 is an upper bound on the number of distinct values
% per attribute
prob_table = zeros(num_featureswithclass-1, 10, nc);
for col = 1:num_featureswithclass-1
    unique_value = unique(xx{col});
    rec_unique_value{col} = unique_value;
    % the first entry of unique_value is the column header, so start at 2
    for i = 2:length(unique_value)
        for j = 2:tot_rec+1              % skip the header row of the data
            if strcmp(xx{col}{j}, unique_value{i}) == 1 && strcmp(xx{num_featureswithclass}{j}, yu{1}) == 1
                prob_table(col, i-1, 1) = prob_table(col, i-1, 1) + 1;
            end
            if strcmp(xx{col}{j}, unique_value{i}) == 1 && strcmp(xx{num_featureswithclass}{j}, yu{2}) == 1
                prob_table(col, i-1, 2) = prob_table(col, i-1, 2) + 1;
            end
        end
    end
end
% convert counts into conditional probabilities
prob_table(:,:,1) = prob_table(:,:,1) ./ num_of_rec_for_each_class(1);
prob_table(:,:,2) = prob_table(:,:,2) ./ num_of_rec_for_each_class(2);

The matrix “prob_table” used in the above code has dimensions 4 × 10 × 2, where “4” is the number of attributes in the dataset, “10” is an upper bound on the number of distinct values any attribute may take (in this example the maximum is “3”), and “2” is the number of classes. Inspecting the values stored in prob_table further aids understanding.

Predicting for an unlabeled record:

Now that we have a naïve Bayesian classifier in the form of tables, we can use them to predict newly arriving unlabeled records. The following code snippet describes the prediction process in MATLAB.

% new, unlabeled record to classify
A = {'sunny', 'hot', 'high', 'false'};
% position of each attribute value in the corresponding list of unique values
A1 = find(ismember(rec_unique_value{1}, A{1}));
A11 = 1;                                 % row index of Outlook in prob_table
A2 = find(ismember(rec_unique_value{2}, A{2}));
A21 = 2;                                 % row index of Temperature in prob_table
A3 = find(ismember(rec_unique_value{3}, A{3}));
A31 = 3;                                 % row index of Humidity in prob_table
A4 = find(ismember(rec_unique_value{4}, A{4}));
A41 = 4;                                 % row index of Windy in prob_table
% prior times likelihood for each class (the "-1" skips the header entry)
ProbN = prob_table(A11, A1-1, 1) * prob_table(A21, A2-1, 1) * prob_table(A31, A3-1, 1) * prob_table(A41, A4-1, 1) * fy(1);
ProbP = prob_table(A11, A1-1, 2) * prob_table(A21, A2-1, 2) * prob_table(A31, A3-1, 2) * prob_table(A41, A4-1, 2) * fy(2);
if ProbN > ProbP
    prediction = 'N'
else
    prediction = 'P'
end
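
If normalized posterior probabilities are desired in addition to the predicted label, the two products can be divided by their sum (the evidence), for example:

% normalize by the evidence P(X) = ProbN + ProbP
posteriorN = ProbN / (ProbN + ProbP);
posteriorP = ProbP / (ProbN + ProbP);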

References

1. Good, I. J. The Estimation of Probabilities: An Essay on Modern Bayesian Methods. Cambridge: MIT Press, 1965.

2. Kendall, M. G. and Stuart, A. The Advanced Theory of Statistics. London: Griffin, 1968.
