Understanding Naïve Bayes

Naïve Bayes uses conditional probabilities to classify observations. In this section, you will learn how it works. For this purpose, we will invent a simple dataset and a disease. Let's have a look at the following table. It shows the health behaviors of 11 individuals and whether or not 10 of them developed DiseaseZ (the name of our made-up disease) one year after these behaviors were assessed. What we want to know is whether the eleventh individual is at risk of developing the disease. We will answer this using the existing data about that individual and the associations found in the other individuals:

Smoking   Drinking   PhysicalActivity   Movies   Music   Sunbathing   DiseaseZ
YES       YES        NO                 NO       NO      YES          YES
YES       NO         YES                NO       YES     YES          NO
NO        YES        NO                 NO       YES     NO           YES
NO        NO         YES                NO       NO      YES          YES
YES       YES        NO                 NO       NO      NO           YES
NO        NO         YES                YES      NO      NO           NO
NO        YES        YES                YES      NO      NO           NO
YES       YES        YES                YES      YES     YES          YES
NO        NO         NO                 NO       NO      NO           NO
YES       YES        YES                YES      YES     YES          YES
NO        YES        NO                 YES      NO      YES          ?

What we can tell from the table is that the probability of developing DiseaseZ is 6/10 = 0.6. We will call this the prior probability: if we know nothing else about an individual, we can say that he or she has a 60 percent chance of developing DiseaseZ. The posterior probability is what we want to know, that is, the probability that an individual will develop the disease given their health behaviors. Computing it requires the conditional probabilities for each of the health behaviors, that is, the probability that a behavior is performed by someone who has developed the disease, and by someone who hasn't.

The reader can load the dataset as follows (make sure the file DiseaseZ.txt is in your working directory):

DiseaseZ = read.table("DiseaseZ.txt", header = T, sep = "\t")
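As a quick check of the prior probability mentioned above, we can compute the proportion of YES values in the DiseaseZ column for the first ten individuals (a small sketch; mean() of a logical vector returns a proportion):

# Prior probability of DiseaseZ among the 10 classified individuals
mean(DiseaseZ$DiseaseZ[1:10] == "YES")   # returns 0.6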

We first create two datasets, one with individuals with DiseaseZ and the other with individuals without DiseaseZ, and compute the number of cases in each:

# Within subset(), DiseaseZ refers to the column, not the data frame;
# the eleventh row, whose outcome is missing, is dropped by both calls
Sick = subset(DiseaseZ, DiseaseZ == "YES")
NotSick = subset(DiseaseZ, DiseaseZ == "NO")
dim(Sick)[1]      # number of individuals who developed the disease
dim(NotSick)[1]   # number of individuals who did not

The output indicates that there are six individuals in the Sick data frame, and four in the NotSick data frame, which is what we computed before. We can now obtain the conditional probabilities using the following code:

# Proportion of YES answers per behavior among the sick and the not sick
prob.Sick = colSums(Sick[, 1:6] == "YES") / 6
prob.NotSick = colSums(NotSick[, 1:6] == "YES") / 4
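To reproduce the rounded values shown in the next table, the reader can print these vectors rounded to two decimals:

round(prob.Sick, 2)      # P(behavior = YES | DiseaseZ = YES)
round(prob.NotSick, 2)   # P(behavior = YES | DiseaseZ = NO)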

The probabilities of having performed each behavior, for individuals who did and did not develop the disease (rounded to two decimals), are displayed in the following table. As a reminder, the behaviors performed by the individual to classify are indicated by an X in the last row:

 

                Smoking   Drinking   PhysicalActivity   Movies   Music   Sunbathing
DiseaseZ == 1   0.67      0.83       0.50               0.33     0.50    0.67
DiseaseZ == 0   0.25      0.25       0.75               0.50     0.25    0.25
To classify               X                             X                X

As can be noticed from the conditional probabilities alone, there are differences in the behaviors of people who have and have not developed the disease. Proportionally, more people who have been smoking, drinking, listening to music, and sunbathing have developed the disease than people who haven't. Performing physical activities and going to see movies are related to not having the disease in this fictitious example.

But is there an association between the behaviors themselves? That is, do people who smoke also drink? This is quite possible, but Naïve Bayes assumes that all the attributes used to predict the class are independent of each other given the class. This is clearly not the case in the real world, but it turns out that Naïve Bayes classifies observations quite reliably even though it is based on this unrealistic assumption. Under this assumption, Naïve Bayes scores each class as the prior probability multiplied by the product of the conditional probabilities of the observed attribute values given that class, and assigns the class with the highest score; this is simply Bayes' rule with the denominator, which is the same for both classes, dropped. So let's try computing these scores ourselves. To do this, we also need the probabilities of not having performed the behaviors given the class; they are equal to 1 minus the probabilities of performing them.
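In R, these complements come directly from the vectors we computed earlier (a one-line check for each class):

1 - prob.Sick      # P(behavior = NO | DiseaseZ = YES)
1 - prob.NotSick   # P(behavior = NO | DiseaseZ = NO)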

The probabilities of not having performed the behaviors are displayed here:

 

                Smoking   Drinking   PhysicalActivity   Movies   Music   Sunbathing
DiseaseZ == 1   0.33      0.17       0.50               0.67     0.50    0.33
DiseaseZ == 0   0.75      0.75       0.25               0.50     0.75    0.75

We can now compute a score for our unclassified individual having, and not having, developed DiseaseZ: the joint probability of the observed behaviors given each outcome, multiplied by the prior probability of that outcome. Let's recall that this individual has not been smoking, has been drinking, did not engage in physical activity, has been to the movies, has not listened to music, and has been sunbathing.

For the outcome DiseaseZ == 1, the joint probability is:

(.33 * .83 * .5 * .33 * .5 * .67) * .6 = 0.009083894

For the outcome DiseaseZ == 0, the joint probability is:

(.75 * .25 * .25 * .5 * .75 * .25) * .4 = 0.001757813

As this value is higher for DiseaseZ == 1 than for DiseaseZ == 0, we conclude that the individual to classify is at risk of developing the disease.
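Before turning to a packaged implementation, we can reproduce this computation in R from the objects created earlier (a sketch; the names performed, score.sick, and score.notsick are ours, and the unrounded probabilities give slightly different digits than the two-decimal hand computation above):

# TRUE where the eleventh individual performed the behavior
performed = sapply(DiseaseZ[11, 1:6], function(x) x == "YES")
# Multiply the matching conditional probabilities, then the prior
score.sick = prod(ifelse(performed, prob.Sick, 1 - prob.Sick)) * 0.6
score.notsick = prod(ifelse(performed, prob.NotSick, 1 - prob.NotSick)) * 0.4
score.sick      # about 0.0093: the higher score, so we predict YES
score.notsick   # about 0.0018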

Now let's have a look at what an R implementation of Naïve Bayes finds out! We start by installing and loading the e1071 package that contains the naiveBayes() function:

install.packages("e1071")
library(e1071)

We then train a classifier (we will use observations 1 to 10 for training), and inspect its content:

Classify = naiveBayes(DiseaseZ[1:10, 1:6],   # the six behaviors as predictors
   DiseaseZ[1:10, 7])                        # the DiseaseZ outcome
Classify

Let's examine the values shown in the following output. We can see that the prior probabilities are the same as we computed before: 0.4 for not having DiseaseZ and 0.6 for having it, so we computed this right. Good! Now let's examine the conditional probabilities. We are not going to comment on all of them; we'll just have a look at those under Smoking. We reported a conditional probability of smoking of 0.67 among individuals who developed DiseaseZ. The classifier reports the same thing (but rounded to seven decimals, whereas we rounded to two). We found a conditional probability of not smoking of 0.33 (that is, 1 minus 0.67, as the probabilities must sum to 1) among those individuals. That is what the classifier found as well. For individuals who didn't develop the disease, we computed a probability of 0.25 of being a smoker, and a probability of 0.75 of not being a smoker. I will let you examine the output to see what values the classifier reports:

[Figure: The classifier for our first example, showing the printed contents of the Classify object]
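If the printed output is hard to read, the relevant pieces can also be queried directly; apriori and tables are components of the list returned by naiveBayes():

Classify$apriori          # class counts behind the prior probabilities
Classify$tables$Smoking   # conditional probabilities for Smoking by class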

We determined that the individual to classify was at risk of developing DiseaseZ, based on her or his behaviors and the associated probabilities. Let's now see what Naïve Bayes estimates:

predict(Classify, DiseaseZ[11,1:6])

As indicated in the output, the predict() function, given the naiveBayes() classifier and the behavior of the individual to classify as arguments, gives the same answer to this classification problem as we did:

[1] YES
Levels: NO YES
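To see the posterior probabilities behind this decision rather than only the winning class, predict() accepts a type = "raw" argument:

predict(Classify, DiseaseZ[11, 1:6], type = "raw")   # posteriors for NO and YES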

It is worth mentioning that Naïve Bayes is not limited to categorical predictors; it works in a similar way with continuous ones, using density estimation (in e1071, a normal density per class by default) instead of tables of conditional probabilities.
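As a brief illustration with the built-in iris data (a sketch; Sepal.Length is continuous, so the classifier stores a per-class mean and standard deviation instead of a table of conditional probabilities):

# Naive Bayes with a continuous predictor
fit = naiveBayes(Species ~ Sepal.Length, data = iris)
fit$tables$Sepal.Length                     # per-class mean and standard deviation
predict(fit, data.frame(Sepal.Length = 5))  # predicted species for a 5 cm sepal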

Now that we have demonstrated how Naïve Bayes works, we are going to examine its use in detail.
