It is important to understand Bayes' theorem before diving into the classifier. Let A and B denote two events. An event can be that it will rain tomorrow, that two kings are drawn from a deck of cards, or that a person has cancer. In Bayes' theorem, the probability that A occurs given that B is true can be computed by:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where $P(B \mid A)$ is the probability of observing B given that A occurs, and $P(A)$ and $P(B)$ are the probabilities that A occurs and that B occurs, respectively. Too abstract? Let's look at some examples:
Example 1: Given two coins, one is unfair, with 90% of flips landing heads and 10% landing tails, and the other is fair. Randomly pick one coin and flip it. What is the probability that this coin is the unfair one, if we get a head?
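We solve it by first denoting U as the event of picking the unfair coin and H as the event of getting a head. The probability that the unfair coin was picked given that a head is observed, $P(U \mid H)$, can be calculated as follows:

$$P(U \mid H) = \frac{P(H \mid U)\,P(U)}{P(H)}$$

$P(H \mid U)$ is 90% as given, and $P(U)$ is 0.5 since we randomly pick one coin out of two. However, deriving the probability of getting a head, $P(H)$, is not that straightforward, as two events can lead to it: the fair coin is picked (F) or the unfair one is picked (U). So it becomes:

$$P(H) = P(H \mid U)\,P(U) + P(H \mid F)\,P(F) = 0.9 \times 0.5 + 0.5 \times 0.5 = 0.7$$

$$P(U \mid H) = \frac{P(H \mid U)\,P(U)}{P(H \mid U)\,P(U) + P(H \mid F)\,P(F)} = \frac{0.9 \times 0.5}{0.7} \approx 0.64$$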
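The coin calculation can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative choices, not part of the original text.

```python
# Example 1: posterior probability of the unfair coin given a head.
# All variable names here are illustrative assumptions.
p_unfair = 0.5             # P(U): one of two coins is picked at random
p_fair = 0.5               # P(F)
p_head_given_unfair = 0.9  # P(H|U): the unfair coin lands heads 90% of the time
p_head_given_fair = 0.5    # P(H|F): the fair coin lands heads 50% of the time

# Law of total probability: P(H) = P(H|U)P(U) + P(H|F)P(F)
p_head = p_head_given_unfair * p_unfair + p_head_given_fair * p_fair

# Bayes' theorem: P(U|H) = P(H|U)P(U) / P(H)
p_unfair_given_head = p_head_given_unfair * p_unfair / p_head

print(f"P(H)   = {p_head:.2f}")
print(f"P(U|H) = {p_unfair_given_head:.4f}")
```

Observing a head raises the probability of the unfair coin from the prior of 0.5 to roughly 0.64.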
Example 2: Suppose a physician reported the following cancer screening test scenario among 10,000 people:
| | Cancer | No Cancer | Total |
|---|---|---|---|
| Test Positive | 80 | 900 | 980 |
| Test Negative | 20 | 9000 | 9020 |
| Total | 100 | 9900 | 10000 |
It indicates, for example, that 80 out of 100 cancer patients are correctly diagnosed, while the remaining 20 are not; cancer is falsely detected in 900 out of 9,900 healthy people. If the result of this screening test on a person is positive, what is the probability that they actually have cancer?
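Let's denote the events of having cancer and of testing positive as C and Pos, respectively. Apply Bayes' theorem to calculate $P(C \mid Pos)$:

$$P(C \mid Pos) = \frac{P(Pos \mid C)\,P(C)}{P(Pos)} = \frac{\frac{80}{100} \times \frac{100}{10000}}{\frac{980}{10000}} \approx 8.16\%$$

Given a positive screening result, the chance that the person has cancer is 8.16%, which is significantly higher than the chance under the general assumption ($P(C) = 100/10000 = 1\%$) without undergoing the screening.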
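The same result can be reproduced directly from the counts in the screening table. The sketch below uses illustrative variable names that are not part of the original text, and it also shows that the Bayes' theorem result matches the simple ratio of true positives to all positives.

```python
# Example 2: P(cancer | positive test) from the screening table counts.
# Variable names are illustrative assumptions.
true_pos = 80     # cancer, test positive
false_pos = 900   # no cancer, test positive
false_neg = 20    # cancer, test negative
true_neg = 9000   # no cancer, test negative

total = true_pos + false_pos + false_neg + true_neg

p_cancer = (true_pos + false_neg) / total              # P(C)
p_pos = (true_pos + false_pos) / total                 # P(Pos)
p_pos_given_cancer = true_pos / (true_pos + false_neg) # P(Pos|C)

# Bayes' theorem: P(C|Pos) = P(Pos|C)P(C) / P(Pos)
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos

# Sanity check: the same value read straight off the "Test Positive" row
direct_ratio = true_pos / (true_pos + false_pos)

print(f"P(C)     = {p_cancer:.2%}")
print(f"P(C|Pos) = {p_cancer_given_pos:.2%}")
```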
Example 3: Three machines A, B, and C in a factory account for 35%, 20%, and 45% of the bulb production, and the fractions of defective bulbs produced by each machine are 1.5%, 1%, and 2%, respectively. A bulb produced by this factory was identified as defective (denoted as event D). What are the probabilities that this bulb was manufactured by machine A, B, and C, respectively?
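Again, simply follow Bayes' theorem, expanding $P(D)$ over the three machines:

$$P(D) = P(D \mid A)\,P(A) + P(D \mid B)\,P(B) + P(D \mid C)\,P(C) = 0.015 \times 0.35 + 0.01 \times 0.20 + 0.02 \times 0.45 = 0.01625$$

$$P(A \mid D) = \frac{P(D \mid A)\,P(A)}{P(D)} = \frac{0.015 \times 0.35}{0.01625} \approx 0.323$$

$$P(B \mid D) = \frac{P(D \mid B)\,P(B)}{P(D)} = \frac{0.01 \times 0.20}{0.01625} \approx 0.123$$

$$P(C \mid D) = \frac{P(D \mid C)\,P(C)}{P(D)} = \frac{0.02 \times 0.45}{0.01625} \approx 0.554$$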
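Alternatively, we do not even need to calculate $P(D)$, since we know:

$$P(A \mid D) : P(B \mid D) : P(C \mid D) = P(D \mid A)\,P(A) : P(D \mid B)\,P(B) : P(D \mid C)\,P(C) = 21 : 8 : 36$$

and

$$P(A \mid D) + P(B \mid D) + P(C \mid D) = 1$$

so

$$P(A \mid D) = \frac{21}{21 + 8 + 36} \approx 0.323, \quad P(B \mid D) = \frac{8}{21 + 8 + 36} \approx 0.123, \quad P(C \mid D) = \frac{36}{21 + 8 + 36} \approx 0.554$$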
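The bulb example generalizes to any number of mutually exclusive causes: multiply each prior by its likelihood, then normalize so the posteriors sum to 1. The following is a minimal sketch of that idea; the dictionary and variable names are illustrative and not part of the original text.

```python
# Example 3: which machine produced a defective bulb?
# Dictionary and variable names are illustrative assumptions.
priors = {"A": 0.35, "B": 0.20, "C": 0.45}          # P(machine): share of production
defect_rates = {"A": 0.015, "B": 0.01, "C": 0.02}   # P(D | machine)

# Unnormalized posteriors: P(D | machine) * P(machine)
joint = {m: defect_rates[m] * priors[m] for m in priors}

# P(D) by the law of total probability; it also serves as the normalizer
p_defective = sum(joint.values())

# Bayes' theorem: P(machine | D) = P(D | machine) P(machine) / P(D)
posteriors = {m: joint[m] / p_defective for m in joint}

for machine, prob in posteriors.items():
    print(f"P({machine}|D) = {prob:.3f}")
```

Because the normalizer is just the sum of the unnormalized terms, only their ratio (21 : 8 : 36 here) matters, which is why $P(D)$ never has to be computed separately.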
Having made sense of Bayes' theorem as the backbone of naive Bayes, we can easily move forward with the classifier itself.