Understanding the Naive Bayes classifier

So far, we have only talked about one piece of evidence. However, in most real-world scenarios, we have to predict an outcome (such as a random variable, Y) given multiple pieces of evidence (such as random variables X1 and X2). So, instead of calculating p(Y|X), we would often have to calculate p(Y|X1, X2, ..., Xn). Unfortunately, this makes the math very complicated. Writing C for the outcome (the class) that we want to predict, the joint probability for two random variables, X1 and X2, would be computed like this:

p(C, X1, X2) = p(X1|X2, C) p(X2|C) p(C)

The ugly part is the term p(X1|X2, C), which says that the conditional probability of X1 depends on all other variables, including C. This gets even more complex in the case of n variables: X1, X2, ..., Xn.

Hence, the idea of the Naive Bayes classifier is simply to ignore all of those mixed terms and instead assume that all of the pieces of evidence (Xi) are independent of each other once the class C is known. Since this is rarely the case in the real world, you might call this assumption a bit naive. And that's where the Naive Bayes classifier gets its name.
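
To get a feel for how much the assumption saves, the following sketch counts how many probabilities would have to be estimated per class with and without it. It assumes that every Xi is binary (a simplification made here, not something the text requires), and the function names are made up for this illustration.

```python
def full_joint_params(n_features: int) -> int:
    """Free parameters per class for p(X1, ..., Xn | C) with no independence
    assumption, for binary features: one per outcome, minus one because the
    probabilities must sum to 1."""
    return 2 ** n_features - 1


def naive_params(n_features: int) -> int:
    """Free parameters per class under the naive assumption, for binary
    features: a single p(Xi = 1 | C) for each feature."""
    return n_features


for n in (2, 10, 30):
    print(n, full_joint_params(n), naive_params(n))
```

With 30 binary features, the unrestricted model needs more than a billion parameters per class, whereas the naive model needs 30.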

The assumption that all of the pieces of evidence are independent simplifies the term p(X1|X2, C) to p(X1|C). In the case of two pieces of evidence (X1 and X2), the joint probability simplifies to the following equation:

p(C, X1, X2) = p(X1|C) p(X2|C) p(C)

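This simplifies our lives a lot since we know how to calculate all of the terms in the equation. More generally, for n pieces of evidence and a class C1, we get the following equation:

p(C1|X1, ..., Xn) ∝ p(C1) p(X1|C1) p(X2|C1) ... p(Xn|C1)

Here, ∝ means "proportional to": the right-hand side is the joint probability p(C1, X1, ..., Xn), which differs from p(C1|X1, ..., Xn) only by the factor p(X1, ..., Xn), and that factor is the same for every class. So, up to this shared constant, this is the probability of class C1, given X1, ..., Xn. We could write a second equation that would do the same for another class, C2, given X1, ..., Xn, and for a third and a fourth. This is the secret sauce of the Naive Bayes classifier.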
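
As a rough illustration, the sketch below multiplies the prior by the per-feature likelihoods for each of two classes and then normalizes the results so that they sum to 1. The class names (`spam`, `ham`) and all of the numbers are hypothetical and chosen only for this example; it also assumes that both pieces of evidence were observed (X1 = 1 and X2 = 1).

```python
from math import prod

# Hypothetical numbers, chosen only for illustration.
priors = {"spam": 0.4, "ham": 0.6}   # p(C)
likelihoods = {                      # p(Xi = 1 | C) for i = 1, 2
    "spam": [0.8, 0.5],
    "ham": [0.1, 0.3],
}


def class_score(c):
    """Prior times the product of the per-feature likelihoods for class c.

    This is the unnormalized posterior for the case where every Xi = 1.
    """
    return priors[c] * prod(likelihoods[c])


scores = {c: class_score(c) for c in priors}

# Dividing by the sum over all classes turns the scores into posteriors.
total = sum(scores.values())
posteriors = {c: s / total for c, s in scores.items()}

print(scores)
print(posteriors)
```

The normalization step does not change which class scores highest, since every score is divided by the same number.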

The Naive Bayes classifier then combines this model with a decision rule. The most common rule is to check all of the equations (for all classes C1, C2, ..., Cm) and then pick the class that has the highest probability. This is also known as the Maximum A Posteriori (MAP) decision rule.
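
The MAP rule can be sketched in a few lines. The function name `map_predict`, the three classes, and their probabilities are assumptions made for this example, and the features are assumed to be binary. Summing logarithms instead of multiplying probabilities is a common trick to avoid numerical underflow with many features; it is not something the text above prescribes, and it requires every probability to lie strictly between 0 and 1.

```python
import math


def map_predict(x, priors, likelihoods):
    """Return the class with the highest posterior under the naive assumption.

    x           -- list of 0/1 feature values
    priors      -- dict: class -> p(C)
    likelihoods -- dict: class -> list of p(Xi = 1 | C)
    """
    best_class, best_log_score = None, -math.inf
    for c, prior in priors.items():
        log_score = math.log(prior)
        for xi, p in zip(x, likelihoods[c]):
            # Bernoulli likelihood: p if the feature is present, 1 - p if not.
            log_score += math.log(p if xi == 1 else 1.0 - p)
        if log_score > best_log_score:
            best_class, best_log_score = c, log_score
    return best_class


# Hypothetical model with three classes and two binary features.
priors = {"C1": 0.5, "C2": 0.3, "C3": 0.2}
likelihoods = {"C1": [0.9, 0.2], "C2": [0.4, 0.7], "C3": [0.1, 0.6]}

print(map_predict([1, 0], priors, likelihoods))  # C1
print(map_predict([0, 1], priors, likelihoods))  # C2
```

Because the logarithm is monotonic, the class with the highest log score is also the class with the highest probability, so the decision is the same as with the plain product.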
