A conditional probability formula can also be written as Bayes' theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

We call $P(A)$ the prior probability. $P(B \mid A)$ is called the likelihood. These are the things we're interested in, as $P(B)$ is essentially a constant anyway.
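As a quick sanity check on the formula, here is a small numeric example with made-up numbers (the 20%, 30%, and 8% figures are purely illustrative, not from any real corpus):

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Hypothetical numbers: 20% of all mail is spam (the prior),
# the word "offer" appears in 30% of spam messages (the likelihood),
# and in 8% of all messages (the evidence, a constant for a given message).
p_spam = 0.20               # P(A): prior
p_offer_given_spam = 0.30   # P(B|A): likelihood
p_offer = 0.08              # P(B): evidence

p_spam_given_offer = p_offer_given_spam * p_spam / p_offer
print(p_spam_given_offer)  # 0.75
```

So under these assumed numbers, seeing the word "offer" raises the probability of spam from 20% to 75%.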
The theory at this point is a little dry. How does this relate to our project?
For one, we can rewrite the generic Bayes' theorem to one that fits our project:

$$P(\text{Spam} \mid w_1, \dots, w_n) = \frac{P(w_1, \dots, w_n \mid \text{Spam})\,P(\text{Spam})}{P(w_1, \dots, w_n)}$$
This formula perfectly encapsulates our project: given a document made up of words $w_1$ through $w_n$, what is the probability that it's Ham or Spam? In the next section, I will show you how to translate this formula into a very powerful classifier, in fewer than 100 lines of code.
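To make the formula concrete, here is one minimal sketch of what such a classifier can look like. This is my own illustration, not the implementation from the next section: it assumes the "naive" independence step $P(w_1, \dots, w_n \mid \text{Spam}) \approx \prod_i P(w_i \mid \text{Spam})$, works in log space to avoid underflow, and uses add-one (Laplace) smoothing for unseen words:

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (label, words) pairs. Returns priors and per-label word counts."""
    label_counts = Counter(label for label, _ in docs)
    word_counts = {label: Counter() for label in label_counts}
    for label, words in docs:
        word_counts[label].update(words)
    total = sum(label_counts.values())
    priors = {label: count / total for label, count in label_counts.items()}
    return priors, word_counts

def classify(words, priors, word_counts):
    """Return the label maximizing log P(label) + sum of log P(word | label)."""
    vocab = {w for counts in word_counts.values() for w in counts}
    scores = {}
    for label, prior in priors.items():
        n = sum(word_counts[label].values())
        score = math.log(prior)
        for w in words:
            # Add-one smoothing so unseen words don't zero out the product.
            score += math.log((word_counts[label][w] + 1) / (n + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

docs = [("Spam", ["buy", "cheap", "pills"]),
        ("Ham", ["meeting", "tomorrow", "lunch"])]
priors, word_counts = train(docs)
print(classify(["cheap", "pills", "now"], priors, word_counts))  # Spam
```

Note that the denominator $P(w_1, \dots, w_n)$ never appears: it is the same for both labels, so comparing the numerators is enough.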