Naive Bayes classifier

Let's try to understand how classification models work with the help of a Naive Bayes classifier. To understand Naive Bayes classifiers, we first need to understand Bayes' theorem, a result from probability theory that is best explained with an example.

Let's say that we have two machines, both of which produce spanners. Each spanner is marked with the machine that produced it: M1 is the label for machine 1 and M2 is the label for machine 2.

Let's say that one spanner is defective and we want to find the probability that the defective spanner was produced by machine 2. The probability of event A happening, given that B has already occurred, is given by Bayes' theorem. We therefore make use of Bayes' theorem as follows:
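In standard notation, Bayes' theorem for any two events A and B with P(B) > 0 can be written as shown here, where the vertical bar denotes conditioning:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```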

  • P(A) represents the probability of A happening.
  • P(B|A) represents the probability of B given A (the probability of B happening, assuming that A has already happened).
  • P(B) represents the probability of B happening.
  • P(A|B) represents the probability of A given B (the probability of A happening, assuming that B has already happened).

If we put the data in terms of probability, we get the following:

 

Let's say we have a dataset of people, of whom some walk to work and some drive to work, depending upon the age category they fall into:

                   

If a new data point is added, we should be able to say whether that person drives to work or walks to work. This is supervised learning: we train the machine on a labeled dataset and derive a learned model from it. We will apply Bayes' theorem to determine the probability of the new data point belonging to the walking category and to the driving category.

To calculate the probability of the new data point belonging to the walking category, we calculate P(Walks|X). Here, X represents the features of the given person, including their age and their salary:
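Substituting the walking class into Bayes' theorem gives the expression below. It is a reconstruction from the theorem rather than the original figure: P(Walks) is the prior, P(X|Walks) the likelihood, and P(X) the marginal likelihood.

```latex
P(\text{Walks} \mid X) = \frac{P(X \mid \text{Walks})\, P(\text{Walks})}{P(X)}
```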

To calculate the probability of the new data point belonging to the driving category, we calculate P(Drives|X), as shown in the following:
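Substituting the driving class into Bayes' theorem gives the following reconstruction. The denominator P(X) does not depend on the class, so when the two categories are compared, only the numerators differ.

```latex
P(\text{Drives} \mid X) = \frac{P(X \mid \text{Drives})\, P(\text{Drives})}{P(X)}
```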

Finally, we will compare P(Walks|X) and P(Drives|X) and assign the new data point to the category with the higher probability. The data points are initially plotted in an n-dimensional space, where n is the number of independent variables.
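The comparison step can be sketched in a few lines of Python. The function name `classify` and the example probabilities are hypothetical, chosen only to illustrate picking the category with the larger posterior.

```python
def classify(posteriors):
    """Return the label whose posterior probability is largest."""
    return max(posteriors, key=posteriors.get)


# Hypothetical posterior values, for illustration only
example = {"Walks": 0.75, "Drives": 0.25}
label = classify(example)
print(label)
```

Because both posteriors share the same denominator P(X), the same label would be chosen if only the numerators (likelihood times prior) were compared.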

Next, we compute the marginal likelihood, as shown in the following figure, which is P(X):

P(X) is the probability that a randomly chosen observation has features similar to those of the new data point. The algorithm draws a circle around the new data point and treats every observation inside it as similar in features. The probability of the features is then computed as P(X) = (number of similar observations) / (total observations).
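A minimal sketch of this counting step follows, assuming Euclidean distance defines the circle. The function name `marginal_likelihood`, the sample observations, and the radius value are all hypothetical and are not the dataset from the figures.

```python
import math


def marginal_likelihood(points, new_point, radius):
    """Estimate P(X) as the share of observations within `radius` of new_point."""
    similar = sum(1 for p in points if math.dist(p, new_point) <= radius)
    return similar / len(points)


# Hypothetical (age, salary in $1000s) observations, for illustration only
observations = [(25, 30), (33, 38), (36, 41), (50, 80), (60, 95)]
p_x = marginal_likelihood(observations, new_point=(35, 40), radius=5)
print(p_x)
```

In this sketch, age and salary are used on their raw scales. With real data, the features would usually need rescaling first so that one of them does not dominate the distance.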

  • The radius of the circle is an important parameter in this case. This radius is given as an input parameter to the algorithm:

  • In this example, all the points inside the circle are assumed to have similar features to the data point that is to be added. Let's say that the data point that we are adding relates to someone who is 35 years old and has a salary of $40,000. In this case, everybody within the bracket $25-40K would be selected in the circle:

  • Next, we need to compute the likelihood, which is the probability that a randomly chosen person who walks has features similar to X. The following will determine P(X|Walks):

  • We do the same to derive the probability that the data point belongs to the driving category, given its features X:
    • In this case, P(X) is equal to the number of similar observations that fall in the circle shown before, divided by the total number of observations: P(X) = 4/30 = 0.133
    • P(Drives) = (number of people who drive) / (total observations) = 20/30 = 0.667
    • P(X|Drives) = (similar observations who drive) / (total drivers) = 1/20 = 0.05
    • Applying these values, we get P(Drives|X) = 0.05 × 0.667 / 0.133 = 0.25, that is, 25%
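The arithmetic above can be checked with exact fractions. The counts 30, 20, 4, and 1 come from the text. The walker counts are derived under the assumption that every person either walks or drives, and the helper name `posterior` is hypothetical.

```python
from fractions import Fraction

total = 30           # all observations
drivers = 20         # people who drive
similar = 4          # observations inside the circle
similar_drivers = 1  # drivers inside the circle

# Derived, assuming everyone either walks or drives
walkers = total - drivers
similar_walkers = similar - similar_drivers


def posterior(similar_in_class, class_size):
    """Bayes' theorem: likelihood * prior / marginal likelihood."""
    prior = Fraction(class_size, total)
    likelihood = Fraction(similar_in_class, class_size)
    evidence = Fraction(similar, total)
    return likelihood * prior / evidence


p_drives = posterior(similar_drivers, drivers)
p_walks = posterior(similar_walkers, walkers)
print(float(p_drives), float(p_walks))
```

Using `Fraction` avoids the rounding that appears when 20/30 and 4/30 are truncated to three decimals.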

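For the given problem, P(Drives|X) is 0.25, so the remaining probability, P(Walks|X), is 0.75. Because the walking category has the higher probability, we assign the new data point to the set of walkers.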
