Conditional probability

We'll start with the notion of conditional probability. To set the scene, we'll consider several fruit types:

  • Apple
  • Avocado
  • Banana
  • Lychee
  • Mango
  • Nectarine
  • Pineapple
  • Strawberry

For each fruit type, we will have several instances of that fruit, so the class of apples could contain a green Granny Smith and a red Red Delicious. Likewise, fruits can be ripe or unripe: mangoes and bananas could be yellow (ripe) or green (unripe), for example. Lastly, we can also classify these fruits by kind: tropical (avocado, banana, lychee, mango, and pineapple) versus non-tropical:

Fruit       Can be green  Can be yellow  Can be red  Is tropical
Apple       yes           no             yes         no
Avocado     yes           no             no          yes
Banana      yes           yes            no          yes
Lychee      yes           no             yes         yes
Mango       yes           yes            no          yes
Nectarine   no            yes            yes         no
Pineapple   yes           yes            no          yes
Strawberry  yes           no             yes         no

Now imagine that you're blindfolded and you pick a fruit. I will then describe a feature of the fruit, and you will guess which fruit it is.

Let's say the fruit you picked has a yellow outside. What are the possible fruits? Nectarines, bananas, pineapples, and mangoes come to mind. If you pick one of these options, you have a one in four chance of being correct. Four of the eight fruits can be yellow, so the probability of yellow is $P(\text{Yellow}) = 4/8 = 1/2$. The numerator is the number of yeses in the Can be yellow column, and the denominator is the total number of rows.

If I give you another feature of the fruit, you can improve your odds. Let's say I tell you that the fruit is tropical. Now you have a one in three chance of being right, because the nectarine has been eliminated from the possible choices.

We can ask this question: If we know a fruit is tropical, what is the probability that the fruit is yellow? The answer is 3/5. From the preceding table, we can see that there are five tropical fruits and three of them are yellow. This is called a conditional probability. We write it as a formula such as this (for the more mathematically inclined, this is the Kolmogorov definition of conditional probability):

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

This is how you read the formula: to find the probability of A given that B is known, we need the probability of A and B happening at the same time, and the probability of B itself.

The conditional probability of a fruit being yellow, given that it's tropical, is three in five; there are actually a lot of tropical fruits that are yellow. Tropical conditions allow for greater deposition of carotenoids and vitamin C during the growth of the fruit.

Looking at a tabulated result can make conditional probability easier to understand. However, the conditional probability can also be calculated directly. Specifically, this is the formula:

$$P(\text{Yellow} \mid \text{Tropical}) = \frac{P(\text{Yellow} \cap \text{Tropical})}{P(\text{Tropical})}$$

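The probability of a fruit being yellow and tropical, $P(\text{Yellow} \cap \text{Tropical})$, is three in eight; there are three such fruits, out of a total of eight. The probability of a fruit being tropical, $P(\text{Tropical})$, is five in eight; there are five tropical fruits out of the eight listed. Dividing the first by the second gives $(3/8) / (5/8) = 3/5$, which matches the answer read off the table.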
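The arithmetic above can be checked with a short Python sketch. The table is transcribed from the text; the names `FRUITS`, `YELLOW`, `TROPICAL`, and `probability` are illustrative assumptions, not something the text defines.

```python
from fractions import Fraction

# Feature table transcribed from the text.
# Tuple order: (can be green, can be yellow, can be red, is tropical).
FRUITS = {
    "Apple":      (True,  False, True,  False),
    "Avocado":    (True,  False, False, True),
    "Banana":     (True,  True,  False, True),
    "Lychee":     (True,  False, True,  True),
    "Mango":      (True,  True,  False, True),
    "Nectarine":  (False, True,  True,  False),
    "Pineapple":  (True,  True,  False, True),
    "Strawberry": (True,  False, True,  False),
}
YELLOW, TROPICAL = 1, 3  # hypothetical column indices into the tuples above


def probability(predicate):
    """Fraction of table rows for which predicate(features) is true."""
    matches = sum(1 for features in FRUITS.values() if predicate(features))
    return Fraction(matches, len(FRUITS))


p_tropical = probability(lambda f: f[TROPICAL])
p_yellow_and_tropical = probability(lambda f: f[YELLOW] and f[TROPICAL])

# Conditional probability: P(Yellow | Tropical) = P(Yellow and Tropical) / P(Tropical)
p_yellow_given_tropical = p_yellow_and_tropical / p_tropical

print(p_yellow_and_tropical)    # 3/8
print(p_tropical)               # 5/8
print(p_yellow_given_tropical)  # 3/5
```

Using `Fraction` rather than floats keeps the results in the same "three in five" form the text uses.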

And now, we are finally ready to figure out how we got to that one in three number. The probability of each class of fruit is uniform: if you had to choose randomly, you would get it right one time in eight. We can rephrase the question like this: What is the probability of a fruit being a banana, given that it's yellow and tropical?

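Let's rewrite this as a formula:

$$P(\text{Banana} \mid \text{Yellow} \cap \text{Tropical}) = \frac{P(\text{Banana} \cap \text{Yellow} \cap \text{Tropical})}{P(\text{Yellow} \cap \text{Tropical})} = \frac{1/8}{3/8} = \frac{1}{3}$$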
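As a rough illustration, the banana question can be worked through in Python. This is a minimal sketch, assuming each fruit in the table is equally likely; `FEATURES` and `posterior` are hypothetical names introduced only for this example.

```python
from fractions import Fraction

# Each fruit mapped to the set of features marked "yes" in the table.
FEATURES = {
    "Apple":      {"green", "red"},
    "Avocado":    {"green", "tropical"},
    "Banana":     {"green", "yellow", "tropical"},
    "Lychee":     {"green", "red", "tropical"},
    "Mango":      {"green", "yellow", "tropical"},
    "Nectarine":  {"yellow", "red"},
    "Pineapple":  {"green", "yellow", "tropical"},
    "Strawberry": {"green", "red"},
}


def posterior(fruit, evidence):
    """P(fruit | evidence), assuming every fruit is equally likely a priori.

    Raises ZeroDivisionError if no fruit in the table matches the evidence.
    """
    total = len(FEATURES)
    # Fruits whose features include every piece of evidence.
    consistent = [name for name, feats in FEATURES.items() if evidence <= feats]
    p_evidence = Fraction(len(consistent), total)
    p_joint = Fraction(1 if fruit in consistent else 0, total)
    return p_joint / p_evidence


print(posterior("Banana", {"yellow"}))              # 1/4
print(posterior("Banana", {"yellow", "tropical"}))  # 1/3
```

The two printed values correspond to the one in four and one in three guesses from the blindfold example: each extra feature shrinks the set of consistent fruits.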

It is important to note that we relied on a special trick to analyze the preceding probabilities. Specifically, we acted as though each yes represents a single existing example, while a no indicates that there are no examples. In short, we used this table:

Fruit       Is Green  Is Yellow  Is Red  Is Tropical
Apple       1         0          1       0
Avocado     1         0          0       1
Banana      1         1          0       1
Lychee      1         0          1       1
Mango       1         1          0       1
Nectarine   0         1          1       0
Pineapple   1         1          0       1
Strawberry  1         0          1       0

This will be important for the analysis in the spam detection project. There, the number in each cell would be the number of occurrences within the dataset.
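To make the idea of cells holding occurrence counts concrete, here is a minimal Python sketch. The observations are invented purely for illustration and are not data from the text; `OBSERVATIONS` and `p_fruit_given_color` are likewise hypothetical names.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical observations of (fruit, color) pairs. Invented for
# illustration only; in a real dataset these would be counted from the data.
OBSERVATIONS = [
    ("Banana", "yellow"), ("Banana", "yellow"), ("Banana", "yellow"),
    ("Banana", "green"),
    ("Mango", "yellow"), ("Mango", "green"),
    ("Nectarine", "yellow"),
    ("Pineapple", "yellow"), ("Pineapple", "yellow"),
    ("Apple", "green"), ("Apple", "red"),
]

# Each table cell becomes a count of occurrences instead of a 0 or 1.
pair_counts = Counter(OBSERVATIONS)
color_counts = Counter(color for _, color in OBSERVATIONS)


def p_fruit_given_color(fruit, color):
    """P(fruit | color) estimated from counts.

    Raises ZeroDivisionError if the color never occurs in OBSERVATIONS.
    """
    return Fraction(pair_counts[(fruit, color)], color_counts[color])


print(p_fruit_given_color("Banana", "yellow"))  # 3/7
print(p_fruit_given_color("Mango", "yellow"))   # 1/7
```

With counts, a yellow fruit is no longer equally likely to be any of the four yellow candidates: bananas dominate because they were observed as yellow more often.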
