Blog on Naive Bayes



Introduction

Naive Bayes is a Supervised Machine Learning algorithm based on the Bayes Theorem that is used to solve classification problems by following a probabilistic approach. It is based on the idea that the predictor variables in a Machine Learning model are independent of each other, meaning that the outcome of the model depends on a set of independent variables that have nothing to do with each other.

The Math Behind Naive Bayes

The principle behind Naive Bayes is the Bayes theorem, also known as the Bayes Rule. The Bayes theorem is used to calculate conditional probability, which is nothing but the probability of an event occurring based on information about events in the past. Mathematically, the Bayes theorem is represented as:

P(A|B) = P(B|A) * P(A) / P(B)

In the above equation:

  • P(A|B): Conditional probability of event A occurring, given the event B
  • P(A): Probability of event A occurring
  • P(B): Probability of event B occurring
  • P(B|A): Conditional probability of event B occurring, given the event A

Formally, the terminology of the Bayes Theorem is as follows:

  • A is known as the proposition and B is the evidence
  • P(A) represents the prior probability of the proposition
  • P(B) represents the prior probability of the evidence
  • P(A|B) is called the posterior probability
  • P(B|A) is the likelihood

Therefore, the Bayes theorem can be summed up as:

Posterior = (Likelihood * Proposition prior probability) / Evidence prior probability
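
As a quick illustration, the rule is a one-liner in Python. This is a minimal sketch, with hypothetical numbers chosen purely to show the arithmetic:

def bayes_posterior(likelihood, prior, evidence):
    # Posterior = (Likelihood * Prior) / Evidence
    return likelihood * prior / evidence

# Hypothetical values: P(B|A) = 0.8, P(A) = 0.3, P(B) = 0.5
print(bayes_posterior(0.8, 0.3, 0.5))  # 0.48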

It can also be considered in the following manner:

Given a hypothesis H and evidence E, the Bayes Theorem states that the relationship between the probability of the hypothesis before getting the evidence, P(H), and the probability of the hypothesis after getting the evidence, P(H|E), is:

P(H|E) = P(E|H) * P(H) / P(E)
Now that you know what the Bayes Theorem is, let’s see how it can be derived.

Derivation Of The Bayes Theorem

The main aim of the Bayes Theorem is to calculate the conditional probability. The Bayes Rule can be derived from the following two equations:

The below equation represents the conditional probability of A, given B:

P(A|B) = P(A ∩ B) / P(B)
The below equation represents the conditional probability of B, given A:

P(B|A) = P(A ∩ B) / P(A)

Therefore, equating P(A ∩ B) from the above two equations gives P(A|B) * P(B) = P(B|A) * P(A), which on rearranging yields the Bayes Theorem:

P(A|B) = P(B|A) * P(A) / P(B)
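
To make the derivation concrete, here is a tiny numeric sanity check in Python, with a made-up joint probability (all values hypothetical):

# Hypothetical joint and marginal probabilities, for illustration only
p_a_and_b = 0.12   # P(A ∩ B)
p_a, p_b = 0.3, 0.4

p_a_given_b = p_a_and_b / p_b   # P(A|B) = 0.3
p_b_given_a = p_a_and_b / p_a   # P(B|A) = 0.4

# The Bayes Theorem recovers P(A|B) from P(B|A):
print(p_b_given_a * p_a / p_b)  # 0.3, matching p_a_given_b
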
Bayes Theorem for Naive Bayes Algorithm

The above equation was for a single predictor variable; however, in real-world applications there is more than one predictor variable, and for a classification problem there is more than one output class. The classes can be represented as C1, C2, …, Ck and the predictor variables as a vector x1, x2, …, xn.

The objective of a Naive Bayes algorithm is to measure the conditional probability of an event with a feature vector x1, x2, …, xn belonging to a particular class Ci:

P(Ci | x1, x2, …, xn) = P(x1, x2, …, xn | Ci) * P(Ci) / P(x1, x2, …, xn)

On expanding the joint likelihood with the chain rule of probability, we get:

P(x1, x2, …, xn | Ci) = P(x1 | x2, …, xn, Ci) * P(x2 | x3, …, xn, Ci) * … * P(xn | Ci)

However, each conditional probability P(xj | xj+1, …, xn, Ci) simplifies to P(xj | Ci), since every predictor variable is assumed to be independent of the others in Naive Bayes.

The final equation comes down to:

P(Ci | x1, x2, …, xn) = P(Ci) * P(x1 | Ci) * P(x2 | Ci) * … * P(xn | Ci) / P(x1, x2, …, xn)
Here, P(x1, x2, …, xn) is constant for all the classes, therefore we get:

P(Ci | x1, x2, …, xn) ∝ P(Ci) * P(x1 | Ci) * P(x2 | Ci) * … * P(xn | Ci)

The predicted class is simply the Ci that maximizes this product.
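In code, this decision rule is just an argmax over the classes. Here is a minimal sketch, assuming the class priors and per-feature conditional probabilities have already been estimated (the function and parameter names are illustrative):

def predict(priors, likelihoods, x):
    # priors: dict mapping class -> P(C)
    # likelihoods: dict mapping class -> list of functions, one per feature,
    #   where likelihoods[c][j](v) returns the estimate of P(x_j = v | C = c)
    # Returns the class maximizing P(C) * product over j of P(x_j | C).
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for j, value in enumerate(x):
            score *= likelihoods[c][j](value)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
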
How Does Naive Bayes Work?

To get a better understanding of how Naive Bayes works, let’s look at an example.

Consider a data set with 1500 observations and the following output classes:

  • Cat
  • Parrot
  • Turtle

The predictor variables are categorical in nature, i.e., each stores one of two values, True or False:

  • Swim
  • Wings
  • Green Color
  • Sharp Teeth

The data set, summarised as counts per class, looks like this:

Type      Swim   Wings   Green   Sharp Teeth   Total
Cat        450       0       0           500     500
Parrot      50     500     400             0     500
Turtle     500       0     100            50     500
Total     1000     500     500           550    1500

From the above table, we can summarise that:

The class Cat shows that:

  • Out of 500, 450 (90%) cats can swim
  • No cats have wings
  • No cats are green in color
  • All 500 cats have sharp teeth

The class Parrot shows that:

  • Out of 500, 50 (10%) parrots can swim
  • All 500 parrots have wings
  • Out of 500, 400 (80%) parrots are green in color
  • No parrots have sharp teeth

The class Turtle shows that:

  • All 500 turtles can swim
  • No turtles have wings
  • Out of 500, 100 (20%) turtles are green in color
  • Out of 500, 50 (10%) turtles have sharp teeth

Now, with the available data, let's classify the following observation, an animal that can swim (Swim = True) and is green in color (Green = True), into one of the output classes (Cat, Parrot or Turtle) by using the Naive Bayes Classifier.


The goal here is to predict whether the animal is a Cat, Parrot or Turtle based on the defined predictor variables (swim, wings, green, sharp teeth).

To solve this, we will use the Naive Bayes approach:

P(H | E1, E2, …, En) = P(E1|H) * P(E2|H) * … * P(En|H) * P(H) / P(E1, E2, …, En)

where H is the hypothesis (the class) and E1, …, En are the pieces of evidence (the feature values).

In the observation, the variables Swim and Green are true, and the outcome can be any one of the animals (Cat, Parrot or Turtle).

To check if the animal is a Cat:
P(Cat | Swim, Green) = P(Swim|Cat) * P(Green|Cat) * P(Cat) / P(Swim, Green)
= 0.9 * 0 * 0.333 / P(Swim, Green)
= 0

To check if the animal is a Parrot:
P(Parrot | Swim, Green) = P(Swim|Parrot) * P(Green|Parrot) * P(Parrot) / P(Swim, Green)
= 0.1 * 0.8 * 0.333 / P(Swim, Green)
= 0.0266 / P(Swim, Green)

To check if the animal is a Turtle:
P(Turtle | Swim, Green) = P(Swim|Turtle) * P(Green|Turtle) * P(Turtle) / P(Swim, Green)
= 1 * 0.2 * 0.333 / P(Swim, Green)
= 0.0666 / P(Swim, Green)

For all the above calculations the denominator is the same, i.e., P(Swim, Green). The value of P(Turtle | Swim, Green) is the greatest of the three, therefore we can predict the class of the animal as Turtle.
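
The same computation can be reproduced in a few lines of Python, using the counts from the table above (the variable names are just for illustration):

# Per-class counts from the table: each class has 500 animals, 1500 in total
counts = {
    "Cat":    {"swim": 450, "green": 0,   "total": 500},
    "Parrot": {"swim": 50,  "green": 400, "total": 500},
    "Turtle": {"swim": 500, "green": 100, "total": 500},
}
n_total = 1500

scores = {}
for animal, c in counts.items():
    prior = c["total"] / n_total                # P(class) = 500/1500 ≈ 0.333
    p_swim = c["swim"] / c["total"]             # P(Swim = True | class)
    p_green = c["green"] / c["total"]           # P(Green = True | class)
    scores[animal] = p_swim * p_green * prior   # posterior numerator

print(scores)                       # Cat: 0.0, Parrot: ~0.0266, Turtle: ~0.0666
print(max(scores, key=scores.get))  # Turtle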

Advantages

• It is fast and easy to understand.
• It is not prone to overfitting.
• It requires only a small amount of training data to estimate the parameters necessary for classification.
• It is faster than Random Forest, since it can adapt to changing data quickly. If the assumptions of Naive Bayes hold true, it is much faster than logistic regression as well.
• It performs well with categorical input variables compared to numerical ones.
• It can handle missing values.

Disadvantages

• Its assumptions may not always hold true.
• With a huge feature list, the model may not give good accuracy, because the likelihood estimates become spread out and may not follow a Gaussian or other assumed distribution.

     

Conclusion

Naive Bayes is great for text classification, email filtering, recommendations, medical diagnosis and fraud detection. It works well with large text datasets: it predicts spam/non-spam emails, suggests personalized items, diagnoses diseases based on symptoms, and detects fraudulent activities. It assumes feature independence and is computationally efficient.
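
For the text-classification use cases mentioned above, scikit-learn's MultinomialNB is a common starting point. Here is a minimal sketch of spam filtering, with four toy messages invented purely for illustration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data, made up for illustration only
messages = [
    "win a free prize now",
    "limited offer claim your free money",
    "meeting rescheduled to Monday",
    "please review the attached report",
]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()         # bag-of-words counts per message
X = vectorizer.fit_transform(messages)

model = MultinomialNB()                # multinomial Naive Bayes suits count features
model.fit(X, labels)

test = vectorizer.transform(["claim your free prize"])
print(model.predict(test))             # expected: ['spam']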

