In the `_probabilities` method, the probability can exceed 1 in the following case.
Consider a training dataset with three messages: one ham and two spam.
The spam messages contain the word 'bitcoin' multiple times; say the total count of 'bitcoin' across the spam messages is 10.
In brief,
ham messages = 1
spam messages = 2
count of the 'bitcoin' token = 10
then,
p_token_spam = (spam + self.k) / (self.spam_messages + 2 * self.k)  # k -> smoothing factor = 0.5
p_token_spam = (10 + 0.5) / (2 + 2 * 0.5) = 10.5 / 3 = 3.5
Since a probability cannot exceed 1, how should we interpret the value 3.5 in this case?
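To make the scenario concrete, here is a minimal sketch of the calculation described above. The variable names (`spam_messages`, `token_count`, `k`) are illustrative stand-ins, not the original class's exact attributes; it assumes the raw occurrence count of the token is plugged in as `spam`:

```python
k = 0.5              # smoothing factor
spam_messages = 2    # number of spam messages in the training set
token_count = 10     # total occurrences of 'bitcoin' across the spam messages

# Plugging the raw occurrence count into the formula from the question:
p_token_spam = (token_count + k) / (spam_messages + 2 * k)
print(p_token_spam)  # 3.5 -- greater than 1
```

Running this reproduces the 3.5 from the question, which is what prompts the interpretation problem.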