In the `_probabilities` method, the probability can exceed 1 in the following case.
Consider a training dataset with three messages: one ham and two spam.
The spam messages contain the word 'bitcoin' multiple times; say the total count of 'bitcoin' across the spam messages is 10.
In brief,
ham messages = 1
spam messages = 2
count of the 'bitcoin' token = 10
then,
p_token_spam = (spam + self.k) / (self.spam_messages + 2 * self.k)  # k -> smoothing factor = 0.5
p_token_spam = (10 + 0.5) / (2 + 2 * 0.5) = 10.5 / 3 = 3.5
Since a probability cannot exceed 1, how should we interpret the value 3.5 in this case?
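To make the scenario concrete, here is a minimal sketch of the calculation described above. The variable names (`spam_messages`, `token_count`, `k`) are illustrative stand-ins, not the original class's exact attributes; it assumes the raw occurrence count of the token is plugged in as `spam`:

```python
k = 0.5              # smoothing factor
spam_messages = 2    # number of spam messages in the training set
token_count = 10     # total occurrences of 'bitcoin' across the spam messages

# Plugging the raw occurrence count into the formula from the question:
p_token_spam = (token_count + k) / (spam_messages + 2 * k)
print(p_token_spam)  # 3.5 -- greater than 1
```

Running this reproduces the 3.5 from the question, which is what prompts the interpretation problem.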