# mahout-user mailing list archives

From Gurudev Devanla <betaco...@gmail.com>
Subject Re: Question on Bayes Classifier
Date Fri, 30 Apr 2010 17:09:06 GMT
```Thank you all for the responses. I was able to access the link provided by
Robin.  I will have to go through the document a little slowly to understand
how the probabilities help/not help. Will do that soon.

As for my pet project, I was trying to implement an EM algorithm using Naive
Bayes. Hence, I think the probabilities of the classes would not be equal (or
cancel out), since I need to deal with a large amount of unlabelled data. While
assigning the probabilities to these documents, the Pr(C) would also change.
Maybe the document addresses this as well. Will keep the group posted on
what I learn.

Thanks
Guru

On Thu, Apr 29, 2010 at 12:15 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> On Wed, Apr 28, 2010 at 11:25 PM, Gurudev Devanla <betacoder@gmail.com> wrote:
>
> >
> > This is my first post ever on any open source mailing list. So, please
> > excuse me if I am not following certain standards.
> >
>
> You are doing great.
>
>
> > I was walking through the code for Naive Bayes classifier and I notice
> > that in TestClassifier.java, at the point where the document weights are
> > calculated, the probability of the class (label) is not taken into
> > consideration. My knowledge of document weight in Naive Bayes is:
> >
> > Pr(C|D) = Pr(D|C) * Pr(C), but in the implementation I have, I
> > don't see Pr(C) being used in the calculation.
> >
>
> Actually, the real computation is
>
>   pr(C and D) = pr(D | C) * pr(C)
>   pr(C | D) = pr(C and D) / pr(D) = pr(D | C) * pr(C) / pr(D)
>
> With D fixed to a single document under consideration, we don't need to
> consider pr(D) because
>
>  argmax_C pr(C | D) = argmax_C pr(C, D)
>
> You are correct, however, that pr(C) might well be considered.  It is
> conventionally assumed, however, that the probabilities of all classes are
> equal so that this term can be ignored.  If you have information about the
> a priori prevalence of different categories, it would not be amiss to
> include this factor.
>
> This term appears in equation (3) of the paper "Tackling the Poor
> Assumptions of Naive Bayes Text Classifiers" by Jason Rennie and others
> that Robin mentions, where log pr(C) is written as b_c.  Just after this,
> however, the authors say:
> however, the authors say:
>
> "the class probabilities tend to be overpowered by the combination of word
> probabilities, so we use a uniform prior estimate for simplicity"
>
>
> This is equivalent to saying that pr(C) = 1 / m where m is the number of
> categories.
>
> If you have trouble getting the PDF that Robin mentioned (CiteseerX is like
> a yo-yo lately) you can get the slides for a talk by Jason on the same
> topic: http://people.csail.mit.edu/jrennie/talks/icml03.pdf
>

```
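The point discussed in the thread (a uniform prior pr(C) = 1 / m leaves argmax_C pr(C, D) unchanged, while a non-uniform prior can change it) can be illustrated with a small sketch. This is not Mahout's TestClassifier code; the function names, the toy word probabilities, and the two-class setup are all assumptions made up for illustration. The `update_priors` helper shows one way the class prior might be re-estimated from soft assignments in an EM-style loop, which is the situation Guru describes for unlabelled data.

```python
import math


def log_joint_scores(doc_tokens, log_likelihoods, log_priors=None):
    """Return log pr(C, D) for every class C (up to a constant if no prior).

    log_likelihoods: {class: {token: log pr(token | class)}}  (hypothetical toy model)
    log_priors:      {class: log pr(class)}, or None to drop the prior term
    """
    scores = {}
    for label, token_logps in log_likelihoods.items():
        # Naive Bayes: log pr(D | C) is the sum of per-token log probabilities.
        score = sum(token_logps[t] for t in doc_tokens)
        if log_priors is not None:
            score += log_priors[label]  # the b_c term, log pr(C)
        scores[label] = score
    return scores


def classify(doc_tokens, log_likelihoods, log_priors=None):
    """Pick argmax_C pr(C, D), which equals argmax_C pr(C | D) for a fixed D."""
    scores = log_joint_scores(doc_tokens, log_likelihoods, log_priors)
    return max(scores, key=scores.get)


def update_priors(responsibilities):
    """EM-style re-estimate of pr(C): average the soft assignments pr(C | D)
    over all documents. Assumes every dict has the same class keys."""
    n = len(responsibilities)
    labels = list(responsibilities[0].keys())
    return {c: sum(r[c] for r in responsibilities) / n for c in labels}


# Toy model with made-up numbers, for illustration only.
log_likelihoods = {
    "sports": {"goal": math.log(0.6), "vote": math.log(0.4)},
    "politics": {"goal": math.log(0.5), "vote": math.log(0.5)},
}
doc = ["goal"]

uniform = {c: math.log(1 / len(log_likelihoods)) for c in log_likelihoods}
skewed = {"sports": math.log(0.1), "politics": math.log(0.9)}

no_prior = classify(doc, log_likelihoods)
with_uniform = classify(doc, log_likelihoods, uniform)
with_skewed = classify(doc, log_likelihoods, skewed)

# Soft assignments for two hypothetical unlabelled documents.
new_priors = update_priors([
    {"sports": 0.8, "politics": 0.2},
    {"sports": 0.4, "politics": 0.6},
])
```

In this toy setup the uniform prior adds the same constant to every class score, so `no_prior` and `with_uniform` agree, while the skewed prior is strong enough to flip the decision for a one-word document. With longer documents the summed word log probabilities usually dominate, which matches the remark quoted from the paper.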