mahout-user mailing list archives

From "Robin Anil" <robin.a...@gmail.com>
Subject Re: Problems with the Bayesian classifiers.
Date Sun, 20 Jul 2008 09:08:55 GMT
Can you upload your split somewhere?

On Sun, Jul 20, 2008 at 6:46 AM, Philippe Lamarche <philippe.lamarche@gmail.com> wrote:

> Now, with the attachment.
> Sorry.
>
> On Sat, Jul 19, 2008 at 9:13 PM, Philippe Lamarche
> <philippe.lamarche@gmail.com> wrote:
> >  Hi,
> >
> > I have been working for a little while with Mahout and the Bayesian
> > classifier for a school project.
> >
> > I am using the Enron email corpus and the UC Berkeley classified
> > emails (http://www.cs.cmu.edu/~enron/). I did a few tests and I can't
> > seem to make it work. I wonder if I am doing something wrong.
> >
> > For example, I am getting under 10% correct predictions with Bayes and
> > around 1% with CBayes. The problem seems to be that all instances of a
> > class are predicted as another class, or that they are all predicted
> > as the class containing the most features.
> >
> > I also tested with the 20News corpus and I get similar results, where
> > all instances of a class are predicted as another class (e.g. all
> > 421 "rec.motorcycles" get predicted as "talk.politics.mideast").
> > Attached are two confusion matrices displaying results for Bayes and
> > CBayes. Both used the same split into training and test sets.
> >
> > Am I doing something wrong?
> >
> > Thanks,
> >
> > Philippe Lamarche.
> >
>


Thanks
Robin
