mahout-user mailing list archives

From "Miles Osborne" <mi...@inf.ed.ac.uk>
Subject Re: Problems with the Bayesian classifiers.
Date Sun, 20 Jul 2008 10:21:36 GMT
I think it would also be useful to cross-check your results against a text
classification system that is known to work. Look at rainbow:

http://www.cs.cmu.edu/~mccallum/bow/rainbow/

If you get correct results there, then either you have somehow messed up
your Mahout setup or else there really is a bug.
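To illustrate the kind of cross-check suggested here, below is a minimal multinomial naive Bayes baseline in plain Python. It is a sketch under stated assumptions, not Mahout's or rainbow's code: the function names `train_nb`, `classify_nb` and `baseline_accuracy` are hypothetical, documents are assumed to be already tokenized, and add-one (Laplace) smoothing is an arbitrary choice. If even a baseline this simple does far better than the Mahout run on the same train/test split, that points toward a setup problem or a bug rather than the data.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Train a multinomial naive Bayes model (hypothetical helper).

    docs: list of (label, tokens) pairs, where tokens is a list of strings.
    Returns (log_priors, log_likelihoods, vocab).
    """
    class_counts = Counter(label for label, _ in docs)
    word_counts = defaultdict(Counter)  # label -> token -> count
    vocab = set()
    for label, tokens in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)

    log_priors = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_likelihoods = {}
    for c in class_counts:
        # Laplace (add-one) smoothing: a word never seen with class c still
        # gets a small non-zero probability instead of log(0).
        denom = sum(word_counts[c].values()) + len(vocab)
        log_likelihoods[c] = {
            w: math.log((word_counts[c][w] + 1) / denom) for w in vocab
        }
    return log_priors, log_likelihoods, vocab


def classify_nb(model, tokens):
    """Return the label with the highest log posterior for a token list."""
    log_priors, log_likelihoods, vocab = model
    best_label, best_score = None, -math.inf
    for c, prior in log_priors.items():
        # Words that never appeared in training are skipped.
        score = prior + sum(log_likelihoods[c][w] for w in tokens if w in vocab)
        if score > best_score:
            best_label, best_score = c, score
    return best_label


def baseline_accuracy(model, test_docs):
    """Fraction of (label, tokens) test pairs that are classified correctly."""
    correct = sum(classify_nb(model, toks) == label for label, toks in test_docs)
    return correct / len(test_docs)
```

Working in log space avoids underflow when multiplying many small word probabilities, which matters for long documents such as emails.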

Miles

2008/7/20 Robin Anil <robin.anil@gmail.com>:

> Can you upload your split somewhere?
>
> On Sun, Jul 20, 2008 at 6:46 AM, Philippe Lamarche <
> philippe.lamarche@gmail.com> wrote:
>
> > Now, with the attachment.
> > Sorry.
> >
> > On Sat, Jul 19, 2008 at 9:13 PM, Philippe Lamarche
> > <philippe.lamarche@gmail.com> wrote:
> > >  Hi,
> > >
> > > I have been working for a little while with Mahout and the Bayesian
> > > classifier for a school project.
> > >
> > > I am using the Enron email corpus and the UC Berkeley classified
> > > emails (http://www.cs.cmu.edu/~enron/). I did a few tests and I can't
> > > seem to make it work. I wonder if I am doing something wrong.
> > >
> > > For example, I am getting under 10% correct predictions with Bayes and
> > > around 1% with CBayes. The problem seems to be that all instances of a
> > > class get predicted as some other class, or that they all get predicted
> > > as the class containing the most features.
> > >
> > > I also tested with the 20News corpus and I get similar results, where
> > > all instances of a class get predicted as another class (e.g. all
> > > 421 "rec.motorcycles" instances get predicted as "talk.politics.mideast").
> > > Attached are two confusion matrices showing the results for Bayes and
> > > CBayes. Both used the same split into training and testing sets.
> > >
> > > Am I doing something wrong?
> > >
> > > Thanks,
> > >
> > > Philippe Lamarche.
> > >
> >
>
>
> Thanks
> Robin
>
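One way to check for the collapse pattern described in this thread (every instance of one class predicted as a single other class) is to tabulate (actual, predicted) pairs and flag such rows automatically. The sketch below is plain Python and not part of Mahout; the function names `confusion_matrix`, `accuracy` and `collapsed_classes` and the 0.9 threshold are hypothetical choices for illustration, and it assumes you can export one (actual, predicted) label pair per test document.

```python
from collections import Counter, defaultdict


def confusion_matrix(pairs):
    """pairs: iterable of (actual, predicted) labels.

    Returns {actual: Counter({predicted: count})}.
    """
    matrix = defaultdict(Counter)
    for actual, predicted in pairs:
        matrix[actual][predicted] += 1
    return matrix


def accuracy(matrix):
    """Overall fraction of correct predictions (diagonal / total)."""
    # Indexing a Counter with a missing key returns 0 without inserting it.
    correct = sum(row[actual] for actual, row in matrix.items())
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total if total else 0.0


def collapsed_classes(matrix, threshold=0.9):
    """Map each actual class to the *other* class that receives at least
    `threshold` of its instances (hypothetical default of 0.9)."""
    flagged = {}
    for actual, row in matrix.items():
        total = sum(row.values())
        predicted, count = row.most_common(1)[0]
        if predicted != actual and count / total >= threshold:
            flagged[actual] = predicted
    return flagged
```

Running this on the Bayes and CBayes outputs for the same split makes it easy to see whether the same classes collapse in both, and whether they all collapse onto the class with the most features.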



-- 
The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336.
