mahout-user mailing list archives

From "Philippe Lamarche" <philippe.lamar...@gmail.com>
Subject Re: Problems with the Bayesian classifiers.
Date Sun, 20 Jul 2008 01:16:15 GMT
Now, with the attachment.
Sorry.

On Sat, Jul 19, 2008 at 9:13 PM, Philippe Lamarche
<philippe.lamarche@gmail.com> wrote:
>  Hi,
>
> I have been working for a little while with Mahout and the Bayesian
> classifier for a school project.
>
> I am using the Enron email corpus and the UC Berkeley classified
> emails (http://www.cs.cmu.edu/~enron/). I did a few tests and I can't
> seem to make it work. I wonder if I am doing something wrong.
>
> For example, I am getting under 10% correct predictions with Bayes and
> around 1% with CBayes. The problem seems to be that all instances of
> a class get predicted as some other class, or that they all get
> predicted as the class containing the most features.
>
> I also tested with the 20News corpus and I get similar results, where
> all instances of a class are predicted as another class (e.g. all
> 421 "rec.motorcycles" instances are predicted as "talk.politics.mideast").
> Attached are two confusion matrices showing the results for Bayes and
> CBayes. Both used the same split into training and test sets.
>
> Am I doing something wrong?
>
> Thanks,
>
> Philippe Lamarche.
>
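
The symptom described above (a whole class landing in one wrong column of the confusion matrix) can be checked mechanically. The sketch below is not part of Mahout; it assumes the matrix is available as a plain list of rows (rows = actual class, columns = predicted class), and the function names, the `threshold` parameter, and the toy counts are all hypothetical, not the results from the attached matrices.

```python
from typing import List, Tuple


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of test instances on the diagonal (rows = actual, columns = predicted)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def collapsed_classes(
    labels: List[str], matrix: List[List[int]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Return (actual, predicted, share) for each class whose instances mostly
    (share >= threshold) go to a single *wrong* predicted class."""
    result = []
    for i, row in enumerate(matrix):
        n = sum(row)
        if n == 0:
            continue  # no test instances for this class
        # Column receiving the most instances of class i (first one on ties).
        j = max(range(len(row)), key=lambda k: row[k])
        share = row[j] / n
        if j != i and share >= threshold:
            result.append((labels[i], labels[j], share))
    return result


if __name__ == "__main__":
    # Hypothetical toy counts: every instance of "A" is predicted as "B".
    labels = ["A", "B", "C"]
    matrix = [[0, 10, 0], [0, 8, 2], [1, 1, 8]]
    print(accuracy(matrix))
    print(collapsed_classes(labels, matrix))
```

With these toy counts, 16 of 30 instances fall on the diagonal, and class "A" is flagged as collapsing entirely into "B". Seeing many classes flagged this way usually points at the training or labelling pipeline rather than at random misclassification.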
