mahout-user mailing list archives

From "Philippe Lamarche" <>
Subject Re: Problems with the Bayesian classifiers.
Date Sun, 20 Jul 2008 01:16:15 GMT
Now, with the attachment.

On Sat, Jul 19, 2008 at 9:13 PM, Philippe Lamarche
<> wrote:
>  Hi,
> I have been working for a little while with Mahout and the Bayesian
> classifier for a school project.
> I am using the Enron email corpus and the UC Berkeley classified
> emails ( I did a few tests and I can't
> seem to make it work. I wonder if I am doing something wrong.
> For example, I am getting under 10% correct predictions with Bayes and
> around 1% with CBayes. The problem seems to be that all instances of a
> class are predicted as another class, or that they are all predicted
> as the class containing the most features.
> I also tested with the 20News corpus and I get similar results, where
> all instances of a class are predicted as another class (e.g. all
> 421 "" get predicted as "talk.politics.mideast").
> Attached are two confusion matrices showing the results for Bayes and
> CBayes. Both used the same split into training and test sets.
> Am I doing something wrong?
> Thanks,
> Philippe Lamarche.
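
For anyone wanting to check for the same symptom, here is a minimal sketch in plain Python (not Mahout code) of how a confusion matrix can be tallied and scanned for classes whose test instances all end up in one single other class. The function names and the toy labels are hypothetical and only illustrate the idea; they are not taken from the attached results.

```python
from collections import Counter, defaultdict


def confusion_matrix(actual, predicted):
    """Return {actual_label: Counter({predicted_label: count})}."""
    matrix = defaultdict(Counter)
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of instances on the diagonal of the matrix."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(row[label] for label, row in matrix.items())
    return correct / total if total else 0.0


def collapsed_classes(matrix):
    """Map each class whose instances were ALL predicted as one
    single other class to that other class."""
    collapsed = {}
    for label, row in matrix.items():
        if len(row) == 1:
            (pred, _count), = row.items()
            if pred != label:
                collapsed[label] = pred
    return collapsed


# Toy, made-up labels (assumption: not real Enron or 20News data).
actual = ["a", "a", "b", "b", "c"]
predicted = ["b", "b", "b", "b", "a"]

matrix = confusion_matrix(actual, predicted)
print("accuracy:", accuracy(matrix))
print("collapsed:", collapsed_classes(matrix))
```

If many classes show up in the collapsed map, that would suggest looking at the label-to-document mapping or at the train/test split before looking at the classifier itself.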
