mahout-user mailing list archives

From "Philippe Lamarche" <philippe.lamar...@gmail.com>
Subject Re: Problems with the Bayesian classifiers.
Date Sun, 27 Jul 2008 23:40:38 GMT
 Hi,

I am glad to see that you were able to get it working. I will try it
as soon as possible. Something probably went wrong while I was
downloading, applying, or updating Mahout-60.

I am using the UC Berkeley annotated subset, which you can find
through your link:
http://bailando.sims.berkeley.edu/enron/enron_with_categories.tar.gz
(linked from http://bailando.sims.berkeley.edu/enron_email.html).

It's a multi-level labeling scheme. Each message can have labels for:
Coarse genre,
Included/forwarded information,
Primary topics,
Emotional tone (if not neutral)

There is a .cats file associated with each label.

I made a little utility that lets you pick a label type, parses the
.cats file, and writes each message into the appropriate labeled
folder. It's also easy to just use the subfolders 1 to 8 in the tar;
these folders are labeled by coarse genre. I can share this little app
if you want.
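
For illustration only, the idea behind such a utility can be sketched as
below. This is not the actual app, just a rough Python sketch under
stated assumptions: each message NNN.txt sits next to a NNN.cats file,
and each line of the .cats file has the form
"top-level category, second-level category, frequency" (as I understand
the dataset's format). The function names parse_cats and sort_messages
are made up for this example.

```python
import shutil
from pathlib import Path


def parse_cats(cats_path, label_type):
    """Return the second-level labels of one top-level type in a .cats file.

    Assumed line format: "<top-level>,<second-level>,<frequency>".
    Lines that do not match this shape are skipped.
    """
    labels = []
    for line in Path(cats_path).read_text().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 2 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        if int(parts[0]) == label_type:
            labels.append(int(parts[1]))
    return labels


def sort_messages(src_dir, dest_dir, label_type):
    """Copy each message into dest_dir/<label>/ for every label it carries.

    A message with several labels of the chosen type is copied once per
    label. Returns the number of copies made.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    copied = 0
    for cats_file in sorted(src.rglob("*.cats")):
        # Assumption: the message text shares the .cats file's base name.
        message = cats_file.with_suffix(".txt")
        if not message.exists():
            continue
        for label in sorted(set(parse_cats(cats_file, label_type))):
            target = dest / str(label)
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy(message, target / message.name)
            copied += 1
    return copied
```

Calling sort_messages("enron_with_categories", "by_topic", 3) would then
build one folder per second-level label of type 3, assuming type 3 is
the one you want; the directory names here are placeholders.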

I am very curious to see if I will be able to make it work.

Thanks for the help,
Philippe


On Sun, Jul 27, 2008 at 11:29 AM, Robin Anil <robin.anil@gmail.com> wrote:
> Also could you tell me which version of the enron Email corpus are you using
> for classification. Please provide the link. I found tons of variations
> online. What classification labels are you using (Email User Name?).
> http://sgi.nu/enron/corpora.php
>
