mahout-user mailing list archives

From Robin Anil <robin.a...@gmail.com>
Subject Re: A Mahout Naive Bayes classifier problem
Date Fri, 04 May 2012 13:32:38 GMT
Can you provide the console output from when you run train or test?
On May 4, 2012 8:09 AM, "Zehao Jin" <zehaojin@gmail.com> wrote:

> Dear all,
> I'm a Mahout beginner and I need to use the Mahout Naive Bayes classifier for
> text classification. To get started, I followed the Twenty Newsgroups
> example:
> 1. Start the Hadoop cluster.
> 2. Run the 20 newsgroups example by executing the script
> $ ./examples/bin/build-20news-bayes.sh and choosing the Naive Bayes method.
> 3. Finally I got the same confusion matrix as the one shown here:
> https://cwiki.apache.org/confluence/display/MAHOUT/Twenty+Newsgroups
> But I have to classify Chinese texts and had no clue how, so I read the
> shell script examples/bin/build-20news-bayes.sh to learn how the example
> works. Then I followed the same steps as the script:
> 1. Prepare the training data.
> The script uses
> org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups to format the
> e-mail texts into one document per line: the label followed by the words.
> Chinese differs from English in that words are not separated by
> spaces, and different character combinations have different meanings, so I
> used a Chinese text analyzer to segment the words and then matched the same
> format. Each line looks like this: Label+'\t'+word1 word2 ....+'\n';
> The example's analyzer output:
> And the Chinese analyzer output:
>
> 2. Put the formatted training data and test data on HDFS. (My Hadoop
> platform has 1 namenode and 4 datanodes on Fedora 14.) The example has 20
> categories, and my corpus has 10 categories:
> The example:              My categories:
>
> 3. Train and test the classifier on Hadoop.
> The example does it like this:
>
> ./bin/mahout trainclassifier -i /20news-bydate/bayes-train-input -o /20news-bydate/bayes-model -type bayes -ng 1 -source hdfs
>
> ./bin/mahout testclassifier -m /20news-bydate/bayes-model -d /20news-bydate/bayes-test-input -type bayes -ng 1 -source hdfs -method mapreduce
> My commands follow the example exactly; the only difference is
> the directory.
>
> Strangely, I cannot get a result like the example's. I ran the program several
> times, but the MapReduce job always fails:
> Task xxx failed to report status for 600 seconds. Killing.
>
> What I want to ask is whether the Mahout trainclassifier
> (./bin/mahout trainclassifier xxx) and testclassifier (./bin/mahout testclassifier
> xxx) commands are suitable for my program, or whether they can only be used with
> the 20 newsgroups example. If they cannot be used, it will be really hard for me
> to implement the Naive Bayes algorithm myself. Or is it a charset problem? Many
> problems are caused by that. Can you give me some support? I have been
> scratching my head for a few days. Thank you very much!!!
> ------------------------------
> Zehao Jin, SCUT, China.
>
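
For reference, the one-document-per-line layout described in step 1 of the quoted message (label, a tab, then space-separated words, then a newline) can be sketched in Python as below. This is only an illustration under stated assumptions: the function names format_training_line and write_training_file are hypothetical, the words are assumed to be already segmented by a separate Chinese analyzer, and the file is written as UTF-8 because the message raises charset as a possible cause. It is not Mahout code and does not show what PrepareTwentyNewsgroups does internally.

```python
def format_training_line(label, tokens):
    """Build one 'label<TAB>word1 word2 ...<NEWLINE>' line.

    `tokens` is assumed to be a list of words already produced by a
    word segmenter; this function does no segmentation itself.
    """
    if not label or any(ch.isspace() for ch in label):
        raise ValueError("label must be non-empty and contain no whitespace")
    # Tabs and spaces are the separators of this format, so strip any
    # whitespace found inside a token and drop tokens that end up empty.
    cleaned = ["".join(tok.split()) for tok in tokens]
    cleaned = [tok for tok in cleaned if tok]
    return label + "\t" + " ".join(cleaned) + "\n"


def write_training_file(path, documents):
    """Write (label, tokens) pairs to `path`, one document per line."""
    # Encode explicitly as UTF-8 so the output does not depend on the
    # platform's default charset.
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for label, tokens in documents:
            out.write(format_training_line(label, tokens))
```

Checking a few generated lines by eye against the example's own input files is a cheap way to rule out a format or encoding mismatch before looking at the job itself.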
