mahout-user mailing list archives

From Drew Farris <d...@apache.org>
Subject Re: Can't Get Bayes Classifier to Work Properly
Date Tue, 05 Oct 2010 20:22:03 GMT
Ryan,

Sorry to hear it's still not working for you. I can try to reproduce
your problem to see whether I've missed anything important. Are you
using a release version of Mahout, or are you running from trunk?

How many examples are in each of your training sets?
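The per-class counts Drew asks about can be read directly from the training files. This is a minimal sketch, assuming each file holds one example per line in the key\tword1 word2 word3 format Ryan describes and that all files sit in one directory (simple_spam, as in the thread). The sample lines are copied from Ryan's message only so the script runs on its own; to use it, point the awk command at the real directory instead.

```shell
#!/bin/sh
# Sketch: count training examples per class label.
# Assumes one example per line, formatted "label<TAB>word1 word2 ...".
set -e

# Throwaway copy of a simple_spam directory built from the thread's sample
# lines, so the script is self-contained.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/simple_spam"
printf 'spam\tyou need some viagra medication my friend\n' > "$work/simple_spam/spam.txt"
printf 'spam\tfree infertility medication here\n' >> "$work/simple_spam/spam.txt"
printf 'nonspam\thello ryan can you do me a favor\n' > "$work/simple_spam/nonspam.txt"

# The first tab-separated field is the label: tally lines per label,
# then sort so the output order is stable.
counts=$(awk -F '\t' '{ n[$1]++ } END { for (k in n) print k, n[k] }' \
  "$work"/simple_spam/*.txt | sort)
echo "$counts"
```

A class with only a handful of lines, or a label that shows up with an unexpected spelling, is easy to spot this way before training.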

Drew

On Tue, Oct 5, 2010 at 2:02 PM, Ryan Rosario <uclamathguy@gmail.com> wrote:
> Thank you for your help.
>
> I tried dividing the data into two files, spam.txt and nonspam.txt,
> within the directory "simple_spam", but I still have the same
> problem: no useful output.
>
> Ryan
>
> On Mon, Oct 4, 2010 at 7:42 PM, Drew Farris <drew@apache.org> wrote:
>> Hi Ryan,
>>
>> Your format looks good. The -i argument must point to a directory of
>> one or more input files. In the example, the 20newsgroups data is
>> split into one file per class. I'm not certain this is a
>> requirement, because the class is in the first column after all.
>>
>> If you are running from trunk, you might find that './bin/mahout
>> trainclassifier' and './bin/mahout testclassifier' are easier to
>> remember than the somewhat arcane Maven invocation.
>>
>> HTH,
>>
>> Drew
>>
>> On Mon, Oct 4, 2010 at 10:21 PM, Ryan Rosario <uclamathguy@gmail.com> wrote:
>>> Hi,
>>>
>>> I have a data file that I formatted in the same manner as the
>>> 20newsgroups example I have seen. Here is a snippet of my fake data
>>> file (format: key\tword1 word2 word3...\n):
>>>
>>> spam    you need some viagra medication my friend
>>> nonspam hi ryan my name is cassie and I am in your class
>>> spam    aviator sunglasses with your name on them
>>> nonspam hello ryan can you do me a favor
>>> spam    free infertility medication here
>>>
>>> I am trying to train and test the CBayes classifier. When I test the
>>> classifier, I get the following nonsensical output:
>>>
>>> INFO: =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances          :          0            �%
>>> Incorrectly Classified Instances        :          0            �%
>>> Total Classified Instances              :          0
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a       b       <--Classified as
>>> 0       0        |  0           a     = spam
>>> 0       0        |  0           b     = nonspam
>>> Default Category: unknown: 2
>>>
>>>
>>> [INFO] ------------------------------------------------------------------------
>>> [INFO] BUILD SUCCESSFUL
>>> [INFO] ------------------------------------------------------------------------
>>> [INFO] Total time: 1 second
>>> [INFO] Finished at: Mon Oct 04 18:13:51 PDT 2010
>>> [INFO] Final Memory: 26M/360M
>>> [INFO] ------------------------------------------------------------------------
>>>
>>> I am using the following commands from the wiki to run the jobs:
>>>
>>> mvn -e exec:java \
>>>      -Dexec.mainClass=org.apache.mahout.classifier.bayes.TrainClassifier \
>>>      -Dexec.args="-i simple_spam \
>>>                   -o spam-model \
>>>                   -type cbayes \
>>>                   -ng 1 \
>>>                   -source hdfs"
>>>
>>> mvn -e exec:java \
>>>      -Dexec.mainClass=org.apache.mahout.classifier.bayes.TestClassifier \
>>>      -Dexec.args="-m spam-model \
>>>                   -d simple_spam \
>>>                   -type cbayes \
>>>                   -ng 1 \
>>>                   -source hdfs \
>>>                   -method sequential"
>>>
>>> What might I be doing wrong? Let me know if you need more information.
>>>
>>> Thanks,
>>> Ryan
>>>
>>> --
>>> RRR
>>>
>>
>
>
>
> --
> RRR
>
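Drew's point that -i takes a directory, and that the 20newsgroups example keeps one file per class, can be reproduced from a single combined file with a short awk script. This is a sketch under stated assumptions: the combined input is called data.txt (a hypothetical name), each line is key\tword1 word2 ..., lines are copied unchanged so the class label stays in the first column, and the output names (spam.txt, nonspam.txt) follow the layout Ryan describes. split_by_label is a hypothetical helper, not a Mahout command.

```shell
#!/bin/sh
# Sketch: split one "label<TAB>text" file into one file per label inside a
# directory, e.g. simple_spam/spam.txt and simple_spam/nonspam.txt.
# split_by_label and data.txt are hypothetical names, not part of Mahout.
set -e

split_by_label() {
  input=$1
  outdir=$2
  mkdir -p "$outdir"
  # Copy each line unchanged into <outdir>/<label>.txt so the label stays in
  # the first tab-separated column; lines without a tab are skipped.
  awk -F '\t' -v dir="$outdir" 'NF >= 2 { print > (dir "/" $1 ".txt") }' "$input"
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Sample combined input built from lines quoted in the thread.
printf 'spam\tyou need some viagra medication my friend\n' > "$work/data.txt"
printf 'nonspam\thi ryan my name is cassie and I am in your class\n' >> "$work/data.txt"
printf 'spam\tfree infertility medication here\n' >> "$work/data.txt"

split_by_label "$work/data.txt" "$work/simple_spam"
ls "$work/simple_spam"
```

The resulting directory can then be passed to -i. As Drew says, it is not certain that Mahout requires one file per class, so this only reproduces the example's layout.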

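Because the thread's input format depends on a tab between the label and the words, it may be worth confirming that every training line actually contains one. In the archived snippet, the separator appears as spaces ("spam" followed by four spaces, "nonspam" by one). That is consistent with a tab rendered at an 8-column stop, but it is cheap to check. The sketch flags lines with no tab, an empty label, or empty text. check_format is a hypothetical helper, not part of Mahout, and it does not claim to mirror Mahout's own parsing rules.

```shell
#!/bin/sh
# Sketch: flag lines that do not look like "label<TAB>text".
# check_format is a hypothetical helper, not a Mahout command.

check_format() {
  # Print "file:line" for every offending line; return 1 if any were found.
  awk -F '\t' '
    NF < 2 || $1 == "" || $2 == "" { print FILENAME ":" FNR; bad = 1 }
    END { exit bad ? 1 : 0 }
  ' "$@"
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# One well-formed line (tab separator) and one with spaces instead of a tab.
printf 'spam\tfree infertility medication here\n' > "$work/good.txt"
printf 'spam    aviator sunglasses with your name on them\n' > "$work/bad.txt"

if check_format "$work/good.txt"; then good_status=ok; else good_status=bad; fi
bad_report=$(check_format "$work/bad.txt" || true)
echo "good.txt: $good_status"
echo "bad.txt offending lines: $bad_report"
```

Running check_format over the real training directory before training makes it easy to rule out a spaces-for-tabs problem.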