mahout-user mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: mahout quickstart-kmeans script sequencefile parameter
Date Fri, 04 Jun 2010 15:31:51 GMT
Good, and thank you for posting your findings. I've updated the wiki to
reflect the revised arguments for k-Means and will update the other
clustering pages shortly.

Jeff


On 6/3/10 5:15 PM, Tommy Chheng wrote:
>  Yes, it had the help. I was just making a comment in case anyone else 
> ran into the error.
>
> @tommychheng
> Programmer and UC Irvine Graduate Student
> Find a great grad school based on research interests: 
> http://gradschoolnow.com
>
>
> On 6/3/10 4:55 PM, Jeff Eastman wrote:
>> Yes, the options have changed a bit recently and that script 
>> evidently did not get updated yet. We are working to make all the 
>> algorithm command lines more uniform and still have a ways to go to 
>> accomplish that goal.
>>
>> -w should now be -ow, which causes the output directory to be overwritten
>> -x (--maxIter) is also required, though perhaps it should not be? Do
>> you really want kmeans to run forever?
>>
>> If you run the driver with incorrect arguments, does it not print out 
>> the help information for you?
>> Jeff
>>
>>
>> On 6/3/10 2:58 PM, Tommy Chheng wrote:
>>>  Thanks Drew,
>>> I started a new EC2 instance with the mahout trunk and got it 
>>> working. There is a problem with the last line though.
>>>
>>> The last line in the script gave an error:
>>> ../bin/mahout kmeans -i 
>>> ./work/reuters-out-seqdir-sparse/tfidf/vectors/ -c ./work/clusters 
>>> -o ./work/reuters-kmeans -k 20 -w
>>>
>>> org.apache.commons.cli2.OptionException: Unexpected -w while 
>>> processing Options
>>>
>>> Removing the -w and adding --maxIter fixes it.
>>> ../bin/mahout kmeans -i 
>>> ./work/reuters-out-seqdir-sparse/tfidf-vectors/ -c ./work/clusters 
>>> -o ./work/reuters-kmeans -k 20 --maxIter 20
>>>
>>> I added a comment to
>>> https://issues.apache.org/jira/browse/MAHOUT-390
>>>
>>> @tommychheng
>>> Programmer and UC Irvine Graduate Student
>>> Find a great grad school based on research interests: 
>>> http://gradschoolnow.com
>>>
>>>
>>> On 6/2/10 8:27 PM, Drew Farris wrote:
>>>> Very strange:
>>>>
>>>> drew@skirnir:~/mahout/svn-trunk$ svn info
>>>> Path: .
>>>> URL: https://svn.apache.org/repos/asf/mahout/trunk
>>>> Repository Root: https://svn.apache.org/repos/asf
>>>> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
>>>> Revision: 950859
>>>> [...]
>>>> drew@skirnir:~/mahout/svn-trunk$ ./bin/mahout seqdirectory -i
>>>> ./work/reuters-out -o ./work/reuters-out-seqdir -c UTF-8
>>>> no HADOOP_CONF_DIR or HADOOP_HOME set, running locally
>>>> [..]
>>>> drew@skirnir:~/mahout/svn-trunk$ ls ./work/reuters-out-seqdir
>>>> chunk-0
>>>>
>>>> To be absolutely certain nothing old is lurking in your target
>>>> directories, try 'mvn clean install' to rebuild and see if your
>>>> results differ. If you prefer, you can skip test execution with
>>>> 'mvn clean install -DskipTests=true'.
>>>>
>>>> If that doesn't work, run 'mvn -v' and post the results -- that might
>>>> provide some clues.
>>>>
>>>> - Drew
>>>>
>>>> On Tue, Jun 1, 2010 at 9:39 PM, Tommy Chheng <tommy.chheng@gmail.com> wrote:
>>>>
>>>>>   I updated the svn and did a mvn install but still getting a parsing
>>>>> command line error on the seqdirectory command.
>>>>> $svn info
>>>>> Path: .
>>>>> URL: http://svn.apache.org/repos/asf/mahout/trunk
>>>>> Repository Root: http://svn.apache.org/repos/asf
>>>>> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
>>>>> Revision: 950329
>>>>> Node Kind: directory
>>>>> Schedule: normal
>>>>> Last Changed Author: srowen
>>>>> Last Changed Rev: 950049
>>>>> Last Changed Date: 2010-06-01 05:55:49 -0700 (Tue, 01 Jun 2010)
>>>>>
>>>>>
>>>>> $./bin/mahout seqdirectory -i ./work/reuters-out/ -o
>>>>> ./work/reuters-out-seqdir -c UTF-8
>>>>> no HADOOP_CONF_DIR or HADOOP_HOME set, running locally
>>>>> Exception in thread "main" org.apache.commons.cli2.OptionException:
>>>>> Unexpected -i while processing Options
>>>>>         at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
>>>>>         at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:205)
>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>>>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>>>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:174)
>>>>>
>>>>> @tommychheng
>>>>> Programmer and UC Irvine Graduate Student
>>>>> Find a great grad school based on research interests:
>>>>> http://gradschoolnow.com
>>>>>
>>>>> On 6/1/10 12:43 PM, Grant Ingersoll wrote:
>>>>>
>>>>>> Can you try doing an SVN update and then "mvn install" and then 
>>>>>> run again?
>>>>>>
>>>>>> On May 31, 2010, at 12:28 PM, Tommy Chheng wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I'm using the quickstart-kmeans.sh script from
>>>>>>> https://issues.apache.org/jira/browse/MAHOUT-390 to run the example
>>>>>>> kmeans. I'm on mahout trunk.
>>>>>>>
>>>>>>> It fails on the SequenceFile generation step:
>>>>>>> $./bin/mahout seqdirectory -i ./work/reuters-out/ -o
>>>>>>> ./work/reuters-out-seqdir -c UTF-8
>>>>>>> no HADOOP_CONF_DIR or HADOOP_HOME set, running locally
>>>>>>> Exception in thread "main" org.apache.commons.cli2.OptionException:
>>>>>>> Unexpected -i while processing Options
>>>>>>>         at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
>>>>>>>         at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:205)
>>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>>>>>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>>>>>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:174)
>>>>>>>
>>>>>>> Alternatively, I tried ./bin/mahout seqdirectory --input
>>>>>>> ./work/reuters-out/ -o ./work/reuters-out-seqdir -c UTF-8 but
>>>>>>> then get the same unexpected --input error.
>>>>>>>
>>>>>>>
>>>>>>> -- 
>>>>>>>
>>>>>>> @tommychheng
>>>>>>> Programmer and UC Irvine Graduate Student
>>>>>>> Find a great grad school based on research interests:
>>>>>>> http://gradschoolnow.com
>>>>>>>
>>>>>>>
>>>
>>
>
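
For reference, the two corrections discussed in this thread (-ow in place of -w, plus the required --maxIter) can be combined into a single k-means step. The following is only a sketch assembled from the commands quoted above, not a tested invocation: the variable names MAHOUT and KMEANS_ARGS are hypothetical, the paths are copied from Tommy's working command, and the option set is assumed to match trunk as of early June 2010.

```shell
# Sketch only: option names are taken from the thread, not re-verified.
# MAHOUT and KMEANS_ARGS are illustrative names, not part of the script.
MAHOUT=../bin/mahout

# -ow replaces the old -w (overwrite the output directory);
# --maxIter (short form -x) is required and bounds the number of iterations.
KMEANS_ARGS="kmeans -i ./work/reuters-out-seqdir-sparse/tfidf-vectors/ -c ./work/clusters -o ./work/reuters-kmeans -k 20 --maxIter 20 -ow"

if [ -x "$MAHOUT" ]; then
  # Left unquoted on purpose so the string splits into separate arguments.
  "$MAHOUT" $KMEANS_ARGS
else
  echo "mahout launcher not found at $MAHOUT" >&2
fi
```

If the driver rejects any of these options, running it with incorrect arguments should print the current help text, as noted above.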

