mahout-user mailing list archives

From Mark <static.void....@gmail.com>
Subject Re: Problems running examples
Date Sun, 05 Jun 2011 19:31:28 GMT
Any idea on how I can generate the sequence files locally?

On 6/5/11 12:23 PM, Mark wrote:
> I was on an older trunk version of 0.5 but then I realized there was 
> an official release the other day so I retried on that with the same 
> results.
>
> Running the same on 0.4 works as expected.
>
> On 6/5/11 11:56 AM, Sean Owen wrote:
>> This all sounds a lot like things that were fixed a little while ago. Are
>> you on version 0.5, or better yet, SVN HEAD?
>>
>> The rest, I don't know, would have to defer to the author of that bit.
>>
>> On Sun, Jun 5, 2011 at 7:07 PM, Mark <static.void.dev@gmail.com> wrote:
>>
>>> Hi all. I'm trying to run examples/bin/build-reuters.sh but I keep
>>> running into the following exception.
>>>
>>> INFO: Deleting mahout-work/reuters-kmeans-clusters
>>> Jun 5, 2011 10:29:37 AM org.apache.hadoop.util.NativeCodeLoader <clinit>
>>> WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> Jun 5, 2011 10:29:37 AM org.apache.hadoop.io.compress.CodecPool getCompressor
>>> INFO: Got brand-new compressor
>>> Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>>>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>>     at java.util.ArrayList.get(ArrayList.java:322)
>>>     at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:108)
>>>     at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:101)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>     at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:58)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
>>>
>>> I am also confused reading the build-reuters.sh code itself. There seems
>>> to be some disconnect between what is expected to be local and what
>>> should be on HDFS. For example, the comments on lines 77-79 are:
>>>
>>> # we know reuters-out-seqdir exists on a local disk at
>>> # this point, if we're running in clustered mode,
>>> # copy it up to hdfs
>>>
>>> However, upon inspection you'll notice that reuters-out-seqdir is
>>> actually on HDFS. It seems like seqdirectory will never write to local
>>> disk, even with the MAHOUT_LOCAL=true flag set.
>>>
>>> Any ideas?
>>>
>>> Thanks
>>>
