mahout-user mailing list archives

From Drew Farris <d...@apache.org>
Subject Re: Problems running examples
Date Fri, 10 Jun 2011 21:57:13 GMT
Hmm, I've been able to download the 0.5 src release and run it in
clustered mode. In most cases it completes fine. I ran into problems
once when I had left a mahout-work directory lying around from a
partially completed (aborted) run. I wonder if that could have
something to do with the failures you are seeing too, Jeff?

The binary release of 0.5 is most definitely broken, but that breakage
was discussed in another thread and is due to classpath issues in
bin/mahout vs. where things are placed in the binary release.

On Fri, Jun 10, 2011 at 12:34 PM, Jeff Eastman <jeastman@narus.com> wrote:
> I'm still trying to figure out why reuters-0.5 does not work on either of my clusters.
> The scripts themselves have no diff and the environment variables are set as in trunk except
> for MAHOUT_HOME. The synthetic control and 20 newsgroups examples run on both clusters without
> problems (well, 20 newsgroups has a Version Mismatch error on CDH3, but that is another story).
> But when I run reuters on 0.5 I see "MAHOUT_LOCAL is set, running locally" followed by file
> IO exceptions in MahoutDriver that are cluster dependent. When I run it on trunk, I don't
> see this and it works just fine.
>
> -----Original Message-----
> From: Drew Farris [mailto:drew@apache.org]
> Sent: Thursday, June 09, 2011 5:36 PM
> To: user@mahout.apache.org
> Subject: Re: Problems running examples
>
> Jeff, no impugning perceived, and thanks for running the variety of
> tests. So it appears that trunk is fine and 0.5 isn't. I'll try to
> determine what did (or didn't) make it into 0.5 that causes its
> brokenness.
>
> Mark, in the meantime, no need to run all of the tests I've asked
> about previously. Just give trunk a try and see if that resolves your
> problem.
>
> On Thu, Jun 9, 2011 at 7:21 PM, Jeff Eastman <jeastman@narus.com> wrote:
>> Hi Drew,
>>
>> Running trunk locally, latest update, just now, build-reuters.sh works (kmeans and lda).
>>
>> Running trunk on my CDH3 cluster, just now:
>> - build-cluster-syntheticcontrol.sh works (with kmeans and others)
>> - build-reuters.sh works (with kmeans and lda)
>>
>> Running trunk on my MapR cluster, just now:
>> - build-cluster-syntheticcontrol.sh works (with kmeans and others)
>> - build-reuters.sh works (with kmeans and lda)
>>
>>
>> Running the 5/31 mahout-distribution-0.5, just now:
>> - build-cluster-syntheticcontrol.sh works (CDH3 & MapR with kmeans and others)
>> - build-reuters.sh runs in local mode only (CDH3 & MapR runs give different errors)
>>
>> I was primarily defending kmeans. It is possible my 5/31 0.5 distribution is not
>> the final one, since everything seems kosher in trunk now. My apologies if I've impugned your
>> patch.
>>
>> Jeff
>>
>>
>> -----Original Message-----
>> From: Drew Farris [mailto:drew@apache.org]
>> Sent: Thursday, June 09, 2011 11:36 AM
>> To: user@mahout.apache.org
>> Subject: Re: Problems running examples
>>
>> Jeff,
>>
>> Could you tell me about what's failing in KMeans and LDA when running
>> on a cluster? I had this working just prior to 0.5 in
>> https://issues.apache.org/jira/browse/MAHOUT-694
>>
>> Thanks,
>>
>> Drew
>>
>> On Thu, Jun 9, 2011 at 2:01 PM, Jeff Eastman <jeastman@narus.com> wrote:
>>> Ahem, KMeans is not busted. It is being maintained by me, at least. The build-reuters.sh
>>> script runs only in local mode on 0.5 and fails in both KMeans and LDA when run on a cluster.
>>> The MIA examples are not always correct. Most of this has been reported before.
>>>
>>> -----Original Message-----
>>> From: Sean Owen [mailto:srowen@gmail.com]
>>> Sent: Thursday, June 09, 2011 12:29 AM
>>> To: user@mahout.apache.org
>>> Subject: Re: Problems running examples
>>>
>>> (Assuming you are on HEAD,) I think KMeans is busted -- this has come up
>>> before. I don't know if it is being maintained.  Anyone who's willing to
>>> step up and fix it is also welcome to overhaul it IMHO.
>>>
>>> On Thu, Jun 9, 2011 at 12:03 AM, Hector Yee <hector.yee@gmail.com> wrote:
>>>
>>>> I got a slightly different error on the next line of KMeansDriver.java
>>>> (running on OS X Snow Leopard)
>>>>
>>>> 11/06/08 16:02:12 INFO compress.CodecPool: Got brand-new compressor
>>>> Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.mahout.math.VectorWritable
>>>>  at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:90)
>>>>  at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:102)
>>>>
>>>>
>>>> On Sun, Jun 5, 2011 at 9:31 PM, Jeff Eastman <jeastman@narus.com> wrote:
>>>>
>>>> > IIRC, Reuters used to run on a cluster but no longer does due to some
>>>> > obscure Lucene changes. In 0.5 it only works in local mode. I really hope
>>>> > this can be repaired by 0.6 as Reuters is a key entry point into Mahout
>>>> > clustering for many users.
>>>> >
>>>>
>>>
>>
>
