mahout-user mailing list archives

From Paritosh Ranjan <pran...@xebia.com>
Subject Re: Mahout Kmeans
Date Fri, 14 Sep 2012 06:30:28 GMT
The general convention is that if the MAHOUT_LOCAL environment variable is
set, Mahout runs locally ('pseudo-distributed') rather than against a Hadoop
cluster.
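
To make the convention concrete, here is a minimal sketch of the kind of check involved. It is not the actual code from the mahout script: the function name mahout_mode is made up for this example, and it assumes that any non-empty value of MAHOUT_LOCAL counts as "set".

```shell
# Hypothetical helper, not part of Mahout: reports which mode the
# MAHOUT_LOCAL convention would select.
# Assumption: any non-empty value of MAHOUT_LOCAL means "run locally".
mahout_mode() {
  if [ -n "${MAHOUT_LOCAL:-}" ]; then
    echo "local"    # MAHOUT_LOCAL is set: do not go through Hadoop
  else
    echo "hadoop"   # MAHOUT_LOCAL is not set: run against the cluster
  fi
}
```

In practice that would mean exporting MAHOUT_LOCAL (for example `export MAHOUT_LOCAL=true`) before calling cluster-reuters.sh to run locally, and unsetting it to run on Hadoop.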

On 14-09-2012 05:11, Gustavo Enrique Salazar Torres wrote:
> Hi Paritosh:
>
> I made it work in Hadoop mode, not local mode. I don't know if that's
> desirable. When running in local mode I also got the error "Hadoop libraries
> are missing" and, from what I saw in the mahout script, it simply discards
> all libraries when MAHOUT_LOCAL is set.
> So, is the local mode used for anything? (please forgive my ignorance, I
> don't know the whole project)
>
> Gustavo
>
> On Sat, Sep 8, 2012 at 2:35 AM, Paritosh Ranjan <pranjan@xebia.com> wrote:
>
>> Can you open a JIRA issue describing the problem and submit the patch for
>> your fix?
>> https://issues.apache.org/jira/browse/MAHOUT
>>
>>
>> On 08-09-2012 09:40, Gustavo Enrique Salazar Torres wrote:
>>
>>> Never mind, I got it to work. I had to fix the script, though.
>>>
>>> Thanks.
>>> Gustavo
>>>
>>> On Fri, Sep 7, 2012 at 5:54 PM, Gustavo Enrique Salazar Torres <gsalazar@ime.usp.br> wrote:
>>>
>>>> Hi there:
>>>> I'm trying to finish an improvement to the Kmeans algorithm, but I first
>>>> need to get it running in order to compare results.
>>>> When I run the cluster-reuters.sh script, I get this error:
>>>>
>>>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>>>> Running on hadoop, using /home/gustavo/Desktop/yandex_data/hadoop-0.20.203.0/bin/hadoop and HADOOP_CONF_DIR=/home/gustavo/Desktop/yandex_data/hadoop-0.20.203.0/conf
>>>> MAHOUT-JOB: /home/gustavo/Desktop/yandex_data/mahout-distribution-0.7/mahout-examples-0.7-job.jar
>>>> 12/09/07 17:47:43 INFO common.AbstractJob: Command line arguments: {--clustering=null, --clusters=[./reuters-kmeans-clusters], --convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure], --endPhase=[2147483647], --input=[./reuters_out_seqdir_kmeans/tfidf-vectors], --maxIter=[10], --method=[mapreduce], --numClusters=[20], --output=[./reuters-kmeans], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
>>>> 12/09/07 17:47:44 INFO common.HadoopUtil: Deleting reuters-kmeans-clusters
>>>> 12/09/07 17:47:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library
>>>> 12/09/07 17:47:44 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
>>>> 12/09/07 17:47:44 INFO compress.CodecPool: Got brand-new compressor
>>>> 12/09/07 17:47:44 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to reuters-kmeans-clusters/part-randomSeed
>>>> 12/09/07 17:47:44 INFO kmeans.KMeansDriver: Input: reuters_out_seqdir_kmeans/tfidf-vectors Clusters In: reuters-kmeans-clusters/part-randomSeed Out: reuters-kmeans Distance: org.apache.mahout.common.distance.CosineDistanceMeasure
>>>> 12/09/07 17:47:44 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
>>>> 12/09/07 17:47:44 INFO compress.CodecPool: Got brand-new decompressor
>>>> Exception in thread "main" java.lang.IllegalStateException: No input clusters found in reuters-kmeans-clusters/part-randomSeed. Check your -c argument.
>>>>     at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:218)
>>>>
>>>> As you can see, the initial clusters are being created, but for a reason
>>>> I don't understand they are not being found.
>>>> Below is the output of the 'cat' command on the part file containing the
>>>> clusters:
>>>>
>>>> $ dfs -cat reuters-kmeans-clusters/part-randomSeed
>>>> SEQ
>>>> org.apache.hadoop.io.Text5org.apache.mahout.clustering.iterator.ClusterWritable
>>>> *org.apache.hadoop.io.compress.DefaultCodec b�W3 K�E�߇H��Vgustavo
>>>>
>>>> Can anyone help me please?
>>>>
>>>> Thanks
>>>> Gustavo Salazar
>>>>
>>>>