mahout-user mailing list archives

From Gustavo Enrique Salazar Torres <gsala...@ime.usp.br>
Subject Re: Mahout Kmeans
Date Thu, 13 Sep 2012 23:41:19 GMT
Hi Paritosh:

I made it work in Hadoop mode, not local mode. I don't know if that's
desirable. When running locally I also got an error saying that Hadoop
libraries are missing and, from what I saw in the mahout script, it simply
discards all libraries when MAHOUT_LOCAL is set.
So, is the local mode used for anything? (Please forgive my ignorance, I
don't know the whole project.)
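
To make the behaviour I am describing concrete, here is a minimal sketch of
that kind of branching. It is not the code from the mahout script: the
function name build_classpath and its single argument are made up for
illustration, and it assumes only that MAHOUT_LOCAL being non-empty selects
local mode, which is what the log message quoted below suggests.

```shell
#!/bin/sh
# Hypothetical helper, NOT taken from the mahout script.
# Prints the classpath that would be used, given a starting classpath in $1.
build_classpath() {
  base="$1"
  if [ -n "$MAHOUT_LOCAL" ]; then
    # Local mode (assumption): the Hadoop configuration directory is left out.
    printf '%s\n' "$base"
  else
    # Hadoop mode: same message as in the log quoted below, sent to stderr
    # so that stdout carries only the classpath.
    echo "MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath." >&2
    printf '%s:%s\n' "$base" "$HADOOP_CONF_DIR"
  fi
}
```

Calling it once with MAHOUT_LOCAL set and once with it unset shows the
difference between the two modes.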

Gustavo

On Sat, Sep 8, 2012 at 2:35 AM, Paritosh Ranjan <pranjan@xebia.com> wrote:

> Can you open a JIRA issue describing the problem and submit the patch for
> your fix?
> https://issues.apache.org/jira/browse/MAHOUT
>
>
> On 08-09-2012 09:40, Gustavo Enrique Salazar Torres wrote:
>
>> Never mind, I got it to work. I had to fix the script, though.
>>
>> Thanks.
>> Gustavo
>>
>> On Fri, Sep 7, 2012 at 5:54 PM, Gustavo Enrique Salazar Torres <gsalazar@ime.usp.br> wrote:
>>
>>> Hi there:
>>>
>>> I'm trying to finish an improvement to the Kmeans algorithm, but I first
>>> need to get it running in order to compare results.
>>> When I run the cluster-reuters.sh script, I get this error:
>>>
>>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>>> Running on hadoop, using /home/gustavo/Desktop/yandex_data/hadoop-0.20.203.0/bin/hadoop and
>>> HADOOP_CONF_DIR=/home/gustavo/Desktop/yandex_data/hadoop-0.20.203.0/conf
>>> MAHOUT-JOB:
>>> /home/gustavo/Desktop/yandex_data/mahout-distribution-0.7/mahout-examples-0.7-job.jar
>>> 12/09/07 17:47:43 INFO common.AbstractJob: Command line arguments:
>>> {--clustering=null, --clusters=[./reuters-kmeans-clusters],
>>> --convergenceDelta=[0.5],
>>> --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure],
>>> --endPhase=[2147483647],
>>> --input=[./reuters_out_seqdir_kmeans/tfidf-vectors], --maxIter=[10],
>>> --method=[mapreduce], --numClusters=[20], --output=[./reuters-kmeans],
>>> --overwrite=null, --startPhase=[0], --tempDir=[temp]}
>>> 12/09/07 17:47:44 INFO common.HadoopUtil: Deleting reuters-kmeans-clusters
>>> 12/09/07 17:47:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library
>>> 12/09/07 17:47:44 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
>>> 12/09/07 17:47:44 INFO compress.CodecPool: Got brand-new compressor
>>> 12/09/07 17:47:44 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to reuters-kmeans-clusters/part-randomSeed
>>> 12/09/07 17:47:44 INFO kmeans.KMeansDriver: Input: reuters_out_seqdir_kmeans/tfidf-vectors Clusters In: reuters-kmeans-clusters/part-randomSeed Out: reuters-kmeans Distance: org.apache.mahout.common.distance.CosineDistanceMeasure
>>> 12/09/07 17:47:44 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
>>> 12/09/07 17:47:44 INFO compress.CodecPool: Got brand-new decompressor
>>> Exception in thread "main" java.lang.IllegalStateException: No input clusters found in reuters-kmeans-clusters/part-randomSeed. Check your -c argument.
>>> at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:218)
>>>
>>> As you can see, the initial clusters are being created, but for a reason
>>> I don't understand they are not being found.
>>> Below is the output of the 'cat' command on the part file containing the
>>> clusters:
>>>
>>> $ dfs -cat reuters-kmeans-clusters/part-randomSeed
>>> SEQ
>>> org.apache.hadoop.io.Text5org.apache.mahout.clustering.iterator.ClusterWritable
>>> *org.apache.hadoop.io.compress.DefaultCodec b�W3 K�E�߇H��Vgustavo
>>>
>>> Can anyone help me please?
>>>
>>> Thanks
>>> Gustavo Salazar
>>>
>>>
>
>
