mahout-user mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: [jira] Issue Comment Edited: (MAHOUT-504) Kmeans clustering error
Date Fri, 08 Oct 2010 15:22:48 GMT
  +user@

Thanks so much, Pragnesh, for putting your finger so succinctly on the
problem. I'm cross-posting this to user@ so that it will be part of that 
searchable archive too. I will also append to MAHOUT-504.

I'm glad to hear you are out of the woods on this,
Jeff

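For anyone finding this in the archive: the diagnosis quoted below comes down to how a relative HDFS path is resolved. As far as I understand it, a relative path is resolved against the working directory, which defaults to the calling user's home directory, /user/<username> (in the Hadoop API this is what FileSystem.getHomeDirectory() reports). The sketch below is only a simplified model of that rule in plain Java, not Hadoop code. The class and method names are made up for illustration, and the /tmp/kmeans path is a hypothetical absolute location, not something the example job uses.

```java
// Simplified model of HDFS relative-path resolution. This is NOT Hadoop code.
// Assumption: the working directory defaults to the home directory /user/<username>.
final class RelativePathSketch {

    /** Returns the absolute path that `path` would resolve to for `user`. */
    static String resolve(String user, String path) {
        if (path.startsWith("/")) {
            return path; // absolute paths mean the same thing for every user
        }
        return "/user/" + user + "/" + path; // relative paths depend on the user
    }

    public static void main(String[] args) {
        // Relative path taken from the KMeansDriver "Clusters In:" log line.
        String clusters = "output/clusters-0";
        System.out.println("as pragnesh: " + resolve("pragnesh", clusters));
        System.out.println("as hadoop:   " + resolve("hadoop", clusters));
        // Hypothetical absolute path, shown only for contrast.
        System.out.println("absolute:    " + resolve("hadoop", "/tmp/kmeans/clusters-0"));
    }
}
```

The same relative string names two different directories depending on which user resolves it, which fits the "No clusters found" symptom: the data is written under one home directory and looked up under another. Running the example as the same user Hadoop runs as, or using absolute paths, should avoid the mismatch.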

On 10/8/10 2:02 AM, pragnesh radadia wrote:
> Finally, I am able to run the k-means example (clustering of the synthetic
> control data).
>
> I think the problem is that Hadoop is running as the hadoop user (using
> Cloudera CDH3) while I am trying to run the example as the pragnesh user,
>
> so Hadoop is not able to find the files under "/user/hadoop",
>
> since the example uses relative paths to store the input and clustering data.
>
> -pragnesh
>
>
> On Wed, Oct 6, 2010 at 12:27 AM, Jeff Eastman
> <jdog@windwardsolutions.com>  wrote:
>>   Hi Pragnesh,
>>
>> I really don't know what to suggest to you. I just did a new Mahout checkout
>> and build, followed by uploading the synthetic_control.data file to a local
>> Hadoop instance. The k-means job ran without incident. On a hunch, I also
>> uploaded the file as testdata (not in directory testdata) and that worked
>> too. I'm baffled why I can't duplicate this and suspect it is a local system
>> issue. What OS are you running?
>>
>> If yours works from Eclipse but not from the command line, I wonder if you
>> have done mvn clean build from the command line before you ran the CLI
>> Mahout job? Eclipse compiles its bits into different directories and does
>> not build the necessary job files. Other than that, I suggest checking your
>> file system groups and permissions.
>>
>> If you find something that gets you running again, *please* post your
>> solution so we can advise others who are experiencing the same error
>> message.
>>
>>
>> On 10/5/10 12:06 AM, pragnesh (JIRA) wrote:
>>>      [
>>> https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502
>>> ]
>>>
>>> pragnesh edited comment on MAHOUT-504 at 10/5/10 3:05 AM:
>>> ----------------------------------------------------------
>>>
>>> I am also getting the same exception with trunk code
>>>
>>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>>> job_201010041038_0019
>>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>>
>>> This runs fine from Eclipse,
>>>
>>> but when I try to run it from the command line with Hadoop, I see the
>>> following output,
>>>
>>> while $MAHOUT_HOME/bin/mahout
>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job runs fine
>>> without any error.
>>>
>>> pragnesh-laptop% $MAHOUT_HOME/bin/mahout
>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop/
>>> HADOOP_CONF_DIR=/etc/hadoop/conf.pseudo
>>> 10/10/05 12:26:05 WARN driver.MahoutDriver: No
>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on
>>> classpath, will use command-line arguments only
>>> 10/10/05 12:26:05 INFO kmeans.Job: Running with default arguments
>>> 10/10/05 12:26:06 INFO kmeans.Job: Preparing Input
>>> 10/10/05 12:26:06 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:07 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 10/10/05 12:26:09 INFO mapred.JobClient: Running job:
>>> job_201010051117_0005
>>> 10/10/05 12:26:10 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:26:26 INFO mapred.JobClient:  map 100% reduce 0%
>>> 10/10/05 12:26:28 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0005
>>> 10/10/05 12:26:29 INFO mapred.JobClient: Counters: 7
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Launched map tasks=1
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Data-local map tasks=1
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   FileSystemCounters
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_READ=288374
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=335470
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   Map-Reduce Framework
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map input records=600
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Spilled Records=0
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map output records=600
>>> 10/10/05 12:26:29 INFO kmeans.Job: Running Canopy to get initial clusters
>>> 10/10/05 12:26:29 INFO canopy.CanopyDriver: Build Clusters Input:
>>> output/data Out: output Measure:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c t1: 80.0
>>> t2: 55.0
>>> 10/10/05 12:26:29 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:29 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 10/10/05 12:26:30 INFO mapred.JobClient: Running job:
>>> job_201010051117_0006
>>> 10/10/05 12:26:31 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:26:42 INFO mapred.JobClient:  map 100% reduce 0%
>>> 10/10/05 12:26:54 INFO mapred.JobClient:  map 100% reduce 100%
>>> 10/10/05 12:26:56 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0006
>>> 10/10/05 12:26:56 INFO mapred.JobClient: Counters: 17
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched map tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Data-local map tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   FileSystemCounters
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_READ=13906
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_READ=335470
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=27844
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=7131
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   Map-Reduce Framework
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input groups=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine output records=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map input records=600
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce shuffle bytes=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce output records=6
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Spilled Records=50
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output bytes=13800
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine input records=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output records=25
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input records=25
>>> 10/10/05 12:26:56 INFO kmeans.Job: Running KMeans
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: Input: output/data Clusters
>>> In: output/clusters-0 Out: output Distance:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: convergence: 0.5 max
>>> Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input
>>> Vectors: {}
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: K-Means Iteration 1
>>> 10/10/05 12:26:56 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:57 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 10/10/05 12:26:58 INFO mapred.JobClient: Running job:
>>> job_201010051117_0007
>>> 10/10/05 12:26:59 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:27:08 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:14 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_1, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:23 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_2, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:35 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0007
>>> 10/10/05 12:27:35 INFO mapred.JobClient: Counters: 3
>>> 10/10/05 12:27:35 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Launched map tasks=4
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Data-local map tasks=4
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Failed map tasks=1
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Clustering data
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Running Clustering
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Input: output/data Clusters
>>> In: output/clusters-1 Out: output/clusteredPoints Distance:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: convergence: 0.5 Input
>>> Vectors: org.apache.mahout.math.VectorWritable
>>> 10/10/05 12:27:35 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:27:36 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 10/10/05 12:27:37 INFO mapred.JobClient: Running job:
>>> job_201010051117_0008
>>> 10/10/05 12:27:38 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:27:47 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:53 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_1, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:59 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_2, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:28:11 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0008
>>> 10/10/05 12:28:11 INFO mapred.JobClient: Counters: 3
>>> 10/10/05 12:28:11 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Launched map tasks=4
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Data-local map tasks=4
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Failed map tasks=1
>>> 10/10/05 12:28:12 INFO driver.MahoutDriver: Program took 126495 ms
>>>
>>>        was (Author: pgradadia):
>>>      I am also getting the same exception with trunk code
>>>
>>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>>> job_201010041038_0019
>>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>>> Kmeans clustering error
>>>> -----------------------
>>>>
>>>>                  Key: MAHOUT-504
>>>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>>>              Project: Mahout
>>>>           Issue Type: Bug
>>>>             Reporter: Zhen Guo
>>>>             Assignee: Robin Anil
>>>>              Fix For: 0.4
>>>>
>>>>
>>>> I tried the Kmeans algorithm on the Synthetic Control data. The following
>>>> error appears. I tried the Canopy algorithm and it works fine. This error
>>>> is from the Mapper. I am using Trunk.
>>>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id :
>>>> attempt_201008261432_1324_m_000000_0, Status : FAILED
>>>> java.lang.IllegalStateException: Cluster is empty!
>>>>         at
>>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>

