mahout-user mailing list archives

From "Steve Schlosser" <swschlos...@gmail.com>
Subject Re: Problems with KMeans clustering
Date Mon, 03 Nov 2008 21:50:05 GMT
Hi folks

A while back we upgraded our Hadoop cluster from 0.15 to 0.18.0, and I
found that Mahout Kmeans quit working.  I finally tracked it down to
the fact that the semantics of the combiner changed between 0.16,
0.17, and 0.18 from run exactly once to run zero or more times (which
is in line with how Map/Reduce was originally specified).  See:
https://issues.apache.org/jira/browse/HADOOP-3586.

The Kmeans combiner depended on running exactly once, but on our new
cluster it was running multiple times, causing hard-to-discern errors.
Basically, the second time through the Combiner, it would throw an
exception because it could not parse the vector (serialized into a
Text).  In the end, I had to change the format of the data output by
the Mapper and the Combiner to match what the Reducer expects, and
change the Combiner input so that it accepts its own output.  I ended
up having to hack the Mapper to output vectors that either the Combiner
or the Reducer could take as input, and to make the Combiner take in
the same input that it outputs and calculate convergence at each step.
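
For anyone who wants to see the shape of that fix, here is a minimal
sketch in plain Java. It is not the Mahout code: the class name
PartialCentroid, its methods, and the "count;s1,s2,..." text encoding
are all made up for illustration, and the convergence calculation is
left out. The point is only that the Mapper output, the Combiner input,
and the Combiner output share one format, so the result is the same
whether the Combiner runs zero, one, or several times.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration only; these names are not Mahout's classes.
 * A partial centroid is a point count plus per-component sums, encoded
 * as text in the assumed form "count;s1,s2,...".
 */
final class PartialCentroid {
    final long count;
    final double[] sum;

    PartialCentroid(long count, double[] sum) {
        this.count = count;
        this.sum = sum;
    }

    /** What a mapper would emit for one input point: a count of 1. */
    static String fromPoint(double[] point) {
        return new PartialCentroid(1, point.clone()).encode();
    }

    String encode() {
        StringBuilder sb = new StringBuilder().append(count).append(';');
        for (int i = 0; i < sum.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(sum[i]);
        }
        return sb.toString();
    }

    static PartialCentroid decode(String text) {
        String[] parts = text.split(";", 2);
        String[] fields = parts[1].split(",");
        double[] sum = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            sum[i] = Double.parseDouble(fields[i]);
        }
        return new PartialCentroid(Long.parseLong(parts[0]), sum);
    }

    /** Combiner step: same encoding in and out. Assumes a non-empty list. */
    static String combine(List<String> values) {
        PartialCentroid first = decode(values.get(0));
        long count = first.count;
        double[] sum = first.sum;
        for (int v = 1; v < values.size(); v++) {
            PartialCentroid p = decode(values.get(v));
            count += p.count;
            for (int i = 0; i < sum.length; i++) {
                sum[i] += p.sum[i];
            }
        }
        return new PartialCentroid(count, sum).encode();
    }

    /** Reducer step: the only place where the sums are divided by the count. */
    static double[] centroid(List<String> values) {
        PartialCentroid total = decode(combine(values));
        double[] mean = new double[total.sum.length];
        for (int i = 0; i < mean.length; i++) {
            mean[i] = total.sum[i] / total.count;
        }
        return mean;
    }

    public static void main(String[] args) {
        List<String> mapped = Arrays.asList(
                fromPoint(new double[] {1, 2}),
                fromPoint(new double[] {3, 4}),
                fromPoint(new double[] {5, 6}));

        // Zero combiner passes: the reducer sees the raw mapper output.
        double[] direct = centroid(mapped);

        // Two combiner passes: the second pass consumes the first pass's output.
        String once = combine(mapped.subList(0, 2));
        String twice = combine(Arrays.asList(once, mapped.get(2)));
        double[] combined = centroid(Arrays.asList(twice));

        System.out.println(Arrays.toString(direct));   // [3.0, 4.0]
        System.out.println(Arrays.toString(combined)); // [3.0, 4.0]
    }
}
```

Carrying the count alongside the sums is what makes the repeated passes
safe: a combiner that emitted an average, or a differently formatted
vector, could not be fed its own output.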

My apologies if this has already been covered and put to rest - I just
happened upon this thread this afternoon.

-steve

On Sun, Nov 2, 2008 at 10:29 AM, Philippe Lamarche
<philippe.lamarche@gmail.com> wrote:
> Hi there,
> It also works on 0.19.0-dev, that is on hadoop/branches/branch-0.19.
>
> I intend in the next few days to find out what exactly the problem is,
> to make sure that it won't come back in a few revisions.
>
> Thanks!
>
> On Thu, Oct 30, 2008 at 9:20 AM, Grant Ingersoll <gsingers@apache.org> wrote:
>
>> Hmm, I believe that patch has been applied in 18.2 (whatever that is) but
>> it also looks like it has been applied to 0.17.3 branch as well.    So, it
>> might be something else that "fixed" it.
>>
>> At any rate, glad to hear it works on trunk.
>>
>>
>> On Oct 29, 2008, at 6:38 PM, Philippe Lamarche wrote:
>>
>>> I am not sure I understand the hadoop svn structure, however I was able to
>>> make it work with hadoop trunk, or 0.20.0-dev.
>>> It didn't work with hadoop/branch-0.18, with or without patch 4277.
>>>
>>>
>>> Here is a copy-paste of the steps, once Hadoop is built and installed.  I am
>>> using the same exact "apache-mahout-examples-0.1-dev.job", not rebuilt with
>>> the 0.20.0-dev jars.
>>>
>>> It works!
>>>
>>> That would mean that the bug/feature is not related to
>>> HADOOP-4277 <http://issues.apache.org/jira/browse/HADOOP-4277>,
>>> and was reintroduced (or never taken away) in hadoop/trunk.
>>>
>>>
>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop namenode -format
>>> 08/10/29 18:27:59 INFO namenode.NameNode: STARTUP_MSG:
>>> /************************************************************
>>> STARTUP_MSG: Starting NameNode
>>> STARTUP_MSG:   host = phil/127.0.1.1
>>> STARTUP_MSG:   args = [-format]
>>> STARTUP_MSG:   version = 0.20.0-dev
>>> STARTUP_MSG:   build =  -r ; compiled by 'philippe' on Wed Oct 29 18:25:08
>>> EDT 2008
>>> ************************************************************/
>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: supergroup=supergroup
>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: isPermissionEnabled=true
>>> 08/10/29 18:28:00 INFO common.Storage: Image file of size 96 saved in 0
>>> seconds.
>>> 08/10/29 18:28:00 INFO common.Storage: Storage directory
>>> /usr/local/hadoop-datastore/hadoop-hadoop/dfs/name has been successfully
>>> formatted.
>>> 08/10/29 18:28:00 INFO namenode.NameNode: SHUTDOWN_MSG:
>>> /************************************************************
>>> SHUTDOWN_MSG: Shutting down NameNode at phil/127.0.1.1
>>> ************************************************************/
>>>
>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop dfs -put
>>> /home/philippe/synthetic_control.data testdata
>>>
>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/build/apache-mahout-examples-0.1-dev.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>> 08/10/29 18:28:45 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:28:46 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 1
>>> 08/10/29 18:28:47 INFO mapred.JobClient: Running job:
>>> job_200810291828_0002
>>> 08/10/29 18:28:48 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:28:54 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:28:55 INFO mapred.JobClient:  map 100% reduce 0%
>>> 08/10/29 18:28:56 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0002
>>> 08/10/29 18:28:56 INFO mapred.JobClient: Counters: 7
>>> 08/10/29 18:28:56 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:28:56 INFO mapred.JobClient:     HDFS bytes read=291644
>>> 08/10/29 18:28:56 INFO mapred.JobClient:     HDFS bytes written=323660
>>> 08/10/29 18:28:56 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:28:56 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map input bytes=288374
>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map output records=600
>>> 08/10/29 18:28:56 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:28:56 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 2
>>> 08/10/29 18:28:56 INFO mapred.JobClient: Running job:
>>> job_200810291828_0003
>>> 08/10/29 18:28:57 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:29:03 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:29:05 INFO mapred.JobClient:  map 100% reduce 0%
>>> 08/10/29 18:29:10 INFO mapred.JobClient:  map 100% reduce 100%
>>> 08/10/29 18:29:11 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0003
>>> 08/10/29 18:29:11 INFO mapred.JobClient: Counters: 16
>>> 08/10/29 18:29:11 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     HDFS bytes read=323660
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     HDFS bytes written=9657
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Local bytes read=36119
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Local bytes written=72300
>>> 08/10/29 18:29:11 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:29:11 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce input groups=1
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Combine output records=28
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce output records=7
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map output bytes=943020
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map input bytes=323660
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Combine input records=1732
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map output records=1732
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce input records=28
>>> 08/10/29 18:29:11 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:29:11 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 2
>>> 08/10/29 18:29:12 INFO mapred.JobClient: Running job:
>>> job_200810291828_0004
>>> 08/10/29 18:29:13 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:29:20 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:29:22 INFO mapred.JobClient:  map 100% reduce 0%
>>> 08/10/29 18:29:27 INFO mapred.JobClient:  map 100% reduce 100%
>>> 08/10/29 18:29:28 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0004
>>> 08/10/29 18:29:28 INFO mapred.JobClient: Counters: 16
>>> 08/10/29 18:29:28 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     HDFS bytes read=342974
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     HDFS bytes written=3002539
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Local bytes read=3018455
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Local bytes written=6036972
>>> 08/10/29 18:29:28 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:29:28 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce input groups=7
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Combine output records=0
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce output records=1591
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map output bytes=3008903
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map input bytes=323660
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Combine input records=0
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map output records=1591
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce input records=1591
>>> 08/10/29 18:29:28 INFO kmeans.KMeansDriver: Iteration 0
>>> 08/10/29 18:29:28 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:29:28 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 2
>>> 08/10/29 18:29:28 INFO mapred.JobClient: Running job:
>>> job_200810291828_0005
>>> 08/10/29 18:29:29 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:29:35 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:29:37 INFO mapred.JobClient:  map 100% reduce 0%
>>> 08/10/29 18:29:41 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0005
>>> 08/10/29 18:29:41 INFO mapred.JobClient: Counters: 16
>>> 08/10/29 18:29:41 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     HDFS bytes read=342974
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     HDFS bytes written=8205
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Local bytes read=23227
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Local bytes written=46516
>>> 08/10/29 18:29:41 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:29:41 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce input groups=7
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Combine output records=10
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce output records=7
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map output bytes=1136504
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map input bytes=323660
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Combine input records=600
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map output records=600
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce input records=10
>>> 08/10/29 18:29:41 INFO kmeans.KMeansDriver: Iteration 1
>>> 08/10/29 18:29:41 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:29:41 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 2
>>> 08/10/29 18:29:42 INFO mapred.JobClient: Running job:
>>> job_200810291828_0006
>>> 08/10/29 18:29:43 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:29:50 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:29:51 INFO mapred.JobClient:  map 100% reduce 0%
>>> 08/10/29 18:29:55 INFO mapred.JobClient:  map 100% reduce 100%
>>> 08/10/29 18:29:56 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0006
>>> 08/10/29 18:29:56 INFO mapred.JobClient: Counters: 16
>>> 08/10/29 18:29:56 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     HDFS bytes read=340070
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     HDFS bytes written=8242
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Local bytes read=21265
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Local bytes written=42592
>>> 08/10/29 18:29:56 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:29:56 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce input groups=7
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Combine output records=10
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce output records=7
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map output bytes=1023966
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map input bytes=323660
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Combine input records=600
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map output records=600
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce input records=10
>>> 08/10/29 18:29:56 INFO kmeans.KMeansDriver: Iteration 2
>>> 08/10/29 18:29:56 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:29:56 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 2
>>> 08/10/29 18:29:56 INFO mapred.JobClient: Running job:
>>> job_200810291828_0007
>>> 08/10/29 18:29:57 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:30:03 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:30:05 INFO mapred.JobClient:  map 100% reduce 0%
>>> 08/10/29 18:30:09 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0007
>>> 08/10/29 18:30:09 INFO mapred.JobClient: Counters: 16
>>> 08/10/29 18:30:09 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     HDFS bytes read=340144
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     HDFS bytes written=8280
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Local bytes read=21085
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Local bytes written=42232
>>> 08/10/29 18:30:09 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:30:09 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce input groups=7
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Combine output records=10
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce output records=7
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map output bytes=1023681
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map input bytes=323660
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Combine input records=600
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map output records=600
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce input records=10
>>> 08/10/29 18:30:09 INFO kmeans.KMeansDriver: Iteration 3
>>> 08/10/29 18:30:09 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:30:09 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 2
>>> 08/10/29 18:30:09 INFO mapred.JobClient: Running job:
>>> job_200810291828_0008
>>> 08/10/29 18:30:10 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:30:17 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:30:18 INFO mapred.JobClient:  map 100% reduce 0%
>>> 08/10/29 18:30:22 INFO mapred.JobClient:  map 100% reduce 100%
>>> 08/10/29 18:30:23 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0008
>>> 08/10/29 18:30:23 INFO mapred.JobClient: Counters: 16
>>> 08/10/29 18:30:23 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     HDFS bytes read=340220
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     HDFS bytes written=8250
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Local bytes read=21339
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Local bytes written=42740
>>> 08/10/29 18:30:23 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:30:23 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce input groups=7
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Combine output records=10
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce output records=7
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map output bytes=1028419
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map input bytes=323660
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Combine input records=600
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map output records=600
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce input records=10
>>> 08/10/29 18:30:23 INFO kmeans.KMeansDriver: Iteration 4
>>> 08/10/29 18:30:23 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:30:23 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 2
>>> 08/10/29 18:30:24 INFO mapred.JobClient: Running job:
>>> job_200810291828_0009
>>> 08/10/29 18:30:25 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:30:31 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:30:33 INFO mapred.JobClient:  map 100% reduce 0%
>>> 08/10/29 18:30:37 INFO mapred.JobClient:  map 100% reduce 100%
>>> 08/10/29 18:30:38 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0009
>>> 08/10/29 18:30:38 INFO mapred.JobClient: Counters: 16
>>> 08/10/29 18:30:38 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     HDFS bytes read=340160
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     HDFS bytes written=8200
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Local bytes read=21219
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Local bytes written=42500
>>> 08/10/29 18:30:38 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:30:38 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce input groups=7
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Combine output records=10
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce output records=7
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map output bytes=1024899
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map input bytes=323660
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Combine input records=600
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map output records=600
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce input records=10
>>> 08/10/29 18:30:38 INFO kmeans.KMeansDriver: Clustering
>>> 08/10/29 18:30:38 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:30:38 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 2
>>> 08/10/29 18:30:38 INFO mapred.JobClient: Running job:
>>> job_200810291828_0010
>>> 08/10/29 18:30:39 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:30:45 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:30:47 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0010
>>> 08/10/29 18:30:47 INFO mapred.JobClient: Counters: 7
>>> 08/10/29 18:30:47 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:30:47 INFO mapred.JobClient:     HDFS bytes read=340060
>>> 08/10/29 18:30:47 INFO mapred.JobClient:     HDFS bytes written=1020535
>>> 08/10/29 18:30:47 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:30:47 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map input bytes=323660
>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map output records=600
>>> 08/10/29 18:30:47 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:30:47 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 2
>>> 08/10/29 18:30:48 INFO mapred.JobClient: Running job:
>>> job_200810291828_0011
>>> 08/10/29 18:30:49 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:30:56 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:30:57 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0011
>>> 08/10/29 18:30:57 INFO mapred.JobClient: Counters: 7
>>> 08/10/29 18:30:57 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:30:57 INFO mapred.JobClient:     HDFS bytes read=1020535
>>> 08/10/29 18:30:57 INFO mapred.JobClient:     HDFS bytes written=325460
>>> 08/10/29 18:30:57 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:30:57 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map input bytes=1020535
>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map output records=600
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Oct 29, 2008 at 11:10 AM, Philippe Lamarche <philippe.lamarche@gmail.com> wrote:
>>>
>>>> I will!
>>>>
>>>>
>>>> On 10/29/08, Grant Ingersoll <gsingers@apache.org> wrote:
>>>>
>>>>>
>>>>> Philippe, can you try the patch suggested by Arun Murthy on
>>>>> core-user@hadoop.a.o?  See
>>>>> http://issues.apache.org/jira/browse/HADOOP-4277
>>>>>
>>>>> I'm pretty swamped at the moment w/ ApacheCon coming up next week, but if
>>>>> it does fix the issue, then maybe we should move forward to the 18.2
>>>>> candidate (I don't think it has been released yet, those guys have a pretty
>>>>> sophisticated build process going)
>>>>>
>>>>> -Grant
>>>>>
>>>>> On Oct 28, 2008, at 7:19 AM, Philippe Lamarche wrote:
>>>>>
>>>>> Ubuntu linux 2.6.24 <http://2.6.24.21>, with java-6-sun-1.6.0.07.
>>>>>
>>>>>>
>>>>>> On Tue, Oct 28, 2008 at 7:03 AM, Grant Ingersoll <gsingers@apache.org> wrote:
>>>>>>
>>>>>>> Just a single machine.  I didn't think we were using features either.  Are
>>>>>>> you saying you can run the example using 0.18.1?
>>>>>>>
>>>>>>> BTW, Philippe, what JVM, O/S, etc. are you using?
>>>>>>>
>>>>>>> -Grant
>>>>>>>
>>>>>>>
>>>>>>> On Oct 27, 2008, at 11:55 PM, Jeff Eastman wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Are you guys running on real Hadoop arrays? I can run the synthetic
>>>>>>>> control example just fine on a single machine. That code is just trying
>>>>>>>> to read a vector from a string. I'd be surprised if we were using any
>>>>>>>> "features" but will watch the threads.
>>>>>>>>
>>>>>>>> Jeff
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Grant Ingersoll wrote:
>>>>>>>>
>>>>>>>>> I started a thread on core-user@hadoop.a.o:
>>>>>>>>> http://hadoop.markmail.org/message/cczunzfhpcqz6pis
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Oct 27, 2008, at 9:49 PM, Grant Ingersoll wrote:
>>>>>>>>>
>>>>>>>>>> OK, I can confirm that the exact same code works with 0.17.2 and not w/
>>>>>>>>>> 0.18.1.  So, it sounds like a bug in Hadoop, or we are relying on
>>>>>>>>>> incorrect behavior in Hadoop.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Oct 27, 2008, at 9:33 PM, Grant Ingersoll wrote:
>>>>>>>>>>
>>>>>>>>>>> On Oct 26, 2008, at 10:46 AM, Philippe Lamarche wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, I went straight from 0.17.2 to 0.18.1.  It was
>>>>>>>>>>>> working on 0.17.2.
>>>>>>>>>>>
>>>>>>>>>>> BTW, are you saying the same exact code was working on 0.17.2 or are
>>>>>>>>>>> you referring to some older Mahout code that worked on 17.2?
>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Oct 26, 2008 at 9:48 AM, Grant Ingersoll <gsingers@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Did this work with 0.18.0 or other prior versions for you?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Oct 25, 2008, at 7:23 PM, Philippe Lamarche wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I just updated to hadoop 0.18.1 and got a clean version of
>>>>>>>>>>>>>> mahout from svn.
>>>>>>>>>>>>>> However, I am having problems with KMeans, that can be traced down to:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
>>>>>>>>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5011 bytes
>>>>>>>>>>>>>> 2008-10-25 19:10:16,999 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810251826_0013_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermedate merge failed
>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
>>>>>>>>>>>>>> Caused by: java.lang.NumberFormatException: For input string: "["
>>>>>>>>>>>>>> at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
>>>>>>>>>>>>>> at java.lang.Double.parseDouble(Double.java:510)
>>>>>>>>>>>>>> at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
>>>>>>>>>>>>>> at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
>>>>>>>>>>>>>> at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
>>>>>>>>>>>>>> at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
>>>>>>>>>>>>>> ... 1 more
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2008-10-25 19:10:16,999 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
>>>>>>>>>>>>>> 2008-10-25 19:10:17,000 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
>>>>>>>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>>>>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is while running the synthetic_control.data example, but I have the
>>>>>>>>>>>>>> same problems with any other input data.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am able to run other map-reduce jobs without problems.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is the output of the jar task:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> hadoop@philippe-vaio:/usr/local/hadoop$ bin/hadoop jar
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar
>>>>>>>>>>>>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>>>>>>>>>>>> 08/10/25 19:09:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.JobClient: Running job: job_200810251826_0010
>>>>>>>>>>>>>> 08/10/25 19:09:29 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:09:31 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Job complete: job_200810251826_0010
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Counters: 7
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes read=291644
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes written=323660
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input records=600
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input bytes=288374
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map output records=600
>>>>>>>>>>>>>> 08/10/25 19:09:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Running job: job_200810251826_0011
>>>>>>>>>>>>>> 08/10/25 19:09:33 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:09:37 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:09:39 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:09:44 INFO mapred.JobClient:  map 100% reduce 16%
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Job complete: job_200810251826_0011
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Counters: 16
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes read=323660
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes written=1447
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes read=1389
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes written=37878
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input groups=1
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine output records=29
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input records=600
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce output records=1
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output bytes=943020
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input bytes=323660
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine input records=1760
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output records=1732
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input records=1
>>>>>>>>>>>>>> 08/10/25 19:09:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.JobClient: Running job: job_200810251826_0012
>>>>>>>>>>>>>> 08/10/25 19:09:54 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:09:56 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:09:58 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Job complete: job_200810251826_0012
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Counters: 16
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes read=326554
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes written=1137260
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes read=1147358
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes written=2304490
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input groups=1
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine output records=0
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input records=600
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce output records=600
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output bytes=1139660
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input bytes=323660
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine input records=0
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output records=600
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input records=600
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO kmeans.KMeansDriver: Iteration 0
>>>>>>>>>>>>>> 08/10/25 19:10:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>> 08/10/25 19:10:03 INFO mapred.JobClient: Running job: job_200810251826_0013
>>>>>>>>>>>>>> 08/10/25 19:10:04 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:10:08 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:10:09 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:10:21 INFO mapred.JobClient: Task Id : attempt_200810251826_0013_r_000000_0, Status : FAILED
>>>>>>>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>>>>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am not sure if I am doing something wrong here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the help,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Philippe.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --------------------------
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Grant Ingersoll
>>>>>>>>>>>>> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New
>>>>>>>>>>>>> Orleans.
>>>>>>>>>>>>> http://www.lucenebootcamp.com
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Lucene Helpful Hints:
>>>>>>>>>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>>>>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>
