mahout-user mailing list archives
From Chih-Hsien Wu <>
Subject Re: Only one reducer running on canopy generator
Date Tue, 26 Nov 2013 14:20:24 GMT
I have another question. The "Java heap space" error is rather broad, and I
don't know exactly where I am running out of memory. In other words, Amazon
Web Services lets me configure each daemon's heap size, but which one should
I adjust, e.g. the datanode, tasktracker, or namenode?
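For what it's worth, since the OOM is raised inside the reducer task rather than in a daemon, the child-task JVM heap is usually the relevant knob rather than any of the daemon heaps. A minimal sketch, assuming Hadoop 1.x property names and the Mahout CLI; the paths and T1/T2 thresholds below are placeholders, not values from this thread:

```shell
# Sketch: raise the per-task child JVM heap when launching the job.
# Input/output paths and the -t1/-t2 thresholds are illustrative only.
mahout canopy \
  -Dmapred.child.java.opts=-Xmx4g \
  -i /path/to/input/vectors \
  -o /path/to/canopy/output \
  -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure \
  -t1 500 -t2 250
```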

On Tue, Nov 26, 2013 at 8:59 AM, Chih-Hsien Wu <> wrote:

> Hey Suneel, I did hit the OOM during the generation phase. I increased the
> JVM heap by tuning "" up to the maximum (around 8g), but to no
> avail. I also noticed that there is a ton of free memory not being utilized.
> This might correspond to what you said about generation only using one
> reducer. So my question is: would increasing the heap size of the worker
> nodes or the namenode help in this case?
> On Mon, Nov 25, 2013 at 6:59 PM, Suneel Marthi <>wrote:
>> Canopy Clustering is a 2 step process: Canopy Generation followed by
>> Canopy Clustering.
>> For Canopy Generation, it uses a single reducer (and this cannot be
>> overridden), while the Clustering task uses multiple reducers.
>> You seem to be hitting OOM during the Canopy generation phase.
>> On Monday, November 25, 2013 6:09 PM, Chih-Hsien Wu <>
>> wrote:
>> Hi all, I have been experiencing memory issues while running Mahout's
>> canopy algorithm on a large data set on Hadoop. I noticed that only one
>> reducer was running while the other nodes were idle. I was wondering
>> whether increasing the number of reduce tasks would ease the memory usage
>> and speed up the procedure. However, I found that setting
>> "mapred.reduce.tasks" on Hadoop has no effect on the canopy reduce tasks;
>> it still runs with only one reducer. So now I'm wondering whether canopy
>> is designed that way, or whether I am misconfiguring Hadoop.
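The single-reducer behavior described above can be illustrated with a self-contained sketch. This is NOT Mahout's actual source, and the class and method names are made up for illustration: canopy generation keeps every accepted center in one in-memory list for the whole run, so a single reducer's heap grows with the number of canopies, which is why a too-small T2 on a large data set can blow the heap no matter how many nodes sit idle.

```java
// Illustrative sketch (not Mahout's code) of why canopy generation is
// memory-bound in a single reducer: every accepted canopy center lives
// in one in-memory list for the duration of the job.
import java.util.ArrayList;
import java.util.List;

public class CanopySketch {

  // Returns the canopy centers: a point becomes a new center unless it lies
  // within T2 of an existing one. (The full algorithm also assigns every
  // point within T1 of a center to that canopy; omitted here for brevity.)
  static List<double[]> generateCenters(List<double[]> points, double t2) {
    List<double[]> centers = new ArrayList<>();
    for (double[] p : points) {
      boolean covered = false;
      for (double[] c : centers) {
        if (euclidean(p, c) < t2) { covered = true; break; }
      }
      // The center list is never spilled to disk: heap use grows with the
      // number of canopies, so a too-small T2 on big data exhausts the heap.
      if (!covered) centers.add(p);
    }
    return centers;
  }

  static double euclidean(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }
}
```

Because merging the mappers' candidate canopies into one global set of centers is inherently sequential in this design, raising T2 (fewer centers) or raising the child-task heap are the practical levers; adding reducers is not.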
