mahout-user mailing list archives

From Chih-Hsien Wu <chjaso...@gmail.com>
Subject Re: Only one reducer running on canopy generator
Date Tue, 26 Nov 2013 14:20:24 GMT
I have another question. The "Java Heap Error" message is rather broad, and
I don't know exactly where I am running out of memory. On Amazon Web
Services I can configure the daemons' heap sizes, but which one should I
adjust: datanode, tasktracker, or namenode?
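
One way to narrow this down, as a rough sketch: on a Hadoop 1.x-style
cluster the reducer runs in a child JVM launched by the tasktracker, so its
heap should come from "mapred.child.java.opts" rather than from the
datanode, tasktracker, or namenode daemon heaps. The helper below is
hypothetical (HeapCheck and its method names are made up for illustration
and are not part of Mahout or Hadoop). It parses the -Xmx value from an
options string and compares it with what the running JVM reports through
Runtime.getRuntime().maxMemory(). Logging the same figure from inside a task
would show whether the 8g setting actually reached the reducer; how to wire
that into a job is an assumption about the setup and is not shown here.

```java
// Hypothetical helper for illustration only; not part of Mahout or Hadoop.
final class HeapCheck {

    /** Returns the last -Xmx value in a JVM options string, in bytes, or -1 if none is present. */
    static long parseXmxBytes(String javaOpts) {
        long result = -1L;
        for (String opt : javaOpts.trim().split("\\s+")) {
            if (!opt.startsWith("-Xmx") || opt.length() <= 4) {
                continue; // not a heap-size option
            }
            String value = opt.substring(4).toLowerCase();
            char unit = value.charAt(value.length() - 1);
            long multiplier = 1L; // no suffix means plain bytes
            if (unit == 'k') {
                multiplier = 1024L;
            } else if (unit == 'm') {
                multiplier = 1024L * 1024;
            } else if (unit == 'g') {
                multiplier = 1024L * 1024 * 1024;
            }
            String digits = (multiplier == 1L) ? value : value.substring(0, value.length() - 1);
            result = Long.parseLong(digits) * multiplier;
        }
        return result;
    }

    /** Maximum heap the current JVM will attempt to use, in bytes. */
    static long actualMaxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        // Pass the value of mapred.child.java.opts as the first argument.
        String opts = args.length > 0 ? args[0] : "-Xmx8g";
        long requested = parseXmxBytes(opts);
        long actual = actualMaxHeapBytes();
        System.out.println(String.format("requested=%d bytes, this JVM max heap=%d bytes", requested, actual));
    }
}
```

maxMemory() can come out somewhat below the -Xmx figure, so a small gap is
expected. A value far below the requested size would suggest that the
option never reached that JVM.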


On Tue, Nov 26, 2013 at 8:59 AM, Chih-Hsien Wu <chjasonwu@gmail.com> wrote:

> Hey Suneel, I did hit the OOM during the generation phase. I increased the
> JVM heap by raising "mapred.child.java.opts" to the max (e.g. 8g), but to
> no avail. I also noticed that a lot of free memory is not being utilized.
> This might correspond to what you said about generation using only one
> reducer. So my question is: would increasing the heap size of the worker
> nodes or the namenode help in this case?
>
>
> On Mon, Nov 25, 2013 at 6:59 PM, Suneel Marthi <suneel_marthi@yahoo.com> wrote:
>
>> Canopy Clustering is a two-step process: Canopy Generation followed by
>> Canopy Clustering.
>>
>> Canopy Generation uses a single reducer (and this cannot be
>> overridden), while the Clustering task uses multiple reducers.
>>
>> You seem to be hitting OOM during the Canopy generation phase.
>>
>>
>>
>>
>>
>> On Monday, November 25, 2013 6:09 PM, Chih-Hsien Wu <chjasonwu@gmail.com>
>> wrote:
>>
>> Hi all, I have been experiencing memory issues while running the Mahout
>> canopy algorithm on a large data set on Hadoop. I noticed that only one
>> reducer was running while the other nodes were idle. I wondered whether
>> increasing the number of reduce tasks would lower memory usage and
>> speed up the procedure. However, I found that setting
>> "mapred.reduce.tasks" on Hadoop has no effect on the canopy reduce tasks;
>> the job still runs with only one reducer. Now I'm wondering whether canopy
>> is designed that way, or whether I am not configuring Hadoop correctly.
>>
>
>
