mahout-user mailing list archives

From Suneel Marthi <>
Subject Re: Only one reducer running on canopy generator
Date Mon, 25 Nov 2013 23:59:01 GMT
Canopy Clustering is a 2 step process: Canopy Generation followed by Canopy Clustering.

For Canopy Generation, it uses a single reducer (and this cannot be overridden), while the Clustering task uses multiple reducers.
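To illustrate why the generation step is tied to one reducer, here is a minimal, hypothetical sketch in plain Java. It is not Mahout's code and uses no Hadoop or Mahout APIs; the class and method names (`CanopySketch`, `canopyCentroids`, `generateCanopies`) and the simplified canopy pass are assumptions for illustration only. Each "mapper" builds local canopy centroids from its own split. The single "reducer" then reruns the same pass over every local centroid, because nearby centroids coming from different splits can only be merged when they are all seen together.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified sketch of two-phase canopy generation (not Mahout's implementation).
public class CanopySketch {

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Simplified canopy pass with thresholds t1 > t2 > 0: pick a remaining point as the
    // center, average every remaining point within t1 into a centroid, and drop every
    // point within t2 from the candidate list. The center itself (distance 0) is always
    // dropped, so the loop terminates.
    static List<double[]> canopyCentroids(List<double[]> points, double t1, double t2) {
        List<double[]> remaining = new ArrayList<>(points);
        List<double[]> centroids = new ArrayList<>();
        while (!remaining.isEmpty()) {
            double[] center = remaining.get(0);
            double[] sum = new double[center.length];
            int count = 0;
            List<double[]> stillRemaining = new ArrayList<>();
            for (double[] p : remaining) {
                double d = distance(center, p);
                if (d < t1) { // loosely inside this canopy: contributes to its centroid
                    for (int i = 0; i < sum.length; i++) {
                        sum[i] += p[i];
                    }
                    count++;
                }
                if (d >= t2) { // not tightly bound: may still seed or join another canopy
                    stillRemaining.add(p);
                }
            }
            for (int i = 0; i < sum.length; i++) {
                sum[i] /= count; // count >= 1 because the center is within t1 of itself
            }
            centroids.add(sum);
            remaining = stillRemaining;
        }
        return centroids;
    }

    // "Map" phase: one local pass per split. "Reduce" phase: one global pass over all
    // local centroids, which needs every centroid in one place (hence a single reducer).
    static List<double[]> generateCanopies(List<List<double[]>> splits, double t1, double t2) {
        List<double[]> allLocalCentroids = new ArrayList<>();
        for (List<double[]> split : splits) {
            allLocalCentroids.addAll(canopyCentroids(split, t1, t2));
        }
        return canopyCentroids(allLocalCentroids, t1, t2);
    }
}
```

For example, with 1-D splits `{0.0, 0.2, 10.0}` and `{0.1, 10.1}`, `t1 = 3.0` and `t2 = 1.0`, the two map passes produce four local centroids, and the single reduce pass merges them into two canopies near 0.1 and 10.05. Note that in this sketch the reduce pass holds every local centroid in memory at once.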

You seem to be hitting an OOM (out-of-memory) error during the Canopy Generation phase.

On Monday, November 25, 2013 6:09 PM, Chih-Hsien Wu <> wrote:
Hi all, I have been experiencing memory issues while running the Mahout
canopy algorithm on a large data set on Hadoop. I noticed that only one
reducer was running while the other nodes were idle. I was wondering whether
increasing the number of reduce tasks would reduce memory usage and
speed up the process. However, I realized that setting
"mapred.reduce.tasks" on Hadoop has no effect on the canopy reduce tasks; it
still runs with only one reducer. Now I'm wondering whether canopy is designed
that way, or whether I'm not configuring Hadoop correctly.