mahout-user mailing list archives

From Chih-Hsien Wu <>
Subject Re: Canopy threshold limitation
Date Mon, 25 Nov 2013 15:01:41 GMT
Hey Suneel, thanks for the reply. I'm trying to create hierarchical
clusters via a top-down approach, and I'm caught in the trade-off between a
lower canopy threshold and running out of heap memory. Streaming k-means
sounds ideal for the top-level clustering. What are the major differences
between streaming k-means and k-means, other than being faster and using
less memory? In other words, what are the pros and cons?
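To make the threshold/memory trade-off concrete, here is a minimal in-memory sketch of the classic canopy algorithm. It is not Mahout's canopy implementation; the class, record, and method names are hypothetical, and it assumes Euclidean distance, a loose threshold t1 and a tight threshold t2 (t1 >= t2). With t1 fixed, lowering t2 removes fewer points from the candidate list, so more points go on to seed their own canopies, and every canopy's member list is held in memory.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CanopySketch {

    // Hypothetical holder for one canopy: its center and its (loose) members.
    record Canopy(double[] center, List<double[]> members) {}

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Classic canopy clustering, entirely in memory (assumption: Euclidean distance).
    static List<Canopy> canopies(List<double[]> points, double t1, double t2) {
        if (t1 < t2) {
            throw new IllegalArgumentException("t1 must be >= t2");
        }
        List<double[]> remaining = new ArrayList<>(points); // candidates that may still seed a canopy
        List<Canopy> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            double[] center = remaining.remove(0);
            List<double[]> members = new ArrayList<>();
            members.add(center);
            Iterator<double[]> it = remaining.iterator();
            while (it.hasNext()) {
                double[] p = it.next();
                double d = distance(center, p);
                if (d < t1) {
                    members.add(p); // loose membership in this canopy
                }
                if (d < t2) {
                    it.remove();    // tightly bound: may no longer seed a new canopy
                }
            }
            result.add(new Canopy(center, members));
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        for (double x : new double[] {0, 1, 2, 10, 11, 12, 20}) {
            points.add(new double[] {x});
        }
        // Same loose threshold t1, shrinking tight threshold t2.
        List<Canopy> coarse = canopies(points, 3.0, 2.5);
        List<Canopy> fine = canopies(points, 3.0, 0.5);
        System.out.println(coarse.size()); // 3 canopies
        System.out.println(fine.size());   // 7 canopies (one per point)
        System.out.println(fine.stream().mapToInt(c -> c.members().size()).sum()); // 13 member references
    }
}
```

On this toy data the small t2 produces one canopy per point and nearly doubles the stored member references; on a large data set the same effect is what pushes a low threshold toward heap exhaustion.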

On Fri, Nov 22, 2013 at 5:30 PM, Suneel Marthi <> wrote:

> The threshold is based on the user's preference for inter-cluster
> distances. If you are running out of memory, I suggest increasing the JVM
> memory settings. I'm not sure what you are trying to accomplish, but if
> you are looking to get a first cut at clustering, I suggest you look at
> the new streaming k-means that's part of Mahout 0.8.
> See the
> On Friday, November 22, 2013 4:45 PM, Chih-Hsien Wu <>
> wrote:
> Just out of curiosity, is there a threshold limitation for the canopy
> algorithm? Is it just defined by the user's preference based on the
> inter-cluster distances? Or is it perhaps just limited by how much memory
> is allowed to execute them?
