mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Difference in KMeans performance with Mahout-0.3 and Mahout-0.4
Date Mon, 17 Jan 2011 16:55:45 GMT
4000 clusters is a lot as well.

Did the 0.3 solution have lots of clusters with single members?
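Ted's question can be checked by counting the members of each cluster. With about 60K points spread over 4000 clusters the average is 15 points per cluster, so a large number of single-member clusters would stand out. This is a minimal sketch, assuming the final point-to-cluster assignments are already available as an int array indexed by point. `SingletonClusters` and its methods are hypothetical helpers, not Mahout API.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Mahout): given the cluster id assigned to
 * each point, report how many clusters ended up with exactly one member.
 */
public final class SingletonClusters {

  private SingletonClusters() {}

  /** Counts the members of each cluster; assumes every id lies in [0, k). */
  static int[] clusterSizes(int[] assignments, int k) {
    int[] sizes = new int[k];
    for (int clusterId : assignments) {
      sizes[clusterId]++;
    }
    return sizes;
  }

  /** Number of clusters whose size is exactly one. */
  static int countSingletons(int[] assignments, int k) {
    return (int) Arrays.stream(clusterSizes(assignments, k))
        .filter(size -> size == 1)
        .count();
  }

  public static void main(String[] args) {
    // Toy data: 6 points assigned to 4 clusters; clusters 2 and 3 are singletons.
    int[] assignments = {0, 0, 1, 1, 2, 3};
    System.out.println(countSingletons(assignments, 4)); // prints 2
  }
}
```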

On Mon, Jan 17, 2011 at 8:46 AM, Jeff Eastman <jdog@windwardsolutions.com> wrote:

> I can't think of any architectural changes which would cause the
> convergence performance to change, but this is curious indeed. I see you
> are using DenseVectors but you did not say what their cardinality is. You
> also did not say how you generated the initial clusters (canopy or random
> sample). Can you run the 0.4 k-means with the initial clusters from your 0.3
> run? That would tend to isolate the change to either k-means itself or the
> sampling algorithm in RandomSeedGenerator. A poor set of initial
> clusters could greatly impact the convergence, so that is where I'd suggest
> starting.
>
> Jeff
>
> On 1/17/11 9:04 AM, Lokendra Singh wrote:
>
>> Hi all,
>>
>> I am running the KMeans clustering algorithm to cluster about 60K points
>> (DenseVectors) into 4K clusters on my Hadoop cluster.
>> I initialized the clusters with the first 'k' points (4000) as centroids
>> and kept the convergence threshold pretty low (0.001).
>>
>> I tried running it with Mahout-0.3 and 0.4 and found a huge difference
>> in their performance.
>> The rate of convergence was pretty high with Mahout-0.3 (in the 1st
>> iteration about 600 clusters (out of 4000) converged; by the 6th iteration
>> almost 3500 clusters (out of 4000) had converged).
>> With Mahout-0.4, however, I observed just 10 clusters (out of 4000)
>> converging even after 10 iterations.
>>
>> What architectural difference between the KMeans implementations in
>> Mahout-0.4 and Mahout-0.3 might be causing this difference in performance?
>>
>> Regards
>> Lokendra
>>
>>
>
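Lokendra's report counts how many clusters "converged" in each iteration. In k-means implementations of this kind, a cluster is typically marked converged once its centroid moves no further than the convergence threshold between two iterations. This is a minimal sketch of that test, assuming Euclidean distance and plain `double[]` vectors. `ConvergenceCheck` is a hypothetical class, and Mahout's actual test and its configurable distance measure may differ.

```java
/**
 * Sketch of a per-cluster k-means convergence test (hypothetical, not Mahout
 * code): a cluster counts as converged when its centroid moves by at most
 * convergenceDelta between two iterations. Euclidean distance is assumed.
 */
public final class ConvergenceCheck {

  private ConvergenceCheck() {}

  /** Euclidean distance between two dense vectors of equal cardinality. */
  static double euclidean(double[] a, double[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("cardinality mismatch");
    }
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Returns how many centroids moved no further than convergenceDelta. */
  static int countConverged(double[][] oldCentroids, double[][] newCentroids,
                            double convergenceDelta) {
    int converged = 0;
    for (int c = 0; c < oldCentroids.length; c++) {
      if (euclidean(oldCentroids[c], newCentroids[c]) <= convergenceDelta) {
        converged++;
      }
    }
    return converged;
  }

  public static void main(String[] args) {
    double[][] before = {{0.0, 0.0}, {1.0, 1.0}};
    double[][] after = {{0.0, 0.0005}, {1.0, 1.5}};
    // Cluster 0 moved 0.0005 (<= 0.001); cluster 1 moved 0.5.
    System.out.println(countConverged(before, after, 0.001)); // prints 1
  }
}
```

Because the threshold is an absolute distance, a value such as 0.001 means different things depending on the scale and cardinality of the vectors, so the cardinality Jeff asks about may matter when interpreting these counts.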

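Jeff's suggestion separates two questions: how the initial clusters are chosen, and how k-means refines them. To illustrate the random-sample alternative to taking the first k points, here is a sketch of uniform reservoir sampling of k seed points from the input. `RandomSeeds` is hypothetical; this is not Mahout's RandomSeedGenerator code, and canopy-based seeding is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/**
 * Sketch (hypothetical, not Mahout's RandomSeedGenerator): pick k initial
 * centroids uniformly at random from a stream of points using reservoir
 * sampling, instead of simply taking the first k points.
 */
public final class RandomSeeds {

  private RandomSeeds() {}

  static List<double[]> sample(Iterable<double[]> points, int k, Random rng) {
    List<double[]> reservoir = new ArrayList<>(k);
    int seen = 0; // number of points processed so far
    for (double[] p : points) {
      if (reservoir.size() < k) {
        reservoir.add(p); // fill the reservoir with the first k points
      } else {
        // Keep the new point with probability k / (seen + 1),
        // overwriting a uniformly chosen slot.
        int j = rng.nextInt(seen + 1);
        if (j < k) {
          reservoir.set(j, p);
        }
      }
      seen++;
    }
    return reservoir;
  }

  public static void main(String[] args) {
    List<double[]> points = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      points.add(new double[] {i, i});
    }
    // A fixed Random seed makes the chosen centroids reproducible.
    for (double[] seed : sample(points, 3, new Random(42))) {
      System.out.println(Arrays.toString(seed));
    }
  }
}
```

Fixing the `Random` seed makes the initial centroids reproducible, so the same starting clusters can be fed to both Mahout versions when comparing their convergence.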