mahout-user mailing list archives

From Camilo Lopez <cam...@camilolopez.com>
Subject Re: Finding thresholds for canopy
Date Wed, 27 Apr 2011 20:38:02 GMT
Thanks, Jeff. I guess the "art" part of it is choosing a reasonable initial value.


On 2011-04-27, at 4:23 PM, Jeff Eastman wrote:

> No good answers here. T2 is the value that controls the number of clusters
> Canopy finds. Try an initial value that seems reasonable, then do a binary search,
> halving or doubling the value until you get a reasonable number of clusters.
> Increasing T2 gives you fewer clusters; decreasing it gives you more. If your
> initial value is far off, you will get either 1 or numPoints clusters. T1 determines
> which points that are near a cluster, but farther from it than T2, still contribute
> to its final centroid. You can set T1=T2 during the binary search, then increase T1
> incrementally to see how the centroids move.
> 
> -----Original Message-----
> From: Camilo Lopez [mailto:admin@camilolopez.com] On Behalf Of Camilo Lopez
> Sent: Wednesday, April 27, 2011 12:39 PM
> To: user@mahout.apache.org
> Subject: Finding thresholds for canopy
> 
> I'm using Canopy as the first step for K-means clustering. Is there any algorithm, or even
> a good heuristic, to estimate good T1 and T2 values from the vectorized data?
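
The search Jeff describes can be sketched roughly as follows. This is a minimal, self-contained illustration in plain Python, not Mahout's implementation: `canopy_centers` is a simplified single-pass canopy routine using Euclidean distance, and `search_t2`, `min_k`, `max_k`, and `max_steps` are hypothetical names chosen for the example. It assumes the cluster count shrinks as T2 grows, which holds in general but is not strictly guaranteed for every point ordering.

```python
import math


def canopy_centers(points, t1, t2):
    """Simplified single-pass canopy clustering (illustrative, not Mahout's code).

    A point farther than t2 from every existing canopy center starts a new canopy.
    A point within t1 of a center contributes to that canopy's centroid.
    Returns the list of canopy centroids.
    """
    canopies = []  # each canopy: {"center": point, "members": [points]}
    for p in points:
        strongly_bound = False
        for c in canopies:
            d = math.dist(p, c["center"])
            if d < t1:
                c["members"].append(p)  # contributes to the centroid
            if d < t2:
                strongly_bound = True   # too close to seed a new canopy
        if not strongly_bound:
            canopies.append({"center": p, "members": [p]})
    return [
        tuple(sum(coord) / len(c["members"]) for coord in zip(*c["members"]))
        for c in canopies
    ]


def search_t2(points, t2_init, min_k, max_k, max_steps=40):
    """Halve or double T2 (with T1 = T2) until the cluster count is in range.

    Once the target has been bracketed from both sides, bisect the bracket.
    Returns (t2, cluster_count), or None if no value was found in max_steps.
    """
    lo = None  # largest T2 seen that gave too many clusters
    hi = None  # smallest T2 seen that gave too few clusters
    t2 = t2_init
    for _ in range(max_steps):
        k = len(canopy_centers(points, t2, t2))
        if min_k <= k <= max_k:
            return t2, k
        if k > max_k:  # too many clusters, so T2 is too small
            lo = t2
            t2 = t2 * 2 if hi is None else (lo + hi) / 2
        else:          # too few clusters, so T2 is too large
            hi = t2
            t2 = t2 / 2 if lo is None else (lo + hi) / 2
    return None


# Toy data: three well-separated groups of three points each.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (10.0, 10.0), (10.1, 10.0), (10.0, 10.1),
    (20.0, 0.0), (20.1, 0.0), (20.0, 0.1),
]

t2, k = search_t2(points, t2_init=100.0, min_k=3, max_k=3)
print("T2 =", t2, "clusters =", k)

# With T2 fixed, raising T1 keeps the cluster count but moves the centroids.
print(canopy_centers(points, t2, t2))
print(canopy_centers(points, t2 * 1.2, t2))
```

On real data the distance measure and the acceptable range of cluster counts would come from the problem at hand; the toy points here only make the two extremes (1 cluster and numPoints clusters) easy to see.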

