mahout-user mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: Canopy estimator
Date Thu, 10 May 2012 13:12:20 GMT
No, the issue was discussed but never reached critical mass. I typically 
do a binary search to find the best value with T1 == T2 and then tweak 
T1 up a bit. When feeding k-means, this last step is not so important.
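The binary search described above can be sketched in plain Java. This is a minimal in-memory illustration, not Mahout's canopy driver. It assumes Euclidean distance, a sequential single-pass canopy with T1 == T2, and takes "best" to mean the smallest T that yields at most a target number of canopies. The class and method names (CanopyThresholdSearch, countCanopies, searchThreshold) are hypothetical. Binary search also assumes the canopy count does not increase as T grows, which is typical but not guaranteed for a sequential canopy pass.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Mahout API): binary search for one canopy
 * threshold T, using T1 == T2 == T, Euclidean distance, in-memory vectors.
 */
public class CanopyThresholdSearch {

    public static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Sequential canopy pass with T1 == T2 == t; returns the number of canopies. */
    public static int countCanopies(List<double[]> points, double t) {
        List<double[]> remaining = new ArrayList<>(points);
        int canopies = 0;
        while (!remaining.isEmpty()) {
            double[] center = remaining.remove(0);
            canopies++;
            // With T1 == T2, every point closer than t to the center is
            // removed and can no longer start its own canopy.
            remaining.removeIf(p -> euclidean(center, p) < t);
        }
        return canopies;
    }

    /**
     * Searches [lo, hi] for the smallest t giving at most targetCanopies,
     * assuming the count is non-increasing in t. Requires
     * countCanopies(lo) > targetCanopies and countCanopies(hi) <= targetCanopies.
     */
    public static double searchThreshold(List<double[]> points, int targetCanopies,
                                         double lo, double hi, int iterations) {
        for (int i = 0; i < iterations; i++) {
            double mid = (lo + hi) / 2.0;
            if (countCanopies(points, mid) > targetCanopies) {
                lo = mid; // too many canopies: threshold is too small
            } else {
                hi = mid; // few enough canopies: try a smaller threshold
            }
        }
        return hi;
    }

    public static void main(String[] args) {
        // Three tight pairs of points, far apart from each other.
        List<double[]> points = List.of(
                new double[]{0.0, 0.0}, new double[]{0.5, 0.0},
                new double[]{10.0, 10.0}, new double[]{10.5, 10.0},
                new double[]{20.0, 0.0}, new double[]{20.5, 0.0});
        double t = searchThreshold(points, 3, 0.0, 50.0, 40);
        System.out.println("T = " + t + ", canopies = " + countCanopies(points, t));
    }
}
```

Once a T is found this way, T1 can then be nudged above it as described; how far is a judgment call and is not encoded here.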

If you could figure out a way to automate this we would be interested. 
Conceptually, using the RandomSeedGenerator to sample a few vectors and 
comparing them with your chosen DistanceMeasure would give you a hint 
about the T value at which to begin the search. A utility to do that 
would be a useful contribution.
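A rough version of that sampling utility might look like the following. It is a hypothetical sketch, not Mahout code: plain Java stand-ins replace Mahout's RandomSeedGenerator (a seeded shuffle of a copy of the data) and DistanceMeasure (a `ToDoubleBiFunction<double[], double[]>`). The class name ThresholdHint and the choice to report the min, median and max pairwise distance are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleBiFunction;

/**
 * Hypothetical helper: sample a few vectors and summarize their pairwise
 * distances to suggest a starting range for T. Plain Java stand-ins are
 * used in place of Mahout's RandomSeedGenerator and DistanceMeasure.
 */
public class ThresholdHint {

    /** Returns {min, median, max} of pairwise distances within a random sample. */
    public static double[] pairwiseDistanceSummary(List<double[]> points, int sampleSize,
            ToDoubleBiFunction<double[], double[]> distance, long seed) {
        // Random sample without replacement: shuffle a copy, keep the first sampleSize.
        List<double[]> shuffled = new ArrayList<>(points);
        Collections.shuffle(shuffled, new Random(seed));
        List<double[]> sample = shuffled.subList(0, Math.min(sampleSize, shuffled.size()));
        int n = sample.size();
        if (n < 2) {
            throw new IllegalArgumentException("need at least two sampled vectors");
        }
        // Distance for every distinct pair of sampled vectors.
        double[] d = new double[n * (n - 1) / 2];
        int k = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                d[k++] = distance.applyAsDouble(sample.get(i), sample.get(j));
            }
        }
        Arrays.sort(d);
        // d[d.length / 2] is the upper median when the pair count is even.
        return new double[]{d[0], d[d.length / 2], d[d.length - 1]};
    }

    public static void main(String[] args) {
        // Manhattan distance as an example measure; any metric could be plugged in.
        ToDoubleBiFunction<double[], double[]> manhattan = (a, b) -> {
            double s = 0.0;
            for (int i = 0; i < a.length; i++) {
                s += Math.abs(a[i] - b[i]);
            }
            return s;
        };
        Random rnd = new Random(42);
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            points.add(new double[]{rnd.nextGaussian(), rnd.nextGaussian()});
        }
        double[] s = pairwiseDistanceSummary(points, 20, manhattan, 7L);
        System.out.printf("min=%.3f median=%.3f max=%.3f%n", s[0], s[1], s[2]);
    }
}
```

The [min, max] range is one plausible starting bracket for the binary search, with the median as a first guess; whether that is a good starting point will depend on the data and the distance measure.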

On 5/9/12 8:36 PM, Pat Ferrel wrote:
> Some thoughts on https://issues.apache.org/jira/browse/MAHOUT-563
>
> Did anything ever get done with this? Ted mentions limited usefulness. 
> This may be true, but the cases he mentions as counterexamples are 
> also not very good for using canopy ahead of k-means, no? That info 
> would be a useful result. To use canopies I find myself running it 
> over and over trying to see some inflection in the number of clusters. 
> Why not automate this? Even if the data shows nothing, that is itself 
> an answer of value and it would save a lot of hand work to find out 
> the same thing.
>
>
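The automation Pat asks about (running canopy repeatedly and looking for an inflection in the cluster count) could be sketched as a sweep over T. This is a hypothetical illustration, not existing Mahout functionality: `clusterCount` is an assumed hook that would wrap a real canopy run at a given T, and treating the longest plateau (the widest range of T over which the count does not change) as the "inflection" is just one simple heuristic.

```java
import java.util.function.DoubleToIntFunction;

/**
 * Hypothetical sweep: evaluate a cluster-count function over evenly spaced
 * T values and report the longest plateau (run of equal counts).
 */
public class ThresholdSweep {

    /** Evaluates clusterCount at `steps` evenly spaced T values in [tMin, tMax]. */
    public static int[] sweep(DoubleToIntFunction clusterCount, double tMin, double tMax, int steps) {
        if (steps < 2) {
            throw new IllegalArgumentException("steps must be at least 2");
        }
        int[] counts = new int[steps];
        for (int i = 0; i < steps; i++) {
            double t = tMin + (tMax - tMin) * i / (steps - 1);
            counts[i] = clusterCount.applyAsInt(t);
        }
        return counts;
    }

    /** Returns {start, end} indices of the longest run of equal counts (first one on ties). */
    public static int[] longestPlateau(int[] counts) {
        if (counts.length == 0) {
            throw new IllegalArgumentException("no counts");
        }
        int bestStart = 0, bestEnd = 0, start = 0;
        for (int i = 1; i <= counts.length; i++) {
            // A run ends at the end of the array or when the count changes.
            if (i == counts.length || counts[i] != counts[start]) {
                if (i - 1 - start > bestEnd - bestStart) {
                    bestStart = start;
                    bestEnd = i - 1;
                }
                start = i;
            }
        }
        return new int[]{bestStart, bestEnd};
    }

    public static void main(String[] args) {
        // Stand-in for a real canopy run; replace with a function that runs canopy at T.
        DoubleToIntFunction fakeCanopyCount = t -> t < 2 ? 10 : (t < 7 ? 4 : 1);
        int[] counts = sweep(fakeCanopyCount, 0.0, 9.0, 10);
        int[] p = longestPlateau(counts);
        System.out.println("plateau at count " + counts[p[0]] + " for steps " + p[0] + ".." + p[1]);
    }
}
```

At large T everything tends to merge into one canopy, which can produce a long trivial plateau, so in practice tMax would likely need to stay below that point. If no plateau stands out, that flat curve is itself the kind of answer Pat describes.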

