mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Clustering from DB
Date Mon, 27 Jul 2009 18:26:39 GMT

On Jul 27, 2009, at 12:55 PM, Shashikant Kore wrote:

> On Mon, Jul 27, 2009 at 10:11 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>>
>> Not following. The distance calc stuff is irrespective of the type of
>> Vector. I was referring to the centroid length square (I think you called
>> it the triangle inequality) stuff that Shashikant added on MAHOUT-121. We
>> use it for testing convergence, but not for other distance calculations. I
>> haven't looked to see if it is applicable yet, but it seems like it should
>> be.
>>
>
> Grant,
>
> Yes, that part of the patch is missing.  In my original patch, I had
> modified the emitPointToNearestCluster() in kmeans/Cluster.java to
> calculate distance between document and centroids of various clusters.
> (There is no triangle inequality code, though.)  In the later patches
> I don't see that code.
>
> I had reviewed the final patch, but I missed this one. I
> think I only ran Canopy, not K-means. Incidentally, I am
> hopelessly out of date with trunk as recently I have not worked on
> this.  BTW, I haven't really followed this thread in depth. So, I
> might be speaking out of context here. Apologies.

I'll be on a plane tomorrow; I'll see if I can track down the differences.

-Grant
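The centroid length square discussed above pays off in nearest-centroid assignment because, for squared Euclidean distance, |x - c|^2 = |x|^2 - 2(x.c) + |c|^2. If |c|^2 is computed once per centroid rather than once per document-centroid pair, each distance needs only a dot product over the document's non-zero terms. The following is a minimal sketch under stated assumptions: sparse documents are stored as index-to-value maps, centroids are dense arrays, and NearestCentroidSketch, Centroid, lengthSquared, distanceSquared and nearest are hypothetical names. This is not Mahout's Cluster API, nor the code from MAHOUT-121, and it does not implement any triangle-inequality pruning.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not Mahout's API: nearest-centroid search with cached squared lengths. */
public class NearestCentroidSketch {

    /** Dense centroid whose squared length is computed once, at construction. */
    static final class Centroid {
        final double[] values;
        final double lengthSquared;

        Centroid(double[] values) {
            this.values = values;
            double sum = 0.0;
            for (double v : values) {
                sum += v * v;
            }
            this.lengthSquared = sum;
        }
    }

    /** Squared length of a sparse document (sum over its non-zero entries only). */
    static double lengthSquared(Map<Integer, Double> doc) {
        double sum = 0.0;
        for (double v : doc.values()) {
            sum += v * v;
        }
        return sum;
    }

    /**
     * Squared Euclidean distance via |x|^2 - 2(x.c) + |c|^2.
     * Assumes every document index is smaller than the centroid dimension.
     */
    static double distanceSquared(Map<Integer, Double> doc, double docLengthSquared, Centroid c) {
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : doc.entrySet()) {
            dot += e.getValue() * c.values[e.getKey()];
        }
        return docLengthSquared - 2.0 * dot + c.lengthSquared;
    }

    /** Index of the centroid closest to the document; |x|^2 is computed once per document. */
    static int nearest(Map<Integer, Double> doc, List<Centroid> centroids) {
        double docLengthSquared = lengthSquared(doc);
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.size(); i++) {
            double d = distanceSquared(doc, docLengthSquared, centroids.get(i));
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Centroid> centroids = List.of(
            new Centroid(new double[] {1.0, 0.0, 0.0}),
            new Centroid(new double[] {0.0, 1.0, 1.0}));
        // Sparse document (0, 2, 1): only indices 1 and 2 are stored.
        Map<Integer, Double> doc = new HashMap<>();
        doc.put(1, 2.0);
        doc.put(2, 1.0);
        System.out.println(nearest(doc, centroids)); // prints 1
    }
}
```

For the document (0, 2, 1) the cached-norm formula gives 5 - 0 + 1 = 6 for the first centroid and 5 - 6 + 2 = 1 for the second, matching the direct squared distances, so the second centroid (index 1) wins. Because the expansion only holds for squared Euclidean distance, other distance measures would need their own treatment.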
