mahout-user mailing list archives

From Jeff Eastman <>
Subject Re: syntheticcontroldata clustering example failure due to combiner
Date Thu, 11 Jun 2009 17:32:51 GMT
Good to hear. The current implementation is actually the first one I
did, so it was easy to revert to that model. However, it does require the
mapper to retain all of its canopies in memory until close(), which could
cause an OOME (OutOfMemoryError) if the T values are poorly chosen. Doing
the centroid calculation in the combiner removed this difficulty, but the
change in Hadoop's combiner semantics makes that approach a non-starter.
If there were some globally unique way to create new cluster identifiers
as they are needed, the centroid calculation could be moved to the
reducer. There would still be a need to combine the clusters created by
each of the mappers...
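To make the memory concern concrete, here is a minimal, self-contained sketch of the in-mapper approach described above. It is not Mahout's code: it assumes plain Euclidean distance and in-memory double[] points instead of Hadoop's Mapper API and Mahout's vector types, and the names CanopySketch, map(), close() and retainedCanopies() are hypothetical, chosen only to mirror the mapper lifecycle. Each point joins every canopy within T1 and starts a new canopy if it lies within T2 of none, so every canopy must stay in memory until close(), and a small T2 turns almost every point into its own canopy.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a canopy mapper (not Mahout's implementation). */
public class CanopySketch {

    /** One canopy: a fixed center plus a running sum of observed points. */
    static final class Canopy {
        final double[] center;
        final double[] sum;
        int count;

        Canopy(double[] point) {
            this.center = point.clone();
            this.sum = new double[point.length];
        }

        void observe(double[] point) {
            for (int i = 0; i < point.length; i++) sum[i] += point[i];
            count++;
        }

        double[] centroid() {
            double[] c = new double[sum.length];
            for (int i = 0; i < sum.length; i++) c[i] = sum[i] / count;
            return c;
        }
    }

    private final double t1;
    private final double t2;
    // Every canopy is kept until close(); this list is what can exhaust the heap.
    private final List<Canopy> canopies = new ArrayList<>();

    CanopySketch(double t1, double t2) {
        if (t1 <= t2) throw new IllegalArgumentException("expected T1 > T2");
        this.t1 = t1;
        this.t2 = t2;
    }

    static double distance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    /** Analogue of map(): observe the point in every canopy within T1; start a new canopy if none is within T2. */
    void map(double[] point) {
        boolean stronglyBound = false;
        for (Canopy c : canopies) {
            double d = distance(c.center, point);
            if (d < t1) c.observe(point);
            if (d < t2) stronglyBound = true;
        }
        if (!stronglyBound) {
            Canopy c = new Canopy(point);
            c.observe(point);
            canopies.add(c);
        }
    }

    /** Analogue of close(): centroids are only final once all input has been seen. */
    List<double[]> close() {
        List<double[]> out = new ArrayList<>();
        for (Canopy c : canopies) out.add(c.centroid());
        return out;
    }

    int retainedCanopies() {
        return canopies.size();
    }

    public static void main(String[] args) {
        double[][] points = { {0, 0}, {0.5, 0}, {10, 10}, {10.5, 10} };
        CanopySketch loose = new CanopySketch(3.0, 1.0);
        CanopySketch tight = new CanopySketch(3.0, 0.1);
        for (double[] p : points) {
            loose.map(p);
            tight.map(p);
        }
        System.out.println(loose.retainedCanopies()); // 2
        System.out.println(tight.retainedCanopies()); // 4
    }
}
```

With T2 = 1.0 the four points collapse into two canopies, but with T2 = 0.1 each point starts its own canopy, which is the growth that makes poorly chosen T values dangerous when everything is held until close(). The T values and points here are arbitrary illustrations, not the synthetic control settings.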


Adil Aijaz wrote:
> Jeff,
> Thanks for the quick turnaround on this issue. Just tested it and the 
> canopy creation and kmeans both work now on syntheticcontroldata. I 
> get 7 canopies and 7 clusters. The collection logic in close() is not
> pretty, but I can't think of a workaround myself.
> adil
> Jeff Eastman wrote:
>> r783617 removed the CanopyCombiner and refactored its semantics back 
>> into the reducer. Updated unit tests pass and Synthetic Control with 
>> Canopy produces 6 clusters. Kmeans also runs and produces 6 clusters.
>> I really don't like doing stuff in close() but see no practical 
>> alternative. Ideas are still welcomed.
>> Jeff
>> Jeff Eastman wrote:
>>> Adil Aijaz wrote:
>>>> 2. There is a bug in
>>>> examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/
>>>> that calls runJob from the main function with my arguments in the
>>>> wrong order. So, my convergenceDelta was interpreted as t1, t1 as
>>>> t2, and t2 as convergenceDelta. I will commit a patch as soon as I
>>>> get approval for open-source commits from my employer; however, I
>>>> thought I'd put it out there in case someone else runs into the
>>>> same issue.
>>> r783585 fixed the parameter ordering bug. Still working on the 
>>> Combiner problem.
>>> Thanks, Adil,
>>> Jeff
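The parameter-ordering bug reported above (fixed in r783585) is the usual hazard of several same-typed positional parameters: rotating three doubles still compiles. The sketch below is purely hypothetical; ArgOrderSketch, its runJob signature and the KMeansParams holder are illustrative names, not Mahout's actual API, and the values 80, 55 and 0.5 are arbitrary. It reproduces the rotation Adil describes and shows one common guard, setting each value by name.

```java
/** Hypothetical illustration of the argument-ordering bug (not Mahout's code). */
public class ArgOrderSketch {

    /** Hypothetical entry point expecting (t1, t2, convergenceDelta). */
    static String runJob(double t1, double t2, double convergenceDelta) {
        return "t1=" + t1 + " t2=" + t2 + " delta=" + convergenceDelta;
    }

    /** Hypothetical named holder: each value is set by name, so call order cannot rotate them. */
    static final class KMeansParams {
        double t1;
        double t2;
        double convergenceDelta;

        KMeansParams t1(double v) { t1 = v; return this; }
        KMeansParams t2(double v) { t2 = v; return this; }
        KMeansParams convergenceDelta(double v) { convergenceDelta = v; return this; }
    }

    static String runJob(KMeansParams p) {
        return runJob(p.t1, p.t2, p.convergenceDelta);
    }

    public static void main(String[] args) {
        double t1 = 80, t2 = 55, delta = 0.5;
        // Buggy call: passing (delta, t1, t2) compiles, so delta becomes t1, t1 becomes t2, t2 becomes delta.
        System.out.println(runJob(delta, t1, t2)); // t1=0.5 t2=80.0 delta=55.0
        // Named version: the order of the setter calls does not matter.
        System.out.println(runJob(new KMeansParams().convergenceDelta(delta).t1(t1).t2(t2))); // t1=80.0 t2=55.0 delta=0.5
    }
}
```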
