mahout-user mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: MAHOUT-236 Cluster Evaluation Tools?
Date Thu, 08 Apr 2010 22:43:33 GMT
Looking at the paper, it doesn't seem to require MR for the final CDbw
calculation, right? For each cluster, we only need to compare one of its
points with one point in each other cluster. With a small number of
representative points per cluster, that can easily be done in memory. I'd
love to see the code you have for computing representative points.

Jeff
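The in-memory comparison described above can be sketched as follows: for every pair of clusters, compare each representative point of one cluster against each representative of the other and keep the closest pair. This is a minimal sketch, not Mahout code and not the full CDbw index; it assumes that finding the closest representatives between clusters is the building block the inter-cluster terms need. The input format (a dict of cluster id to a list of point tuples) and the function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_representatives(reps):
    """For every pair of clusters, find the closest pair of representative points.

    reps: hypothetical input format, a dict mapping cluster id -> list of
    representative points (tuples of floats), assumed small enough for memory.
    Returns a dict mapping (i, j) with i < j -> (distance, point_in_i, point_in_j).
    """
    result = {}
    for i, j in combinations(sorted(reps), 2):
        best = None
        # Brute force: len(reps[i]) * len(reps[j]) distance computations per pair.
        for p in reps[i]:
            for q in reps[j]:
                d = euclidean(p, q)
                if best is None or d < best[0]:
                    best = (d, p, q)
        result[(i, j)] = best
    return result


if __name__ == "__main__":
    reps = {
        0: [(0.0, 0.0), (1.0, 0.0)],
        1: [(4.0, 0.0), (5.0, 1.0)],
        2: [(0.0, 6.0)],
    }
    for pair, (dist, p, q) in sorted(closest_representatives(reps).items()):
        print(pair, round(dist, 3), p, q)
```

With k clusters and r representatives each, this costs k(k-1)/2 × r² distance computations, which stays small when r is a handful of points. That is consistent with the point that this final step does not need MapReduce.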


Robin Anil wrote:
> On Wed, Apr 7, 2010 at 11:50 PM, Jeff Eastman <jdog@windwardsolutions.com> wrote:
>
>> Hi Robin,
>>
>> Interesting paper. I'm beginning to see how to MR the representative point
>> selection already. The rest will hopefully become clearer with more study.
>> Lots of MR jobs are needed to:
>>
>> a) get the data into Vectors,
>
> We have something for text, missing for other formats
>
>> b) iterate (e.g. kmeans) over the data to produce a set of clusters,
>
> Done
>
>> c) cluster the data,
>
> Done
>
>> d) iterate over the clustered data to derive representative points for each
>> cluster, and finally
>
> Done ;)
>
>> e) produce the CDbw.
>
> TODO
>
>> And, of course all of this is again iterated with different values for the
>> clustering algorithm's parameters. Should keep the lights on at PG&E
>> producing power for the server farms.
>>
>> Robin Anil wrote:
>>
>>> Hi Jeff,
>>> This is a good paper with a simple measure of cluster quality based on
>>> intra-cluster density and inter-cluster separation. It's pretty easy to
>>> compute. It needs to be made into a map/reduce job.
>>>
>>> http://docs.google.com/viewer?a=v&q=cache:z5p9n04cBQEJ:www.db-net.aueb.gr/index.php/corporate/content/download/227/833/file/HV_poster2002.pdf+clustering+quality&hl=en&gl=in&pid=bl&srcid=ADGEESiC-ocW6IWrKR4cb1t1ZqkzRKQ3tDv4UFBkVaUKU0gG3kADcPWIjs-60A0912nu8MFPsVM3pf9jKrP98dL-B-BaiOC9LObBS3VkJK6Mu6josZtVegLxp3BftduD3hFxtGOVZK_b&sig=AHIEtbSZwtgw9wmJoojQn7Dlz5OL67vICw
>>> Robin
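Step d) in the quoted list, deriving representative points for each cluster, can be sketched in memory for a single cluster. This is not Robin's code or Mahout's implementation: it assumes a farthest-first rule (the first representative is the point farthest from the centroid, each next one is the point farthest from its nearest already-chosen representative). The optional `shrink` factor, which moves each chosen point toward the centroid, is also an assumption here; check the paper for whether and how it applies. All names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def representative_points(points, r, shrink=0.0):
    """Pick up to r well-scattered points of one cluster (farthest-first, assumed).

    shrink: hypothetical parameter in [0, 1]; 0 keeps the chosen points as-is,
    0.5 moves each one halfway toward the cluster centroid.
    """
    c = centroid(points)
    # Work with indices so duplicate points are handled correctly.
    chosen = [max(range(len(points)), key=lambda i: euclidean(points[i], c))]
    while len(chosen) < min(r, len(points)):
        remaining = [i for i in range(len(points)) if i not in chosen]
        # Next representative: the point farthest from its nearest chosen one.
        nxt = max(remaining,
                  key=lambda i: min(euclidean(points[i], points[j]) for j in chosen))
        chosen.append(nxt)
    return [tuple(x + shrink * (cx - x) for x, cx in zip(points[i], c))
            for i in chosen]


if __name__ == "__main__":
    cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
    print(representative_points(cluster, 3))
    print(representative_points(cluster, 2, shrink=0.5))
```

Each cluster is processed independently, which is why this step maps naturally onto one reducer per cluster in an MR job, while the per-cluster work itself stays small.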

