mahout-user mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: MAHOUT-236 Cluster Evaluation Tools?
Date Wed, 07 Apr 2010 18:20:33 GMT
Hi Robin,

Interesting paper. I'm already beginning to see how to map/reduce the
representative point selection. The rest will hopefully become clearer
with more study. Lots of MR jobs are needed to: a) get the data into
Vectors, b) iterate (e.g. kmeans) over the data to produce a set of
clusters, c) cluster the data, d) iterate over the clustered data to
derive representative points for each cluster, and finally e) produce the
CDbw (Composing Density Between and Within clusters) score. And, of
course, all of this is iterated again with different values for the
clustering algorithm's parameters. That should keep the lights on at
PG&E, producing power for the server farms.
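A minimal, single-process Python sketch of steps c) and d), assuming the cluster centers from step b) are already known. It only mimics the map/reduce shape (the "map" emits a cluster id per point, the "reduce" picks representatives per cluster); the function names are hypothetical, not Mahout APIs. Representative points are chosen here by farthest-first traversal, which is an assumption about how the paper's selection step works.

```python
import math
from collections import defaultdict


def nearest_center(point, centers):
    """Step c), map side: index of the closest cluster center."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def farthest_first(points, r):
    """Step d), reduce side: pick up to r representative points for one cluster.

    Starts from the point farthest from the cluster mean, then repeatedly adds
    the point whose distance to its nearest already-chosen point is largest.
    """
    dim = len(points[0])
    mean = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    chosen = [max(range(len(points)), key=lambda j: math.dist(points[j], mean))]
    while len(chosen) < min(r, len(points)):
        remaining = [j for j in range(len(points)) if j not in chosen]
        chosen.append(max(
            remaining,
            key=lambda j: min(math.dist(points[j], points[k]) for k in chosen),
        ))
    return [points[j] for j in chosen]


def representative_points(points, centers, r):
    # "Map" + "shuffle": emit (cluster_id, point) and group points by cluster_id.
    groups = defaultdict(list)
    for p in points:
        groups[nearest_center(p, centers)].append(p)
    # "Reduce": one farthest-first selection per cluster.
    return {cid: farthest_first(pts, r) for cid, pts in groups.items()}
```

In a real job the grouping would be done by the framework's shuffle, and each reducer would see only one cluster's points.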


Robin Anil wrote:
> Hi Jeff,
>             This is a good paper with a simple measure of cluster quality
> based on intra-cluster density and inter-cluster separation. It's
> pretty easy to compute. Need to make it a map/reduce job:
> http://docs.google.com/viewer?a=v&q=cache:z5p9n04cBQEJ:www.db-net.aueb.gr/index.php/corporate/content/download/227/833/file/HV_poster2002.pdf+clustering+quality&hl=en&gl=in&pid=bl&srcid=ADGEESiC-ocW6IWrKR4cb1t1ZqkzRKQ3tDv4UFBkVaUKU0gG3kADcPWIjs-60A0912nu8MFPsVM3pf9jKrP98dL-B-BaiOC9LObBS3VkJK6Mu6josZtVegLxp3BftduD3hFxtGOVZK_b&sig=AHIEtbSZwtgw9wmJoojQn7Dlz5OL67vICw
> Robin
>
>
>   
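For illustration only, a small Python sketch of the two ingredients Robin names, computed from per-cluster representative points. The definitions are simplified assumptions (density as the average count of a cluster's points within a hypothetical `radius` of each representative; separation as the smallest distance between representatives of different clusters), and how the paper combines them into a single score is deliberately not reproduced here.

```python
import math
from itertools import combinations


def intra_density(cluster_points, reps, radius):
    """Average number of the cluster's points lying within `radius` of each representative."""
    counts = [sum(1 for p in cluster_points if math.dist(p, rep) <= radius) for rep in reps]
    return sum(counts) / len(counts)


def min_separation(reps_by_cluster):
    """Smallest distance between representatives of two different clusters (needs >= 2 clusters)."""
    return min(
        math.dist(a, b)
        for ci, cj in combinations(reps_by_cluster, 2)
        for a in reps_by_cluster[ci]
        for b in reps_by_cluster[cj]
    )
```

Both functions only need per-cluster data, so each could plausibly run as a per-cluster reduce step, with the separation pass needing all clusters' representatives in one place.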

