mahout-user mailing list archives

From <lixinchao.em...@gmail.com>
Subject Mahout KMeans generate doubled cluster number than my initial K setting
Date Fri, 12 Oct 2012 09:38:06 GMT
Hi,

I am a beginner with Mahout. I am using Mahout 0.8 and followed the tutorial at
https://cwiki.apache.org/MAHOUT/clustering-of-synthetic-control-data.html

 

First, I ran:

`mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job -i testdata -o output -t1 20 -t2 50 -k 5 -x 20 -ow`

 

Then I used clusterdump to extract the cluster centers:

    mahout clusterdump --input output/clusters-20-final --output /media/synthetic_control.center

 

After this, the synthetic_control.center file contains:

 

    VL-585{n=50 c=[29.832, 29.589, 29.405, 28.516, 29.600, ..] r=[3.152, 3.518, 3.292, .]}
    VL-591{n=197 c=[29.984, 29.681,.] r=[3.602, 3.558, 3.364,.]}
    VL-595{n=203 c=[..] r=[..]}
    VL-597{n=61 c=[..] r=[..]}
    VL-599{n=43 c=[..] r=[..]}
    VL-585{n=1 c=[..] r=[..]}
    VL-591{n=27 c=[..] r=[..]}
    VL-595{n=1 c=[..] r=[..]}
    VL-597{n=1 c=[..] r=[..]}
    VL-599{n=16 c=[..] r=[..]}

 

 

It seems that k-means generated 10 clusters, even though I set k to 5.

I also tried other values of k, and the output always contains twice as many clusters as requested.
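
To make the doubling easier to see, here is a rough Python sketch that groups the dump entries by cluster ID and lists the point count (`n`) of each entry. It is not part of Mahout: the function name `summarize_dump` and the regular expression are my own assumptions, based only on the one-entry-per-line format shown above, and the sample list copies the IDs and `n` values from my dump with the vectors elided.

```python
import re
from collections import defaultdict

# Assumed entry format, taken from the dump above: "VL-585{n=50 c=[...] r=[...]}".
# Only the identifier and the point count at the start of a line are captured,
# so wrapped continuation lines are ignored.
ENTRY = re.compile(r"^\s*([A-Z]+-\d+)\{n=(\d+)")


def summarize_dump(lines):
    """Return {cluster_id: [n, n, ...]} for every entry found in the dump."""
    counts = defaultdict(list)
    for line in lines:
        match = ENTRY.match(line)
        if match:
            counts[match.group(1)].append(int(match.group(2)))
    return dict(counts)


# IDs and n values copied from the dump in this message (vectors elided).
sample = [
    "VL-585{n=50 c=[..] r=[..]}",
    "VL-591{n=197 c=[..] r=[..]}",
    "VL-595{n=203 c=[..] r=[..]}",
    "VL-597{n=61 c=[..] r=[..]}",
    "VL-599{n=43 c=[..] r=[..]}",
    "VL-585{n=1 c=[..] r=[..]}",
    "VL-591{n=27 c=[..] r=[..]}",
    "VL-595{n=1 c=[..] r=[..]}",
    "VL-597{n=1 c=[..] r=[..]}",
    "VL-599{n=16 c=[..] r=[..]}",
]

summary = summarize_dump(sample)
total_entries = sum(len(ns) for ns in summary.values())
print(len(summary), "distinct IDs,", total_entries, "entries")
for cluster_id, ns in sorted(summary.items()):
    print(cluster_id, ns, "points in total:", sum(ns))
```

On the entries above this reports 5 distinct IDs across 10 entries, so each cluster ID appears twice rather than there being 10 different clusters. To run it on the real file, the same function could be called with `open("/media/synthetic_control.center")` in place of `sample`.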

 

Can anyone help me with this? Thanks a lot!

 

 

