mahout-user mailing list archives

From Chris Harrington <>
Subject kmeans clustering - how to leave some docs unclustered
Date Wed, 20 Feb 2013 17:07:22 GMT
Hi all,

I'm running kmeans to cluster some text docs, and some docs that seem unrelated to any
cluster (i.e. noise) are getting clustered. I'd like to leave them unclustered.

I thought the clusterClassificationThreshold variable would do this for me.

From the Javadoc:

   *          Is a clustering strictness / outlier removal parameter. Its value should be
   *          between 0 and 1. Vectors having pdf below this value will not be clustered.
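To make the documented rule concrete, here is a minimal sketch of how I read it. It is not Mahout's implementation. The class `ThresholdAssignmentSketch`, its methods, and the pdf formula (a score of 1 / (1 + Euclidean distance) per cluster, normalized so the scores sum to 1) are all hypothetical assumptions for illustration. Each vector goes to its most likely cluster only if that cluster's pdf is at least the threshold; otherwise it is left unclustered.

```java
/**
 * Hypothetical sketch of the documented clusterClassificationThreshold rule.
 * NOT Mahout's code: the pdf formula below is an assumption for illustration.
 */
class ThresholdAssignmentSketch {

    /** Marker for a vector that is left unclustered (treated as an outlier). */
    static final int UNCLUSTERED = -1;

    /** Euclidean distance between two equal-length vectors. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Assumed pdf: score each cluster as 1 / (1 + distance to its centroid),
     * then normalize so the scores sum to 1 across all clusters.
     */
    static double[] pdfPerCluster(double[] x, double[][] centroids) {
        double[] pdf = new double[centroids.length];
        double total = 0.0;
        for (int k = 0; k < centroids.length; k++) {
            pdf[k] = 1.0 / (1.0 + distance(x, centroids[k]));
            total += pdf[k];
        }
        for (int k = 0; k < pdf.length; k++) {
            pdf[k] /= total;
        }
        return pdf;
    }

    /** Index of the most likely cluster, or UNCLUSTERED if its pdf is below the threshold. */
    static int classify(double[] x, double[][] centroids, double threshold) {
        double[] pdf = pdfPerCluster(x, centroids);
        int best = 0;
        for (int k = 1; k < pdf.length; k++) {
            if (pdf[k] > pdf[best]) {
                best = k;
            }
        }
        return pdf[best] >= threshold ? best : UNCLUSTERED;
    }

    public static void main(String[] args) {
        double[][] centroids = { {0.0, 0.0}, {10.0, 10.0} };
        double[] nearFirst = {0.5, 0.0}; // close to cluster 0
        double[] between = {5.0, 5.0};   // equidistant "noise" point
        for (double threshold : new double[] {0.00001, 0.6}) {
            System.out.println("threshold=" + threshold
                + " nearFirst=" + classify(nearFirst, centroids, threshold)
                + " between=" + classify(between, centroids, threshold));
        }
    }
}
```

With this assumed normalization, the most likely cluster's pdf is always at least 1/k for k clusters, so in the sketch a threshold at or below 1/k (such as 0.00001) never leaves anything unclustered. The equidistant point gets a pdf of exactly 0.5 for each of the two clusters, so it is dropped only when the threshold exceeds 0.5.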

But whenever I change this value, no clustered points get written, and the clusters don't seem
to change no matter what value I set (I tried 0.00001 and 0.99999).

Did I misunderstand what this variable does, or am I missing something here?