mahout-user mailing list archives

From Shashikant Kore
Subject Re: Validating clustering output
Date Wed, 17 Jun 2009 03:43:19 GMT
I hacked the code to attach a label to each vector. Then I modified
KMeans to output the document label, cluster ID, and distance from the
cluster centroid. Another utility takes this output and maps each label
back to the text file from which the vector was created. Then I manually
spot-check random documents in each cluster.
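
For anyone who wants to try the same kind of check outside Mahout, here is a
rough sketch of the idea in plain Python. It is not Mahout code: the function
names (assign, sample_for_review), the Euclidean distance, and the toy data
are all assumptions made for illustration. It emits (label, cluster ID,
distance) rows and then picks a few labels per cluster to inspect by hand.

```python
import math
import random
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length dense vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(labeled_vectors, centroids):
    """Return one (label, cluster_id, distance) row per labeled vector.

    Each vector is assigned to its nearest centroid; cluster_id is the
    centroid's index in the list (a hypothetical convention).
    """
    rows = []
    for label, vec in labeled_vectors:
        cluster_id, dist = min(
            ((cid, euclidean(vec, c)) for cid, c in enumerate(centroids)),
            key=lambda pair: pair[1],
        )
        rows.append((label, cluster_id, dist))
    return rows


def sample_for_review(rows, per_cluster=2, seed=0):
    """Pick up to per_cluster random labels from each cluster for manual review."""
    by_cluster = defaultdict(list)
    for label, cluster_id, _dist in rows:
        by_cluster[cluster_id].append(label)
    rng = random.Random(seed)
    return {
        cid: rng.sample(labels, min(per_cluster, len(labels)))
        for cid, labels in by_cluster.items()
    }


# Toy data: the labels stand in for the source text file names.
docs = [
    ("doc-a.txt", [0.0, 0.1]),
    ("doc-b.txt", [0.2, 0.0]),
    ("doc-c.txt", [5.0, 5.1]),
]
centroids = [[0.1, 0.05], [5.0, 5.0]]

rows = assign(docs, centroids)
picks = sample_for_review(rows, per_cluster=1)
```

Because the label is the file name here, each pick can be opened directly
and read next to the other picks from the same cluster.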

Ugly, but at least I know clustering is "working."

The "top" terms of the cluster may give some idea about the documents
in the cluster.
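
A minimal sketch of what "top" terms could mean, assuming the centroid is a
dense vector of term weights (for example TF-IDF) and that a vocabulary list
maps each index to its term. Both the top_terms helper and the sample values
are hypothetical, not part of Mahout.

```python
def top_terms(centroid, vocabulary, n=3):
    """Return the n terms with the highest weight in a cluster centroid."""
    ranked = sorted(zip(vocabulary, centroid), key=lambda tw: tw[1], reverse=True)
    return [term for term, _weight in ranked[:n]]


# Toy centroid: index i holds the weight of vocabulary[i].
vocabulary = ["apache", "cluster", "recipe", "vector", "oven"]
centroid = [0.10, 0.80, 0.00, 0.65, 0.02]

top = top_terms(centroid, vocabulary, n=2)
```

Reading the few highest-weight terms per cluster is a quick sanity check
before opening any of the documents.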


On Wed, Jun 17, 2009 at 3:05 AM, Grant Ingersoll wrote:
> What tools/approaches are people using to validate their clustering output?
>  Are there utilities that we should be implementing that would make this
> easier for users?
> -Grant
