lucene-solr-user mailing list archives

From Stanislaw Osinski <>
Subject Re: News clustering
Date Mon, 03 Dec 2012 17:37:08 GMT
> I mean measuring the similarity between the document in each cluster.
> Also, difference between document on one cluster with another cluster.
> I saw the sample code
> However, I do not know how to make use of it for assessing my Solr
> Clustering performance.

You'd need to write your own code for this. Here are the most common
clustering quality measures you mentioned:

These are meant for the general case (numeric attributes). To apply them to
texts, you'd need to use the vector representation of the documents.
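
To make the vector-representation idea concrete, here is a rough sketch in
Python of two such measures: average similarity inside a cluster and average
similarity between two clusters. It is only an illustration, not Solr or
Carrot2 code. It assumes whitespace tokenization, raw term-frequency vectors
and cosine similarity, and all function names and sample documents are made
up for the example.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    # Assumption: lowercase + whitespace split stands in for real analysis.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def intra_cluster_similarity(docs):
    # Mean pairwise similarity of documents within one cluster.
    vectors = [tf_vector(d) for d in docs]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 1.0  # a single document is trivially cohesive
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def inter_cluster_similarity(docs_a, docs_b):
    # Mean similarity over all cross-cluster document pairs.
    vecs_a = [tf_vector(d) for d in docs_a]
    vecs_b = [tf_vector(d) for d in docs_b]
    if not vecs_a or not vecs_b:
        return 0.0
    total = sum(cosine(a, b) for a in vecs_a for b in vecs_b)
    return total / (len(vecs_a) * len(vecs_b))


# Hypothetical clusters, e.g. document texts grouped by cluster label.
sports = ["team wins match", "team loses match"]
finance = ["stocks fall sharply", "stocks rise sharply"]

print("intra sports:", intra_cluster_similarity(sports))
print("intra finance:", intra_cluster_similarity(finance))
print("inter sports/finance:", inter_cluster_similarity(sports, finance))
```

A good clustering should score high on the first measure and low on the
second. For real data you would probably want TF-IDF weights and the same
text analysis your index uses, rather than the plain split above.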

On a more general note, synthetic measures test only the document-cluster
assignments, but none take the quality of labels into account (this is
really hard to measure objectively).

