lucene-solr-user mailing list archives

From Iwan Hanjoyo <ihanj...@gmail.com>
Subject Re: News clustering
Date Tue, 04 Dec 2012 01:18:10 GMT
Hi Stanislaw,

I see. Thank you for the reference.

Kind regards,

Hanjoyo

On Tue, Dec 4, 2012 at 12:37 AM, Stanislaw Osinski
<stanislaw@osinski.name> wrote:

> > I mean measuring the similarity between the documents in each cluster,
> > and also the difference between documents in one cluster and another.
> >
> > I saw the sample code ClusteringQualityBenchmark.java.
> > However, I do not know how to use it to assess my Solr
> > clustering performance.
> >
>
> You'd need to write your own code for this. Here are the most common
> clustering quality measures you mentioned:
>
>
> http://en.wikipedia.org/wiki/Cluster_analysis#Evaluation_of_clustering_results
>
> These are meant for the general case (numeric attributes). To apply them to
> texts, you'd need to use a vector representation of the documents.
>
> On a more general note, synthetic measures test only the document-cluster
> assignments; none of them takes the quality of labels into account (this is
> really hard to measure objectively).
>
> Staszek
>
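
For readers following the thread, here is a minimal sketch of the kind of code Staszek describes: documents are turned into vectors, and similarity is then measured within a cluster and between clusters. Everything in it is an assumption made for illustration, not part of Solr or Carrot2: the helper names (tf_vector, cosine, intra_cluster_similarity, inter_cluster_similarity), the plain term-frequency weighting, and the toy documents. In practice the cluster assignments would come from your Solr clustering response, and a TF-IDF weighting would likely be a better choice.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Bag-of-words term-frequency vector (a deliberately simple representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean(values):
    """Arithmetic mean; returns 0.0 for an empty sequence."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def intra_cluster_similarity(cluster):
    """Mean pairwise cosine similarity of the documents inside one cluster."""
    return mean(cosine(a, b) for a, b in combinations(cluster, 2))


def inter_cluster_similarity(cluster_a, cluster_b):
    """Mean cosine similarity over all document pairs drawn from two clusters."""
    return mean(cosine(a, b) for a in cluster_a for b in cluster_b)


# Hypothetical cluster assignments; real ones would be read from the Solr response.
clusters = {
    "search": [tf_vector("solr search index"), tf_vector("solr index query")],
    "sport": [tf_vector("football match score"), tf_vector("football score goal")],
}

intra_search = intra_cluster_similarity(clusters["search"])
inter = inter_cluster_similarity(clusters["search"], clusters["sport"])
print("intra (search):", intra_search)
print("inter (search vs sport):", inter)
```

A good clustering should show high intra-cluster similarity and low inter-cluster similarity. As noted in the message above, neither number says anything about the quality of the cluster labels.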
