lucene-solr-user mailing list archives

From Jorge Luis Betancourt Gonzalez <jlbetanco...@uci.cu>
Subject Re: News clustering
Date Mon, 03 Dec 2012 20:03:35 GMT
I'm trying to use it to search through news websites, but I'm interested in classification
at index time. Is there any available solution for this?

Greetings!

On Dec 3, 2012, at 12:37 PM, Stanislaw Osinski <stanislaw@osinski.name> wrote:

>> I mean measuring the similarity between the documents in each cluster,
>> and also the difference between the documents in one cluster and those in another.
>> 
>> I saw the sample code ClusteringQualityBenchmark.java.
>> However, I do not know how to use it to assess my Solr
>> clustering performance.
>> 
> 
> You'd need to write your own code for this. Here are the most common
> clustering quality measures, including the ones you mentioned:
> 
> http://en.wikipedia.org/wiki/Cluster_analysis#Evaluation_of_clustering_results
> 
> These are meant for the general case (numeric attributes). To apply them to
> texts, you'd need to use the vector representation of the documents.
> 
> On a more general note, synthetic measures test only the document-cluster
> assignments; none of them takes the quality of the labels into account (this is
> really hard to measure objectively).
> 
> Staszek
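
A minimal sketch of the kind of code the reply describes, under several assumptions: documents are already tokenized, each document is represented as a TF-IDF vector, and the cluster assignments have been copied out of the Solr clustering response by hand into a plain dictionary. The function names (tfidf_vectors, mean_intra_similarity, mean_inter_similarity) and the sample data are hypothetical and are not part of Solr or Carrot2. It reports the average cosine similarity within clusters and between clusters; a good clustering should score higher on the first than on the second.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF vectors (term -> weight dicts)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF, so a term found in every document keeps a non-zero weight.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def mean_intra_similarity(vectors, clusters):
    """Average similarity over all document pairs that share a cluster."""
    sims = [cosine(vectors[i], vectors[j])
            for members in clusters.values()
            for i, j in combinations(members, 2)]
    return sum(sims) / len(sims) if sims else 0.0


def mean_inter_similarity(vectors, clusters):
    """Average similarity over document pairs taken from two different clusters."""
    sims = [cosine(vectors[i], vectors[j])
            for a, b in combinations(list(clusters.values()), 2)
            for i in a for j in b
            if i != j]  # clusters may overlap; skip a document paired with itself
    return sum(sims) / len(sims) if sims else 0.0


if __name__ == "__main__":
    # Hypothetical sample data: four tiny "news" documents in two clusters.
    docs = [
        "solr search index query".split(),
        "solr index search server".split(),
        "football match goal team".split(),
        "football team goal league".split(),
    ]
    clusters = {"Search": [0, 1], "Sports": [2, 3]}  # label -> document indices
    vectors = tfidf_vectors(docs)
    print("mean intra-cluster similarity:", mean_intra_similarity(vectors, clusters))
    print("mean inter-cluster similarity:", mean_inter_similarity(vectors, clusters))
```

These two averages cover only the document-to-cluster assignments; as noted above, they say nothing about the quality of the cluster labels.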
> 
> 
> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
> 
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci


