lucene-solr-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: TermVector (TF-IDF Scores) From Subset of Documents
Date Thu, 29 Oct 2009 13:04:38 GMT
Have a look at the TermVectorComponent
(http://wiki.apache.org/solr/TermVectorComponent). That might help.
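
As far as I know, the document frequencies the component reports are taken
from the whole index, so subset-local scores would likely have to be
recomputed on the client from the per-document term frequencies. Below is a
minimal sketch of that recomputation, not Solr's own scoring. It assumes the
term frequencies for the 20-50 documents have already been fetched (for
example via the component's tv.tf option) and parsed into plain
dictionaries. The function name subset_tf_idf, the input shape, and the
sample data are hypothetical, and the weighting is the textbook
tf * log(N / df), which is not necessarily what Lucene or Solr computes.

```python
import math
from collections import Counter


def subset_tf_idf(term_freqs):
    """Compute TF-IDF using only the documents passed in.

    term_freqs: {doc_id: {term: raw term frequency}} for the subset
    (hypothetical shape; adapt it to however the response is parsed).
    """
    n_docs = len(term_freqs)

    # Document frequency counted over the subset, not the full index.
    df = Counter()
    for tf in term_freqs.values():
        df.update(tf.keys())

    # Textbook weighting: tf * log(N / df). A term that occurs in every
    # document of the subset gets a score of 0.
    return {
        doc_id: {term: freq * math.log(n_docs / df[term])
                 for term, freq in tf.items()}
        for doc_id, tf in term_freqs.items()
    }


# Made-up sample data standing in for two documents of the subset.
sample = {
    "doc-a": {"solr": 2, "lucene": 1},
    "doc-b": {"solr": 1, "index": 3},
}
scores = subset_tf_idf(sample)
```

Since the subset is only a few dozen documents, this avoids building a
separate index just to get subset-local statistics.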

On Oct 28, 2009, at 10:30 PM, peelman wrote:

>
> I have an index of about 3 million documents, and a specific list of
> document ids that belong to that index (somewhere around 20-50 documents
> on average).  With my filtered list of documents I want to be able to get
> TF-IDF scores calculated based on only that small subset, instead of the
> scores from the entire 3 million document index.
>
> Is there an easy way to do this using a filtered/subquery, or via any
> other means?
>
> Presently I am testing by creating a new index out of the subset of
> documents to get the TF-IDF scores, but obviously that is not going to
> work or scale in a finished implementation.
>
> Thanks in advance.
> -- 
> View this message in context: http://www.nabble.com/TermVector-%28TF-IDF-Scores%29-From-Subset-of-Documents-tp26105328p26105328.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search
