lucene-solr-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: TermVector (TF-IDF Scores) From Subset of Documents
Date Thu, 29 Oct 2009 20:50:39 GMT

On Oct 29, 2009, at 11:10 AM, peelman wrote:

>
> Indeed I have used this already, but unless I am missing something
> this will always return scores based on the entire index.  I see no
> way from the documentation to have it recalculate TF-IDF scores using
> only a subset of documents.  Am I missing something?

Nope, it was me misunderstanding the question.

>
> Are you saying I can do a filter query using fq= and then use this
> request handler to get different TF-IDF scores?

No.  Sorry for the confusion.  I think you could extend the
TermVectorComponent (TVC) to do what you want, but it is not there out
of the box.
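
The core of such an extension would be recomputing document frequencies over the subset alone instead of the whole index. The following is only a hypothetical, Lucene-independent sketch of that arithmetic: it assumes the raw term frequencies for each of the 20-50 documents have already been fetched (for example from the TermVectorComponent), and the class name `SubsetTfIdf` is made up for illustration. It uses the textbook weighting tf * ln(N / df), where N is the subset size; that is not necessarily the formula Lucene or Solr use for scoring. Requires Java 9+ for `List.of`/`Map.of`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Solr/Lucene): recomputes TF-IDF weights
 * using only a subset of documents. Each input map is one document's term
 * vector, term -> raw term frequency.
 */
public final class SubsetTfIdf {

    private SubsetTfIdf() {}

    /** For each term, counts how many documents in the subset contain it. */
    static Map<String, Integer> documentFrequencies(List<Map<String, Integer>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Integer> doc : docs) {
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                if (e.getValue() > 0) {
                    df.merge(e.getKey(), 1, Integer::sum);
                }
            }
        }
        return df;
    }

    /**
     * Returns one map per input document: term -> tf * ln(N / df), where N is
     * the number of documents in the subset and df is the subset-local
     * document frequency (the rest of the index is ignored entirely).
     */
    public static List<Map<String, Double>> score(List<Map<String, Integer>> docs) {
        Map<String, Integer> df = documentFrequencies(docs);
        int n = docs.size();
        List<Map<String, Double>> result = new ArrayList<>(n);
        for (Map<String, Integer> doc : docs) {
            Map<String, Double> weights = new HashMap<>();
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                int tf = e.getValue();
                if (tf <= 0) {
                    continue; // term absent from this document
                }
                double idf = Math.log((double) n / df.get(e.getKey()));
                weights.put(e.getKey(), tf * idf);
            }
            result.add(weights);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two toy "documents"; "solr" occurs in both, so its subset IDF is 0.
        List<Map<String, Integer>> subset = List.of(
                Map.of("solr", 2, "lucene", 1),
                Map.of("solr", 1, "mahout", 3));
        System.out.println(score(subset));
    }
}
```

Because N is the subset size, a term that appears in every document of the subset gets a weight of zero even if it is rare across the full index, which is exactly the behavior the question asks for.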

>
>
> Grant Ingersoll-6 wrote:
>>
>> Have a look at the TermVectorComponent:
>> http://wiki.apache.org/solr/TermVectorComponent
>> That might help.
>>
>> On Oct 28, 2009, at 10:30 PM, peelman wrote:
>>
>>>
>>> I have an index of about 3 million documents, and a specific list of
>>> document IDs that belong to that 3 million (somewhere around 20-50
>>> documents on average).  With my filtered list of documents I want to
>>> be able to get TF-IDF scores calculated from only that small subset,
>>> instead of the scores from the entire 3 million document index.
>>>
>>> Is there an easy way to do this using a filter query or subquery, or
>>> by any other means?
>>>
>>> Presently I am testing by creating a new index out of the subset of
>>> documents to get the TF-IDF scores, but obviously that is not going
>>> to work or scale in a finished implementation.
>>>
>>> Thanks in advance.
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/TermVector-%28TF-IDF-Scores%29-From-Subset-of-Documents-tp26105328p26105328.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>> using Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>>
>


