lucene-java-user mailing list archives

From Harini Raghavan <>
Subject Re: Query Scoring
Date Mon, 02 Jan 2006 07:26:57 GMT
Yes, I was referring to how IDF is used in the Highlighter code to decide
how to prioritize fragments of the documents.
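
For reference, here is a minimal sketch of the classic IDF formula from Lucene's default similarity, 1 + ln(numDocs / (docFreq + 1)). It is only meant to show why a rare term gets more weight than a common one when fragments are prioritized. The class name and the index size and document frequencies in main are made up for illustration, and this is not the Highlighter's actual code.

```java
// Illustrative sketch only: class name and sample numbers are hypothetical.
final class IdfSketch {

    // Classic Lucene-style inverse document frequency:
    // 1 + ln(numDocs / (docFreq + 1)).
    // The fewer documents contain a term, the larger the result.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 10_000; // assumed index size
        // A term found in few documents versus one found in half of them.
        System.out.println("rare term:   " + idf(10, numDocs));
        System.out.println("common term: " + idf(5_000, numDocs));
    }
}
```

A term such as "downsizes" that occurs in few articles would therefore pull a fragment up more than a term that occurs in most of them.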

My requirement is to show the relevant fragments of the news article for
each company along with the search results. But the Highlighter API
sometimes picks fragments that are not very relevant to the news
article/company. I would like to know if there is any way that I can
modify the scoring/ranking of these fragments so that news items with
the company name and keywords in the headline get a very strong relevancy
ranking, closely followed by a company name mention in the first
paragraph, and then multiple mentions within the entire story. Something
like headline = 5 points, first paragraph = 4, etc.
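
To make the idea concrete, here is a rough sketch of the kind of position-based weighting I have in mind, written in plain Java without any Lucene classes. The headline and first-paragraph weights are the ones above. The body weight of 1, the Fragment class, and multiplying the weight by the number of matches are all assumptions for illustration. Presumably something like this could be hooked into the Highlighter through a custom scorer, but that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: all names here are hypothetical, not Lucene API.
final class FragmentPositionScorer {

    enum Zone { HEADLINE, FIRST_PARAGRAPH, BODY }

    // A candidate fragment: where it came from and how many
    // company-name/keyword matches it contains.
    static final class Fragment {
        final String text;
        final Zone zone;
        final int matches;

        Fragment(String text, Zone zone, int matches) {
            this.text = text;
            this.zone = zone;
            this.matches = matches;
        }
    }

    // Points per match, by position in the article.
    static int zoneWeight(Zone zone) {
        switch (zone) {
            case HEADLINE:        return 5;
            case FIRST_PARAGRAPH: return 4;
            default:              return 1; // assumed weight for the rest of the story
        }
    }

    static int score(Fragment f) {
        return zoneWeight(f.zone) * f.matches;
    }

    // Returns a new list ordered by descending score; ties keep input order.
    static List<Fragment> rank(List<Fragment> fragments) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort((a, b) -> Integer.compare(score(b), score(a)));
        return sorted;
    }
}
```

With these weights a single mention in the headline (5 points) outranks three mentions in the body (3 points), which is the ordering I am after.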


markharw00d wrote:

> Sorry to contradict, Erik, but the Highlighter's QueryScorer will make 
> use of IDF, given a reader, in order to better prioritise which are 
> the "best" bits of a document.
> However, in the particular example given, the criteria include 
> several non-text fields which are not useful for IDF and general 
> scoring purposes - these are perhaps better expressed using a filter 
> of some form. Otherwise, why should the scarcity of a particular date 
> in the given range boost one matching document above others? These 
> numeric-type fields are simply mandatory boolean "hygiene factors" 
> and  should ideally play no part in highlight selection or results 
> ordering in general based on their IDF or TF.
> Cheers,
> Mark
> Erik Hatcher wrote:
>> Harini,
>> I'm not sure I understand what you're asking.  IDF doesn't factor  
>> into highlighting.
>> IDF calculations are useful in scoring documents during a search,  
>> such that the most relevant documents are returned, but again this 
>> is  unrelated to highlighting.
>> Could you elaborate on what you're after?
>>     Erik
>> On Dec 30, 2005, at 12:02 PM, Harini Raghavan wrote:
>>> Hi,
>>> I have a requirement to highlight search keywords in the results and
>>> display the matching fragment of the text with the results. I am using
>>> the Hits highlighting mentioned in Lucene in Action.
>>> Here is the search query (BooleanQuery) I am passing to the
>>> IndexSearcher and QueryScorer:
>>> +DocumentType:news
>>> +(CompanyId:10 CompanyId:20 CompanyId:30 CompanyId:40)
>>> +FilingDate:[20041201 TO 20051201]
>>> +(Content:"cost saving" Content:"cost savings" Content:outsource
>>> Content:outsources Content:downsize Content:downsizes
>>> Content:restructuring Content:restructure)
>>> I do not quite understand how the query scoring actually works and
>>> how Inverse Document Frequency (IDF) calculations are useful. Can
>>> someone shed some light on this using the given query as an example?
>>> Thanks,
>>> Harini
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail: