lucene-java-user mailing list archives

From Harini Raghavan <harini.ragha...@insideview.com>
Subject Re: Query Scoring
Date Tue, 03 Jan 2006 04:47:49 GMT
Thank you Chris. That seems like a good suggestion. I will try to pass a 
different Query object to the Highlighter API than the one used for 
searching.
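
A minimal sketch of how the separate highlighter query string could be put 
together, assuming one sloppy phrase clause per company name/keyword pair 
as in the example quoted below. The class and method names 
(HighlightQueryBuilder, build) are hypothetical and not part of Lucene, 
and pairing every company with every keyword is an assumption on my side. 
The resulting string would still have to be parsed into a Query and handed 
to the highlighter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; nothing here is Lucene API.
class HighlightQueryBuilder {

    // Builds one sloppy phrase clause per (company name, keyword) pair,
    // e.g. "companyname10 outsource"~20, and joins the clauses with spaces.
    // Assumption: names and keywords contain no query-syntax characters,
    // so no escaping is attempted.
    static String build(List<String> companyNames, List<String> keywords, int slop) {
        List<String> clauses = new ArrayList<>();
        for (String company : companyNames) {
            for (String keyword : keywords) {
                clauses.add("\"" + company + " " + keyword + "\"~" + slop);
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        String query = build(
                Arrays.asList("companyname10", "companyname20"),
                Arrays.asList("cost savings", "outsource"),
                20);
        System.out.println(query);
    }
}
```

The slop of 20 is taken from the quoted example. Whether the highlighter 
honours phrase proximity when scoring fragments is something I still need 
to verify.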

I plan to break down the HTML document and store the title, subtitle and 
content in different fields of the index. If I then create a new query 
that compares the company name and keywords against the title and content 
fields, I am assuming that the Highlighter API will rank a fragment where 
both terms of the query match higher than fragments where just one term 
(either title or content) matches. I am also assuming that the API will 
take care of this ranking even if I do not increase the boost factor of 
any of the terms.
This is my understanding of the scoring/ranking algorithm. Any comments, 
anyone?
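
To make the assumption concrete, here is a toy model of the ranking I have 
in mind. It is not the Highlighter's actual code: it simply assumes that a 
fragment's score is the number of distinct query terms it contains, with 
no boosts. The class name FragmentScoreSketch, the sample company name 
"acme" and the sample fragments are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model only: this is NOT how Lucene's Highlighter is implemented.
class FragmentScoreSketch {

    // Score = number of distinct query terms that occur in the fragment.
    // Repeated occurrences of the same term do not add to the score.
    static int score(String fragment, Set<String> queryTerms) {
        Set<String> seen = new HashSet<>();
        // Lower-case and split on non-word characters as a crude tokenizer.
        for (String token : fragment.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (queryTerms.contains(token)) {
                seen.add(token);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("acme", "outsource"));
        // Both terms present in the fragment.
        System.out.println(score("Acme plans to outsource support", terms));
        // Only one distinct term present, although it appears twice.
        System.out.println(score("Acme reported earnings; Acme shares rose", terms));
    }
}
```

Under this model the first fragment scores 2 and the second scores 1, 
which is the ordering I am hoping the real API produces without boosts.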

Thanks,
Harini

Chris Hostetter wrote:

>: My requirement is to show the relevant fragments of the news article for
>: each company along with the search results. But the highlighter api
>: sometimes picks up the fragments which are not so relevant to the news
>: article/company. I would like to know if there is any way that I can
>: modify the scoring/ranking of these fragments in such a way that the
>: news items in which a company name & keywords appear in the headline get
>: assigned a very strong relevancy ranking, closely followed by a company
>: name mention in the first paragraph and multiple mentions within the
>: entire story. Something like headline = 5 points, first paragraph =
>: four, etc.
>
>Well, the sample query you mentioned isn't checking any company names, or
>doing anything with a "keywords" field.  I'm not too familiar with the way
>the highlighter package works, but I imagine that with the types of
>queries you said you are using, if you are highlighting the "Content"
>field, the CompanyId and the FilingDate clauses of your query will be
>fairly irrelevant (because they are numbers, not because they are different
>field names).
>
>An idea I've suggested before (but I don't remember if anyone ever said
>whether it is a viable use of the Highlighter or not) is to give the
>highlighter a completely different Query object than the one you used to
>get your search results.
>
>i.e., if your search query (what you want used to compute score) is...
>
>  +(CompanyId:10 CompanyId:20) Content:"cost saving" Content:outsource
>
>...but once you've gotten those results, what you really care about is
>highlighting the name of the company, and you think the best fragments
>are those where the company names appear near the other words, then give
>the highlighter a query that looks like...
>
>  "companyname10 cost savings"~20 "companyname20 outsource"~20 ...etc
>
>
>
>: >>> Here is the search query(BooleanQuery) I am passing to the
>: >>> IndexSearcher
>: >>> and QueryScorer:
>: >>> +DocumentType:news
>: >>> +(CompanyId:10 CompanyId:20 CompanyId:30 CompanyId:40)
>: >>> +FilingDate:[20041201 TO 20051201]
>: >>> +(Content:"cost saving" Content:"cost savings" Content:outsource
>: >>> Content:outsources Content:downsize Content:downsizes
>: >>> Content:restructuring Content:restructure)
>
>
>
>-Hoss
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>  
>
