lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <>
Subject Re: Query Scoring
Date Mon, 02 Jan 2006 07:56:23 GMT

: My requirement is to show the relevant fragments of the news article for
: each company along with the search results. But the highlighter api
: sometimes picks up the fragments which are not so relevant to the news
: article/company. I would like to know if there is anyway that I can
: modify the scoring/ranking of these fragments in such a way that the
: news items in which a company name & keywords in the headline gets
: assigned a very strong relevancy ranking,  closely followed by a company
: name mention in the first paragraph and a  multiple-mention within the
: entire story. Something like headline =   5 points,  first paragraph =
: four, etc.

Well, the sample query you mentioned isn't checking any company names, or
doing anything with a "keywords" field.  I'm not to familiar with the way
the highlighter package works, but i imagine that with the types of
queries you said you are using, if you are highlighting the "Content"
field, the CompanyId and the FilingDate clauses of your query will be
fairly irelevent (becuase they are numbers, not because they are different
field names)

An idea i've suggested before (but i don't remember if anyone ever said
wether it is a viable use of the Highlighter or not) is to give the
highlighter a completely different Query object then the one you used to
get your search results.

ie, if you search query (what you want used to compute score) is...

  +(CompanyId:10 CompanyId:20) Content:"cost saving" Content:outsource

...but once you've gotten those results, what you really care about is
highlighting the name of the company, and you think the best fragments
when those company names appear near the other words, then give the
highlighter a query that looks like...

  "companyname10 cost savings"~20 "companyname20 outsource"~20 ...etc

: >>> Here is the search query(BooleanQuery) I am passing to the
: >>> IndexSearcher
: >>> and QueryScorer:
: >>> +DocumentType:news
: >>> +(CompanyId:10 CompanyId:20 CompanyId:30 CompanyId:40)
: >>> +FilingDate:[20041201 TO 20051201]
: >>> +(Content:"cost saving" Content:"cost savings" Content:outsource
: >>> Content:outsources Content:downsize Content:downsizes
: >>> Content:restructuring Content:restructure)


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message