lucene-java-user mailing list archives

From Gili Nachum <gilinac...@gmail.com>
Subject Best practices in boosting by proximity?
Date Sat, 04 May 2013 17:46:58 GMT
Hi. *I would like hits that contain the search terms in proximity to
each other to rank higher than hits in which the terms are scattered
across the doc.
Is there a best practice for achieving that?*
I also want every hit to contain all of the search terms (implicit
AND).

*Example:* when users search for: "lannisters always pay their debts", the
4 matching results should be ranked as follows (for simplicity, assume
equal field norms and TF/IDF across all hits):
1. "It is known that *Lannisters always pay their debts*"
2. "... Lannisters ... they sometimes *pay their debts* ... always with you"
3. "*Lannisters always* win ... debts ... pay tax ... their nature"
4. "Lannisters ... always ... pay ... their ... debts"

The first result has all 5 terms in proximity to each other.
The second has 3 terms in proximity.
The third has 2 terms in proximity.
The fourth has none of the terms in proximity to each other.

My current AND query, which ignores proximity, is: +lannisters +always +pay
+their +debts
So with M terms, I was thinking of adding M-1 SHOULD phrase queries, one
per pair of adjacent terms, to the original query:
"lannisters always" "always pay" "pay their" "their debts".
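
To make the idea concrete, here is a minimal sketch that only assembles the
query string in classic query parser syntax: one required (+) clause per
term, plus one optional quoted phrase per adjacent pair. The class and
method names (ProximityQueryBuilder, build) are made up for illustration and
are not part of Lucene. It assumes the input is plain whitespace-separated
words with no query-syntax special characters that would need escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class: builds a query *string* only.
final class ProximityQueryBuilder {

    // For terms t1..tM returns: +t1 ... +tM "t1 t2" "t2 t3" ... "tM-1 tM"
    // Assumption: terms contain no characters that need query-syntax escaping.
    static String build(String userInput) {
        String trimmed = userInput.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] terms = trimmed.split("\\s+");
        List<String> clauses = new ArrayList<>();

        // M required clauses: every hit must contain every term (implicit AND).
        for (String term : terms) {
            clauses.add("+" + term);
        }

        // M-1 optional phrase clauses: each adjacent pair that matches as a
        // phrase adds to the score but is not required for a hit.
        for (int i = 0; i + 1 < terms.length; i++) {
            clauses.add("\"" + terms[i] + " " + terms[i + 1] + "\"");
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(build("lannisters always pay their debts"));
    }
}
```

Built programmatically, I assume the same shape would be a BooleanQuery with
one MUST TermQuery per term and one SHOULD PhraseQuery per adjacent pair.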

What are the pros and cons of this approach? Are there alternatives to consider?
Is there a Lucene class that helps achieve this?

Thx!
