lucene-dev mailing list archives

From "Nadav Har'El"
Subject Re: Flexible index format / Payloads Cont'd
Date Fri, 30 Jun 2006 13:07:30 GMT
On Thu, Jun 29, 2006, Marvin Humphrey wrote about "Re: Flexible index format / Payloads Cont'd":
>   * Improve IR precision, by writing a Boolean Scorer that
>     takes position into account, a la Brin/Page '98.

Yes, I'd love to see that too (and it doesn't even require any new payload
support: the positions that Lucene already stores are enough).

I tried a small test using the TREC 8 corpus and query-relevance judgements,
and saw a noticeable improvement in precision when I added a simplistic
version of this feature: I "or"ed the original query words with a
SpanNearQuery for each pair of words in the query, so the query
"hot dog bun" is converted to something similar to:

	hot OR dog OR bun OR "hot dog"~7^0.25 "dog bun"~7^0.25 "hot bun"~7^0.25
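
To make the expansion concrete, here is a rough sketch in plain Java that
builds such a query string from a list of words. The class and method names
(ProximityExpansion, expand) are made up for illustration and are not part of
Lucene; the slop of 7 and boost of 0.25 are simply the values from the example
above. The sketch writes an explicit OR between all clauses and emits the pairs
in a slightly different order. Also note that when Lucene's classic query
parser reads the "..."~7 syntax it produces a sloppy PhraseQuery, not a
SpanNearQuery, so this only approximates the test described here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProximityExpansion {

    // Hypothetical helper (not a Lucene API): ORs the original words with a
    // boosted proximity clause for every unordered pair of words.
    static String expand(List<String> terms, int slop, double boost) {
        // Start with the original single-word clauses.
        List<String> clauses = new ArrayList<>(terms);
        // Add one "a b"~slop^boost clause for each pair (i < j).
        for (int i = 0; i < terms.size(); i++) {
            for (int j = i + 1; j < terms.size(); j++) {
                clauses.add("\"" + terms.get(i) + " " + terms.get(j) + "\"~"
                        + slop + "^" + boost);
            }
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(expand(Arrays.asList("hot", "dog", "bun"), 7, 0.25));
    }
}
```

With n query words, each word appears in n clauses (once alone and in n-1
pairs), so a three-word query reads each posting list three times.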

But this "solution" is obviously not the best we can do: it is inefficient
(it goes through each posting list three times) and not tuned. A better
solution would be, as you said, to create a modified version of BooleanQuery's

Nadav Har'El                        |       Friday, Jun 30 2006, 4 Tammuz 5766
IBM Haifa Research Lab              |-----------------------------------------
                                    |Give Yogi a rifle. Support your right to
                                    |arm bears!
