lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: Paging bug in ReRankingQParserPlugin?
Date Tue, 05 Aug 2014 16:49:56 GMT
You can also have a sliding re-ranking horizon. That is how we did it in Ultraseek.

http://observer.wunderwood.org/2007/04/04/progressive-reranking/

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/


On Aug 5, 2014, at 9:38 AM, Joel Bernstein <joelsolr@gmail.com> wrote:

> I updated the docs for now. But I agree this paging issue needs to be
> handled transparently. Feel free to create a jira issue for this or I can
> create one when I have time to start looking into it.
> 
> Joel Bernstein
> Search Engineer at Heliosearch
> 
> 
> On Tue, Aug 5, 2014 at 12:04 PM, Adair Kovac <adairkovac@gmail.com> wrote:
> 
>> Thanks, great explanation! Yeah, if it keeps the current behavior, added
>> documentation would be great.
>> 
>> Are there any other features that expect parameters to change as one
>> pages? If not I'm concerned that it might be hard to support for clients
>> that assume only the index params will change. It also makes it harder to
>> work if we want to add re-ranking on a strict small set of results on the
>> first page, because then we'd have to stitch together two result sets. We
>> don't currently want to do that, though.
>> 
>> For what it's worth, my colleague (who linked me to the feature) and I
>> both assumed that it would get all the results and return the ones past
>> the re-ranking point as-is. Is that possible?
>> 
>> Thanks,
>> 
>> Adair
>> 
>> 
>> 
>> 
>> On Tue, Aug 5, 2014 at 5:53 AM, Joel Bernstein <joelsolr@gmail.com> wrote:
>> 
>>> The comment in the code reads slightly differently:
>>> 
>>> // This enusres that reRankDocs >= docs needed to satisfy the result set.
>>> reRankDocs = Math.max(start+rows, reRankDocs);
>>> 
>>> I think you're right though that this is confusing. The way the
>>> ReRankingQParserPlugin works is that it grabs the top X documents
>>> (reRankDocs) and reRanks them. If the top X (reRankDocs) isn't large enough
>>> to satisfy the page then the result won't have enough documents.
>>> 
>>> The intended use of this was actually to stop using query re-ranking when
>>> you paged past the reRanked results. So if you re-rank the top 200
>>> documents, you would drop the re-ranking parameter when you page to
>>> documents 201-220.
>>> 
>>> So the line:
>>> reRankDocs = Math.max(start+rows, reRankDocs);
>>> 
>>> Saves you from an unexpected shortfall in documents if you do page beyond
>>> the reRankDocs. At the very least the expected use should be documented and
>>> if we can figure out better behavior here that would be great.
>>> 
>>> Joel Bernstein
>>> Search Engineer at Heliosearch
>>> 
>>> 
>>> On Mon, Aug 4, 2014 at 7:56 PM, Adair Kovac <adairkovac@gmail.com> wrote:
>>> 
>>>> Looking at this line in the code:
>>>> 
>>>> // This enusres that reRankDocs <= docs needed to satisfy the result set.
>>>> reRankDocs = Math.max(start+rows, reRankDocs);
>>>> 
>>>> This looks like it would cause skips and duplicates while paging through
>>>> the results, since if you exceed the reRankDocs parameter and keep finding
>>>> things that match the re-ranking query, they'll get boosted earlier
>>>> (skipped), thus pushing down items you already saw (causing duplicates).
>>>> 
>>>> It's obviously intentional behavior, but there's no documentation I can
>>>> see of why, if you request fewer documents to be re-ranked than you're
>>>> asking to view, it goes ahead and ignores the number you asked for. What if
>>>> I only want the top 10 out of 50 rows to be reranked? Wouldn't it be better
>>>> to make the client choose whether to increase the reRankDocs or leave it
>>>> the same?
>>>> 
>>>> If no one replies and I have time, I might check out 4.9 and see if I
>>>> can confirm or disprove the bug, but figured I'd bring it up now in case I
>>>> don't end up having time. It would be good to document the reason for this
>>>> behavior if it turns out it's necessary.
>>>> 
>>>> Thanks. I'm excited about this feature btw.
>>>> 
>>>> --Adair
>>>> 
>>> 
>>> 
>> 
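
The paging effect Adair describes can be reproduced with a small simulation. What follows is only a sketch under stated assumptions, not the plugin's code: the class RerankPagingSketch and its methods are hypothetical, documents are plain integers in original rank order, and "re-ranking" is modeled as moving every matching document inside the horizon to the front (roughly what a very large re-rank weight would do). The only line taken from the thread is the Math.max(start+rows, reRankDocs) expression. The fixed-horizon variant models the behavior Adair and her colleague expected (results past the re-ranking point returned as-is), which is not what the plugin did at the time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class RerankPagingSketch {

    /**
     * Returns one page of results after re-ranking the top `horizon`
     * documents of `baseOrder`. Assumption: a matching document inside the
     * horizon always outranks a non-matching one; ties keep the base order.
     */
    public static List<Integer> page(List<Integer> baseOrder, Set<Integer> boosted,
                                     int start, int rows, int horizon) {
        int k = Math.min(horizon, baseOrder.size());
        List<Integer> ranked = new ArrayList<>();
        // Matching documents inside the horizon move to the front.
        for (int doc : baseOrder.subList(0, k)) {
            if (boosted.contains(doc)) ranked.add(doc);
        }
        // Non-matching documents inside the horizon follow, in base order.
        for (int doc : baseOrder.subList(0, k)) {
            if (!boosted.contains(doc)) ranked.add(doc);
        }
        // Documents past the horizon are appended untouched.
        ranked.addAll(baseOrder.subList(k, baseOrder.size()));

        int from = Math.min(start, ranked.size());
        int to = Math.min(start + rows, ranked.size());
        return new ArrayList<>(ranked.subList(from, to));
    }

    /** Horizon grows with the requested page, as in the line quoted in the thread. */
    public static List<Integer> growingHorizonPage(List<Integer> baseOrder, Set<Integer> boosted,
                                                   int start, int rows, int reRankDocs) {
        return page(baseOrder, boosted, start, rows, Math.max(start + rows, reRankDocs));
    }

    /** Horizon stays at reRankDocs no matter how far the client pages. */
    public static List<Integer> fixedHorizonPage(List<Integer> baseOrder, Set<Integer> boosted,
                                                 int start, int rows, int reRankDocs) {
        return page(baseOrder, boosted, start, rows, reRankDocs);
    }

    public static void main(String[] args) {
        List<Integer> base = List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
        Set<Integer> boosted = Set.of(1, 6, 9); // documents matching the re-rank query

        System.out.println("growing, page 1: " + growingHorizonPage(base, boosted, 0, 4, 4));
        System.out.println("growing, page 2: " + growingHorizonPage(base, boosted, 4, 4, 4));
        System.out.println("fixed,   page 2: " + fixedHorizonPage(base, boosted, 4, 4, 4));
    }
}
```

With reRankDocs=4 and rows=4, the growing horizon returns [1, 0, 2, 3] for the first page and [3, 4, 5, 7] for the second: document 3 appears twice and document 6 is never shown, because it was boosted onto a page the client had already read. With the fixed horizon the second page is [4, 5, 6, 7], with no skip or duplicate.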

