lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: Paging bug in ReRankingQParserPlugin?
Date Tue, 05 Aug 2014 16:38:22 GMT
I updated the docs for now, but I agree this paging issue needs to be
handled transparently. Feel free to create a JIRA issue for this, or I can
create one when I have time to start looking into it.

Joel Bernstein
Search Engineer at Heliosearch


On Tue, Aug 5, 2014 at 12:04 PM, Adair Kovac <adairkovac@gmail.com> wrote:

> Thanks, great explanation! Yeah, if it keeps the current behavior, added
> documentation would be great.
>
> Are there any other features that expect parameters to change as one
> pages? If not, I'm concerned that this might be hard to support for clients
> that assume only the index params will change. It also makes things harder
> if we want to re-rank only a small, fixed set of results on the first page,
> because then we'd have to stitch together two result sets. We don't
> currently want to do that, though.
>
> For what it's worth, my colleague who sent me the link to the feature and I
> both assumed it would fetch all the results and return the ones past the
> re-ranking point in their original order. Is that possible?
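A toy Java sketch of the behavior assumed here (not how ReRankingQParserPlugin currently behaves): only the first reRankDocs positions are re-ordered, and every position past that point keeps the main query's order, so paging never moves a document across the boundary. It assumes documents 0..numDocs-1 are already in main-query order, and the re-rank rule (ids that are multiples of 7 match) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class FixedWindowReRank {
    // Hypothetical re-rank query: docs whose id is a multiple of 7 match it.
    static double reRankScore(int doc) {
        return doc % 7 == 0 ? 1.0 : 0.0;
    }

    // Toy model: docs 0..numDocs-1 are in main-query order (doc id == original rank).
    static List<Integer> page(int numDocs, int reRankDocs, int start, int rows) {
        int window = Math.min(reRankDocs, numDocs); // never widened to start + rows
        List<Integer> head = new ArrayList<>();
        for (int doc = 0; doc < window; doc++) {
            head.add(doc);
        }
        // Stable sort: matching docs move to the front, the rest keep their order.
        head.sort((a, b) -> Double.compare(reRankScore(b), reRankScore(a)));
        List<Integer> result = new ArrayList<>();
        for (int i = start; i < Math.min(start + rows, numDocs); i++) {
            // Inside the window use the re-ranked order; past it, the original order.
            result.add(i < window ? head.get(i) : i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(page(100, 10, 0, 10));  // [0, 7, 1, 2, 3, 4, 5, 6, 8, 9]
        System.out.println(page(100, 10, 10, 10)); // [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
    }
}
```

In a real engine the main query would still have to collect start + rows documents; the difference in this sketch is that only the first reRankDocs of them are re-scored, so doc 14 stays on the second page instead of jumping onto the first.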
>
> Thanks,
>
> Adair
>
> On Tue, Aug 5, 2014 at 5:53 AM, Joel Bernstein <joelsolr@gmail.com> wrote:
>
>> The comment in the code reads slightly differently:
>>
>> // This ensures that reRankDocs >= docs needed to satisfy the result set.
>> reRankDocs = Math.max(start+rows, reRankDocs);
>>
>> I think you're right, though, that this is confusing. The way the
>> ReRankingQParserPlugin works is that it grabs the top X documents
>> (reRankDocs) and re-ranks them. If the top X (reRankDocs) isn't large
>> enough to satisfy the page, the result won't have enough documents.
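A toy Java model of the behavior described above, written only as an illustration and not taken from Solr's collector code. It assumes documents 0..numDocs-1 are already sorted by the main query's score and that a hypothetical re-rank query matches every document whose id is a multiple of 7. Only the top "window" documents are collected and re-sorted. With clamp off, a page beyond reRankDocs comes back empty; with clamp on (mirroring the Math.max line), the page is filled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToDoubleFunction;

final class TopXReRank {
    // Toy model (an assumption, not Solr's code): docs 0..numDocs-1 are already
    // sorted by the main query's score, so doc 0 is the best match.
    static List<Integer> page(int numDocs, int reRankDocs, int start, int rows,
                              boolean clamp, IntToDoubleFunction reRankScore) {
        // With clamp=true this mirrors: reRankDocs = Math.max(start+rows, reRankDocs);
        int window = clamp ? Math.max(start + rows, reRankDocs) : reRankDocs;
        window = Math.min(window, numDocs);
        // Only the top `window` docs are collected from the main query.
        List<Integer> top = new ArrayList<>();
        for (int doc = 0; doc < window; doc++) {
            top.add(doc);
        }
        // Re-rank the collected docs, highest re-rank score first (List.sort is stable).
        top.sort((a, b) -> Double.compare(reRankScore.applyAsDouble(b), reRankScore.applyAsDouble(a)));
        // Slice out the requested page; it comes back short if the window is too small.
        int from = Math.min(start, top.size());
        int to = Math.min(start + rows, top.size());
        return new ArrayList<>(top.subList(from, to));
    }

    public static void main(String[] args) {
        // Hypothetical re-rank query: docs whose id is a multiple of 7 match it.
        IntToDoubleFunction matches = d -> d % 7 == 0 ? 1.0 : 0.0;
        System.out.println(page(1000, 200, 200, 20, false, matches).size()); // 0
        System.out.println(page(1000, 200, 200, 20, true, matches).size());  // 20
    }
}
```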
>>
>> The intended use of this was actually to stop using query re-ranking when
>> you page past the re-ranked results. So if you re-rank the top 200
>> documents, you would drop the re-ranking parameter when you page to
>> documents 201-220.
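On the client side, that intended usage might look like the following sketch. buildParams is a hypothetical helper, and the rq/rqq values follow the {!rerank reRankQuery=$rqq reRankDocs=...} syntax shown in the Solr Reference Guide's query re-ranking examples. It assumes reRankDocs is a multiple of rows, so no page straddles the re-ranking boundary.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ReRankParams {
    // Hypothetical helper: builds the request parameters for one page of results.
    static Map<String, String> buildParams(String q, String reRankQuery,
                                           int reRankDocs, int start, int rows) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("start", Integer.toString(start));
        params.put("rows", Integer.toString(rows));
        // Send the re-rank parameters only while the whole page lies inside
        // the re-ranked window; past it, fall back to the main query's order.
        if (start + rows <= reRankDocs) {
            params.put("rq", "{!rerank reRankQuery=$rqq reRankDocs=" + reRankDocs + "}");
            params.put("rqq", reRankQuery);
        }
        return params;
    }

    public static void main(String[] args) {
        // Docs 181-200 are still re-ranked; docs 201-220 are not.
        System.out.println(buildParams("greetings", "(hi hello hey)", 200, 180, 20));
        System.out.println(buildParams("greetings", "(hi hello hey)", 200, 200, 20));
    }
}
```

If reRankDocs is not a multiple of rows, a page that straddles the boundary gets an inconsistent ordering whichever way the helper decides, which is one reason the paging behavior is easy to get wrong on the client.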
>>
>> So the line:
>> reRankDocs = Math.max(start+rows, reRankDocs);
>>
>> saves you from an unexpected shortfall of documents if you do page beyond
>> reRankDocs. At the very least, the expected use should be documented, and
>> if we can figure out better behavior here, that would be great.
>>
>>
>> Joel Bernstein
>> Search Engineer at Heliosearch
>>
>>
>> On Mon, Aug 4, 2014 at 7:56 PM, Adair Kovac <adairkovac@gmail.com> wrote:
>>
>>> Looking at this line in the code:
>>>
>>> // This ensures that reRankDocs <= docs needed to satisfy the result set.
>>> reRankDocs = Math.max(start+rows, reRankDocs);
>>>
>>> This looks like it would cause skips and duplicates while paging through
>>> the results: once you page past reRankDocs and keep finding documents
>>> that match the re-ranking query, they'll get boosted onto earlier pages
>>> (so you skip them), pushing down items you already saw (so you see them
>>> twice).
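The skip/duplicate effect can be reproduced with a small toy model (an assumption for illustration, not Solr's implementation): documents 0..numDocs-1 arrive in main-query order, a hypothetical re-rank query matches ids that are multiples of 7, and the window is widened with Math.max(start + rows, reRankDocs) as in the quoted line. With reRankDocs = 10 and rows = 10, doc 14 jumps into page 0's range after page 0 has already been fetched, so it is never shown, and doc 9 is pushed onto page 1 and shown twice.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class PagingDrift {
    // Hypothetical re-rank query: docs whose id is a multiple of 7 match it.
    static double reRankScore(int doc) {
        return doc % 7 == 0 ? 1.0 : 0.0;
    }

    // Toy model: docs 0..numDocs-1 are in main-query order, and the window is
    // widened exactly like reRankDocs = Math.max(start+rows, reRankDocs).
    static List<Integer> page(int numDocs, int reRankDocs, int start, int rows) {
        int window = Math.min(Math.max(start + rows, reRankDocs), numDocs);
        List<Integer> top = new ArrayList<>();
        for (int doc = 0; doc < window; doc++) {
            top.add(doc);
        }
        // Stable sort: matching docs move to the front, the rest keep their order.
        top.sort((a, b) -> Double.compare(reRankScore(b), reRankScore(a)));
        return new ArrayList<>(top.subList(Math.min(start, window), Math.min(start + rows, window)));
    }

    public static void main(String[] args) {
        List<Integer> page0 = page(100, 10, 0, 10);
        List<Integer> page1 = page(100, 10, 10, 10);
        Set<Integer> seen = new LinkedHashSet<>(page0);
        List<Integer> duplicates = new ArrayList<>();
        for (int doc : page1) {
            if (!seen.add(doc)) {
                duplicates.add(doc); // already shown on page 0
            }
        }
        List<Integer> skipped = new ArrayList<>();
        for (int doc = 0; doc < 20; doc++) {
            if (!seen.contains(doc)) {
                skipped.add(doc); // one of the top 20 docs that never appeared
            }
        }
        System.out.println("page 0: " + page0);          // [0, 7, 1, 2, 3, 4, 5, 6, 8, 9]
        System.out.println("page 1: " + page1);          // [9, 10, 11, 12, 13, 15, 16, 17, 18, 19]
        System.out.println("duplicates: " + duplicates); // [9]
        System.out.println("skipped: " + skipped);       // [14]
    }
}
```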
>>>
>>> It's obviously intentional behavior, but I can't find any documentation
>>> explaining why, if you request fewer documents to be re-ranked than you're
>>> asking to view, it ignores the number you asked for. What if I only want
>>> the top 10 of 50 rows to be re-ranked? Wouldn't it be better to let the
>>> client choose whether to increase reRankDocs or leave it the same?
>>>
>>> If no one replies and I have time, I might check out 4.9 and see if I
>>> can confirm or disprove the bug, but I figured I'd bring it up now in case
>>> I don't end up having time. It would be good to document the reason for
>>> this behavior if it turns out it's necessary.
>>>
>>> Thanks. I'm excited about this feature btw.
>>>
>>> --Adair
>>>
>>
>>
>
