lucene-solr-user mailing list archives

From Adair Kovac <adairko...@gmail.com>
Subject Re: Paging bug in ReRankingQParserPlugin?
Date Tue, 05 Aug 2014 18:46:47 GMT
Thanks, Joel. I created SOLR-6323.


On Tue, Aug 5, 2014 at 10:38 AM, Joel Bernstein <joelsolr@gmail.com> wrote:

> I updated the docs for now. But I agree this paging issue needs to be
> handled transparently. Feel free to create a jira issue for this or I can
> create one when I have time to start looking into it.
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Tue, Aug 5, 2014 at 12:04 PM, Adair Kovac <adairkovac@gmail.com> wrote:
>
>> Thanks, great explanation! Yeah, if it keeps the current behavior, added
>> documentation would be great.
>>
>> Are there any other features that expect parameters to change as one
>> pages? If not, I'm concerned that it might be hard to support for clients
>> that assume only the index params will change. It also makes it harder to
>> work if we want to add re-ranking on a strict small set of results on the
>> first page, because then we'd have to stitch together two result sets. We
>> don't currently want to do that, though.
>>
>> For what it's worth, my colleague who linked me the feature and I both
>> assumed that it would get all the results and return the ones past the
>> re-ranking point as-is. Is that possible?
>>
>> Thanks,
>>
>> Adair
>>
>> On Tue, Aug 5, 2014 at 5:53 AM, Joel Bernstein <joelsolr@gmail.com>
>> wrote:
>>
>>> The comment in the code reads slightly differently:
>>>
>>> // This enusres that reRankDocs >= docs needed to satisfy the result set.
>>> reRankDocs = Math.max(start+rows, reRankDocs);
>>>
>>> I think you're right though that this is confusing. The way the
>>> ReRankingQParserPlugin works is that it grabs the top X documents
>>> (reRankDocs) and reRanks them. If the top X (reRankDocs) isn't large enough
>>> to satisfy the page then the result won't have enough documents.
>>>
>>> The intended use of this was actually to stop using query re-ranking
>>> when you paged past the reRanked results. So if you re-rank the top 200
>>> documents, you would drop the re-ranking parameter when you page to
>>> documents 201-220.
>>>
>>> So the line:
>>> reRankDocs = Math.max(start+rows, reRankDocs);
>>>
>>> Saves you from an unexpected shortfall in documents if you do page
>>> beyond the reRankDocs. At the very least the expected use should be
>>> documented and if we can figure out better behavior here that would be
>>> great.
>>>
>>> Joel Bernstein
>>> Search Engineer at Heliosearch
>>>
>>>
>>> On Mon, Aug 4, 2014 at 7:56 PM, Adair Kovac <adairkovac@gmail.com>
>>> wrote:
>>>
>>>> Looking at this line in the code:
>>>>
>>>> // This enusres that reRankDocs <= docs needed to satisfy the result
>>>> set.
>>>> reRankDocs = Math.max(start+rows, reRankDocs);
>>>>
>>>> This looks like it would cause skips and duplicates while paging
>>>> through the results, since if you exceed the reRankDocs parameter and keep
>>>> finding things that match the re-ranking query, they'll get boosted earlier
>>>> (skipped), thus pushing down items you already saw (causing duplicates).
>>>>
>>>> It's obviously intentional behavior, but there's no documentation I can
>>>> see of why, if you request fewer documents to be re-ranked than you're
>>>> asking to view, it goes ahead and ignores the number you asked for. What if
>>>> I only want the top 10 out of 50 rows to be reranked? Wouldn't it be better
>>>> to make the client choose whether to increase the reRankDocs or leave it
>>>> the same?
>>>>
>>>> If no one replies and I have time, I might check out 4.9 and see if I
>>>> can confirm or disprove the bug, but figured I'd bring it up now in case I
>>>> don't end up having time. It would be good to document the reason for this
>>>> behavior if it turns out it's necessary.
>>>>
>>>> Thanks. I'm excited about this feature btw.
>>>>
>>>> --Adair
>>>>
>>>
>>>
>>
>
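
The paging concern raised in the thread can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Solr code: documents are plain integers listed in their original rank order, "re-ranking" simply moves a hypothetical set of boosted documents to the front of the re-ranked window, and the class and method names (ReRankPagingSketch, rerankTop, pageGrowingWindow, pageFixedWindow) are made up for illustration. The only line taken from the thread is the Math.max(start+rows, reRankDocs) adjustment. The second method models the behavior Adair describes expecting (re-rank a fixed window and return later results as-is), which is not confirmed anywhere in the thread as what Solr does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ReRankPagingSketch {

    // Hypothetical helper: re-rank only the first n docs (boosted docs first,
    // original order otherwise) and append the remaining docs unchanged.
    static List<Integer> rerankTop(List<Integer> baseOrder, Set<Integer> boosted, int n) {
        int windowSize = Math.min(n, baseOrder.size());
        List<Integer> window = new ArrayList<>(baseOrder.subList(0, windowSize));
        // List.sort is stable, so docs with equal "boost" keep their original order.
        window.sort((a, b) -> Boolean.compare(boosted.contains(b), boosted.contains(a)));
        window.addAll(baseOrder.subList(windowSize, baseOrder.size()));
        return window;
    }

    // Cut one page out of an already ordered result list.
    static List<Integer> slice(List<Integer> ordered, int start, int rows) {
        int from = Math.min(start, ordered.size());
        int to = Math.min(start + rows, ordered.size());
        return new ArrayList<>(ordered.subList(from, to));
    }

    // Models the line quoted in the thread: the re-rank window grows with the page.
    static List<Integer> pageGrowingWindow(List<Integer> baseOrder, Set<Integer> boosted,
                                           int reRankDocs, int start, int rows) {
        int effective = Math.max(start + rows, reRankDocs);
        return slice(rerankTop(baseOrder, boosted, effective), start, rows);
    }

    // Models the alternative described in the thread: the window stays fixed
    // and results past it are returned in their original order.
    static List<Integer> pageFixedWindow(List<Integer> baseOrder, Set<Integer> boosted,
                                         int reRankDocs, int start, int rows) {
        return slice(rerankTop(baseOrder, boosted, reRankDocs), start, rows);
    }

    public static void main(String[] args) {
        List<Integer> baseOrder = List.of(0, 1, 2, 3, 4, 5);
        Set<Integer> boosted = Set.of(4); // doc 4 matches the re-rank query
        for (int start = 0; start < 6; start += 2) {
            System.out.println("start=" + start
                + " growing=" + pageGrowingWindow(baseOrder, boosted, 2, start, 2)
                + " fixed=" + pageFixedWindow(baseOrder, boosted, 2, start, 2));
        }
    }
}
```

With reRankDocs=2 and rows=2 in this toy model, the growing window returns [0, 1], then [2, 3], then [3, 5]: document 3 shows up twice and document 4 is never shown, which is the skip-and-duplicate pattern described at the start of the thread. The fixed window returns [0, 1], [2, 3], [4, 5].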
