lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-5463) Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")
Date Tue, 07 Jan 2014 16:44:11 GMT

     [ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-5463:
---------------------------

    Description: 
I'd like to revisit a solution to the problem of "deep paging" in Solr, leveraging an HTTP-based
API similar to how IndexSearcher.searchAfter works at the Lucene level: require the
clients to provide back a token indicating the sort values of the last document seen on the
previous "page".  This is similar to the "cursor" model I've seen in several other REST APIs
that support "pagination" over large sets of results (notably the Twitter API and its "since_id"
param) except that we'll want something that works with arbitrary multi-level sort criteria
that can be either ascending or descending.
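
To make the "token of sort values" idea concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the semantics, not how Lucene or Solr implement searchAfter; the function names ({{make_cmp}}, {{search_after}}), the sample field names, and the tuple-shaped token are all assumptions made for the example.

```python
from functools import cmp_to_key


def make_cmp(sort_spec):
    """Build a comparison function over tuples of sort values.

    sort_spec is a list of (field, direction) pairs, where direction is
    "asc" or "desc". This shape is an assumption made for the sketch.
    """
    def cmp(a, b):
        for x, y, (_, direction) in zip(a, b, sort_spec):
            if x != y:
                c = -1 if x < y else 1
                return c if direction == "asc" else -c
        return 0
    return cmp


def search_after(docs, sort_spec, rows, after=None):
    """Return (page, token): up to `rows` docs that sort strictly after `after`.

    The token is the tuple of sort values of the last doc on the page; the
    client hands it back unchanged to get the next page. If the page is
    empty, the incoming token is returned as-is.
    """
    cmp = make_cmp(sort_spec)

    def values(doc):
        return tuple(doc[field] for field, _ in sort_spec)

    ordered = sorted(docs, key=cmp_to_key(lambda a, b: cmp(values(a), values(b))))
    if after is not None:
        # Keep only documents that come strictly after the last one seen.
        ordered = [d for d in ordered if cmp(values(d), after) > 0]
    page = ordered[:rows]
    token = values(page[-1]) if page else after
    return page, token
```

Because the last sort level in such a spec would be a unique field, no two documents share the same token, so "strictly after" never skips or repeats a document.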

SOLR-1726 laid some initial groundwork here and was committed quite a while ago, but the key
bit of argument parsing to leverage it was commented out due to some problems (see comments
in that issue).  It's also somewhat out of date at this point: at the time it was committed,
IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and
the params added in SOLR-1726 suffer from this limitation as well.

---

I think it would make sense to start fresh with a new issue with a focus on ensuring that
we have deep paging which:

* supports arbitrary field sorts in addition to sorting by score
* works in distributed mode

{panel:title=Basic Usage}
* send a request with {{sort=X&start=0&rows=N&cursorMark=*}}
** sort can be anything, but must include the uniqueKey field (as a tie breaker) 
** "N" can be any number you want per page
** start must be "0"
** "\*" denotes you want to use a cursor starting at the beginning mark
* parse the response body and extract the (String) {{nextCursorMark}} value
* replace the "\*" value from your initial request params with the {{nextCursorMark}} value
from the response, and send that as the subsequent request
* repeat until the {{nextCursorMark}} value stops changing, or you have collected as many
docs as you need
{panel}
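
The Basic Usage steps above could be sketched on the client side roughly as follows. This is a hedged illustration in Python, not part of any patch: the {{q}} and {{wt=json}} params, the JSON response shape (docs under {{response.docs}}, with a top-level {{nextCursorMark}}), and the helper names {{fetch_page}} and {{iter_docs}} are assumptions made for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page(base_url, params):
    """GET base_url with the given params and parse the body as JSON.

    Assumes the server honours wt=json; adjust for other response writers.
    """
    with urlopen(base_url + "?" + urlencode(params)) as resp:
        return json.load(resp)


def iter_docs(base_url, query, sort, rows, fetch=fetch_page):
    """Yield every matching doc by following nextCursorMark until it stops changing.

    `sort` must include the uniqueKey field as a tie breaker.
    `fetch` is injectable so the loop can be exercised without a server.
    """
    cursor = "*"  # "*" means: start a cursor at the beginning
    while True:
        params = {
            "q": query,
            "sort": sort,
            "start": 0,  # start must be 0 when using a cursor
            "rows": rows,
            "cursorMark": cursor,
            "wt": "json",
        }
        body = fetch(base_url, params)
        # Assumed response shape: {"response": {"docs": [...]}, "nextCursorMark": "..."}
        yield from body["response"]["docs"]
        next_cursor = body["nextCursorMark"]
        if next_cursor == cursor:
            return  # the mark stopped changing: nothing more to fetch
        cursor = next_cursor
```

A caller might write something like {{iter_docs("http://localhost:8983/solr/collection1/select", "*:*", "price desc, id asc", 100)}}, where the URL is hypothetical and {{id}} is assumed to be the uniqueKey field, and simply stop iterating once it has collected as many docs as it needs.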


  was:
I'd like to revisit a solution to the problem of "deep paging" in Solr, leveraging an HTTP-based
API similar to how IndexSearcher.searchAfter works at the Lucene level: require the
clients to provide back a token indicating the sort values of the last document seen on the
previous "page".  This is similar to the "cursor" model I've seen in several other REST APIs
that support "pagination" over large sets of results (notably the Twitter API and its "since_id"
param) except that we'll want something that works with arbitrary multi-level sort criteria
that can be either ascending or descending.

SOLR-1726 laid some initial groundwork here and was committed quite a while ago, but the key
bit of argument parsing to leverage it was commented out due to some problems (see comments
in that issue).  It's also somewhat out of date at this point: at the time it was committed,
IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and
the params added in SOLR-1726 suffer from this limitation as well.

---

I think it would make sense to start fresh with a new issue with a focus on ensuring that
we have deep paging which:

* supports arbitrary field sorts in addition to sorting by score
* works in distributed mode



> Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie:
"deep paging")
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5463
>                 URL: https://issues.apache.org/jira/browse/SOLR-5463
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>             Fix For: 5.0
>
>         Attachments: SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch,
SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch,
SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch,
SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch,
SOLR-5463__straw_man__MissingStringLastComparatorSource.patch
>



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
