lucene-solr-user mailing list archives

From "Vincenzo D'Amore" <v.dam...@gmail.com>
Subject Re: Solr Pagination
Date Thu, 03 Aug 2017 10:54:24 GMT
Don't spend your time reading this; I've just found the answer in the
documentation:


> One way to ensure that a document will never be returned more than once
> is to use the uniqueKey field as the primary (and therefore: only
> significant) sort criterion. In this situation, you will be guaranteed
> that each document is only returned once, no matter how it may be
> modified during the use of the cursor.


https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
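
A minimal Python sketch of the cursorMark loop the documentation describes, using only the standard library. Assumptions: the uniqueKey field is `id`, the core URL is the one from the query quoted below, and the helper names `solr_fetch` and `cursor_pages` are hypothetical. The `cursorMark` request parameter (starting at `*`, with `start` left out) and the `nextCursorMark` response field are Solr's; paging stops when Solr returns the same mark that was sent.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Hypothetical endpoint, taken from the query in this thread; adjust as needed.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def solr_fetch(params: dict) -> dict:
    """Send one GET request to the /select handler and decode the JSON body."""
    url = SOLR_SELECT + "?" + urllib.parse.urlencode({**params, "wt": "json"})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def cursor_pages(fetch: Callable[[dict], dict], q: str, rows: int) -> Iterator[list]:
    """Yield pages of documents using cursorMark deep paging.

    The sort uses only the uniqueKey field (assumed to be "id"), as the
    documentation recommends; no "start" parameter is sent.
    """
    cursor = "*"  # "*" asks Solr for the first page
    while True:
        data = fetch({"q": q, "sort": "id asc", "rows": rows, "cursorMark": cursor})
        docs = data["response"]["docs"]
        if docs:
            yield docs
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged mark means no more results
            return
        cursor = next_cursor
```

Usage would look like `for page in cursor_pages(solr_fetch, "idCat:1", 20000): ...`. Passing the fetch function in as a parameter keeps the loop testable without a running Solr instance.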



On Thu, Aug 3, 2017 at 12:47 PM, Vincenzo D'Amore <v.damore@gmail.com>
wrote:

> Hi all,
>
> I have a collection that is frequently updated. Is it possible that a
> SolrCloud query returns duplicate documents while paginating?
>
> Just to be clear, there is a collection with about 3M documents, and a
> Solr query selects just 500K of them, sorted by Id. They are returned by
> simply paginating the results with the start, rows and sort parameters.
>
> The query is like this one:
>
> http://localhost:8983/solr/collection1/select?q=idCat:1&start=0&rows=20000&sort=id asc
>
> To be honest, I haven't verified this personally, but the consumer of this
> query claims that after a few trials, duplicate documents were returned.
>
> Given that the collection is frequently updated, I suppose that adding a
> large batch of new documents during pagination can affect the index and
> change the order of the results.
>
> In other words, suppose 500K documents are returned by 25 queries (20K
> documents per request) and, during the iteration, 1000 new documents
> are inserted. Since the query is sorted by Id, I think the documents
> returned may reflect the new order, so a document returned by a previous
> request could also appear in the current results.
>
> Anyway, I'm trying to solve this problem with deep paging.
>
> I have read that "unlike basic pagination, Cursor pagination does not rely
> on using an absolute "offset" into the completed sorted list of matching
> documents.  Instead, the cursorMark specified in a request encapsulates
> information about the relative position of the last document returned,
> based on the absolute sort values of that document.  This means that the
> impact of index modifications is much smaller when using a cursor compared
> to basic pagination."
>
> What do you think, am I right? Can deep paging help solve this
> problem?
>
> Best regards and thanks for your time,
> Vincenzo
>
>
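
The duplicate scenario from the quoted message can be reproduced with a small in-memory simulation of start/rows paging. This is a sketch, not Solr code: the index is just a sorted Python list of ids, and `offset_pages` and its `inserts` argument (ids added right before a given page is requested) are hypothetical. It assumes new ids can sort before the current offset: such an id shifts every later document one position back, so the next page repeats a document already seen, while the new document itself is never returned.

```python
def offset_pages(index: list[int], rows: int, inserts: dict[int, list[int]]) -> list[int]:
    """Simulate start/rows paging over an index sorted by id.

    `inserts` maps a page number to ids added just before that page is
    requested, standing in for concurrent updates (hypothetical).
    """
    docs = sorted(index)  # work on a copy; the query sorts by id asc
    seen = []
    start = 0
    page_no = 0
    while True:
        docs.extend(inserts.get(page_no, []))
        docs.sort()
        page = docs[start:start + rows]  # absolute offset, like start/rows
        if not page:
            return seen
        seen.extend(page)
        start += rows
        page_no += 1
```

For example, `offset_pages([10, 20, 30, 40, 50, 60, 70], rows=3, inserts={1: [5]})` returns `[10, 20, 30, 30, 40, 50, 60, 70]`: id 30 appears twice and id 5 is never returned. Deletes have the opposite effect with offset paging and cause documents to be skipped.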

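A sketch of the cursor idea on a similar in-memory index, where the index is a sorted Python list of ids and new ids may arrive between page requests: each page resumes strictly after the last sort value returned instead of at a fixed offset. This only models the concept; Solr itself encodes the sort values of the last document in an opaque cursorMark string, and `cursor_pages_sim` and its `inserts` argument are hypothetical. With the unique id as the only sort field, no document can be returned twice.

```python
import bisect


def cursor_pages_sim(index: list[int], rows: int, inserts: dict[int, list[int]]) -> list[int]:
    """Simulate cursor-style paging sorted only by a unique id.

    `inserts` maps a page number to ids added just before that page is
    requested, standing in for concurrent updates (hypothetical).
    """
    docs = sorted(index)  # work on a copy
    seen = []
    last = None  # plays the role of cursorMark="*" (no position yet)
    page_no = 0
    while True:
        for new_id in inserts.get(page_no, []):
            bisect.insort(docs, new_id)
        # Resume strictly after the last id returned, not at an offset.
        start = 0 if last is None else bisect.bisect_right(docs, last)
        page = docs[start:start + rows]
        if not page:
            return seen
        seen.extend(page)
        last = page[-1]
        page_no += 1
```

For example, `cursor_pages_sim([10, 20, 30, 40, 50, 60, 70], rows=3, inserts={1: [5]})` returns `[10, 20, 30, 40, 50, 60, 70]`: there are no duplicates, and id 5, inserted behind the cursor's position, is simply not seen in this pass.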