lucene-solr-user mailing list archives

From "Vincenzo D'Amore" <v.dam...@gmail.com>
Subject Solr Pagination
Date Thu, 03 Aug 2017 10:47:55 GMT
Hi all,

I have a collection that is frequently updated. Is it possible for a
SolrCloud query to return duplicate documents while paginating?

Just to be clear, the collection holds about 3M documents, and a Solr
query selects just 500K of them sorted by id. They are returned by simply
paginating the results with the parameters start, rows and sort.

The query is like this one:

http://localhost:8983/solr/collection1/select?q=idCat:1&start=0&rows=20000&sort=id asc

To be honest, I have not verified this personally, but the consumer of this
query claims that, after a few trials, duplicate documents were returned.

Given that the collection is frequently updated, I suppose that adding a
large bunch of new documents during the pagination can affect the index and
change the order of results.

In other words, suppose 500K documents are returned by 25 queries (20K
documents per request) and, during the iteration, 1000 new documents are
inserted.
Given that the query is sorted by id, I think it is possible that the
documents returned reflect the new order, so a document returned by a
previous query may also be present in the current results.
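
To make the scenario concrete, here is a minimal sketch in plain Python
(no Solr involved). It assumes a sorted list stands in for the result set
and that slicing stands in for start/rows; the ids and page size are made
up for illustration.

```python
import bisect

def page(index, start, rows):
    """Offset pagination over a sorted list, like start/rows with sort=id asc."""
    return index[start:start + rows]

# Hypothetical index: 50 ids already in sorted order.
index = [f"doc{n:04d}" for n in range(0, 100, 2)]
rows = 10
seen = []

# First request: start=0, returns doc0000 .. doc0018.
seen += page(index, 0, rows)

# Between requests, three documents are indexed whose ids sort
# before the position we have already paged past.
for new_id in ("doc0001", "doc0003", "doc0005"):
    bisect.insort(index, new_id)

# Second request: start=10. The offset now points three places "earlier"
# in terms of the old ordering, so three documents come back again.
seen += page(index, rows, rows)

duplicates = sorted({d for d in seen if seen.count(d) > 1})
print(duplicates)
```

In this toy model every document inserted ahead of the current offset
pushes one already-returned document onto the next page. Deletions would
have the opposite effect and cause documents to be skipped.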

I'm trying to solve this problem using deep paging.

I have read that "unlike basic pagination, Cursor pagination does not rely
on using an absolute "offset" into the completed sorted list of matching
documents.  Instead, the cursorMark specified in a request encapsulates
information about the relative position of the last document returned,
based on the absolute sort values of that document.  This means that the
impact of index modifications is much smaller when using a cursor compared
to basic pagination."
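
A rough sketch of what a cursor-based client loop could look like follows.
It assumes that id is the collection's uniqueKey field (cursors require
the uniqueKey in the sort), that the JSON response format is used, and
that the helper names solr_select and iter_with_cursor are my own, not
part of any Solr client library.

```python
import json
import urllib.parse
import urllib.request

def solr_select(base_url, params):
    """Send one /select request and return the decoded JSON response."""
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(f"{base_url}/select?{query}") as resp:
        return json.load(resp)

def iter_with_cursor(fetch, q, rows=20000, sort="id asc"):
    """Yield every matching document by following nextCursorMark.

    `fetch` is any callable that takes a dict of request parameters and
    returns the parsed response. No `start` parameter is sent: the cursor
    replaces the offset.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        data = fetch({"q": q, "rows": rows, "sort": sort,
                      "cursorMark": cursor, "wt": "json"})
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:
            break  # an unchanged cursor means there are no more results
        cursor = next_cursor

# Example usage against the collection from this message (not executed here):
# base = "http://localhost:8983/solr/collection1"
# for doc in iter_with_cursor(lambda p: solr_select(base, p), "idCat:1"):
#     print(doc["id"])
```

Because each request resumes after the sort values of the last document
returned, documents inserted earlier in the sort order should not shift
already-returned documents into later pages. As far as I understand, a
document can still be seen twice if its own sort value changes during the
iteration, which should not happen when sorting only by an immutable id.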

What do you think, am I right? Can deep paging help to solve this
problem?

Best regards and thanks for your time,
Vincenzo
