lucene-solr-user mailing list archives

From Toke Eskildsen <t...@kb.dk>
Subject Re: Solr Server crashes when requesting a result with too large resultRows
Date Wed, 08 Aug 2018 09:10:37 GMT
On Tue, 2018-07-31 at 11:12 +0200, Fette, Georg wrote:
> I agree that receiving too much data in one request is bad. But I
> was surprised that the query works with a lower but still very large
> rows parameter and that there is a threshold at which it crashes the
> server. 
> Furthermore, it seems that the reason for the crash is not the size
> of the actual results because those are only 581.

Under the hood, a priority queue is initialized with room for
min(#docs_in_index, rows) document markers. Furthermore that queue is
initialized with placeholder objects (called sentinels). This
structure becomes heavy when entering millions territory, both in terms
of raw memory and in terms of GC overhead due to all the objects. You
could have 1 hit and it would still hit OOM.
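
To make the sizing concrete, here is a rough Python sketch of the idea, not the actual Lucene code. The function names and the per-entry byte figures are my own assumptions for illustration; real JVM object sizes depend on the JVM and its settings.

```python
# Toy model of a top-N collector queue that is pre-filled with sentinels.
# Not Lucene code: the names and byte sizes below are illustrative assumptions.

ASSUMED_BYTES_PER_SENTINEL = 24   # assumption: small object with header + 3 fields
ASSUMED_BYTES_PER_REFERENCE = 4   # assumption: compressed object pointer


def queue_size(docs_in_index: int, rows: int) -> int:
    """Queue capacity as described above: min(#docs_in_index, rows)."""
    return min(docs_in_index, rows)


def build_prefilled_queue(docs_in_index: int, rows: int) -> list:
    """Allocate one placeholder per slot before any document has matched."""
    size = queue_size(docs_in_index, rows)
    # Each sentinel is a distinct (score, doc_id) object, so the cost is
    # paid up front and does not depend on the number of hits.
    return [[float("-inf"), -1] for _ in range(size)]


def estimated_queue_bytes(docs_in_index: int, rows: int) -> int:
    """Back-of-the-envelope memory estimate under the assumptions above."""
    per_entry = ASSUMED_BYTES_PER_SENTINEL + ASSUMED_BYTES_PER_REFERENCE
    return queue_size(docs_in_index, rows) * per_entry
```

Note that the number of hits never appears in any of these functions: the allocation is driven only by the index size and the rows parameter, which is why a query with 581 hits can still run out of memory.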

It is possible to optimize that part of the Solr code for larger
requests (see https://issues.apache.org/jira/browse/LUCENE-2127 and
https://issues.apache.org/jira/browse/LUCENE-6828), but that would just be
a temporary fix until even larger indexes are queried. The deep paging
or streaming exports that Andrea suggests scale indefinitely in terms
of both documents in the index and documents in the result set.
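
For reference, deep paging with cursorMark could look roughly like the Python sketch below. The `fetch` callable is hypothetical: it stands in for whatever HTTP client sends the parameters to the select handler and returns the parsed JSON response. The sort is assumed to include the uniqueKey field, here assumed to be called `id`.

```python
from typing import Any, Callable, Dict, Iterator


def iter_all_docs(
    fetch: Callable[[Dict[str, Any]], Dict[str, Any]],  # hypothetical HTTP helper
    query: str,
    page_size: int = 1000,
    sort: str = "id asc",  # assumption: "id" is the uniqueKey field
) -> Iterator[Dict[str, Any]]:
    """Yield every matching document, one modest page at a time."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        response = fetch({
            "q": query,
            "rows": page_size,
            "sort": sort,
            "cursorMark": cursor,
        })
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        # Solr signals the end by returning the same cursorMark it was given.
        if next_cursor == cursor:
            break
        cursor = next_cursor
```

With this approach rows stays small (1000 in the sketch), so the per-request queue stays small no matter how many documents match in total.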

I would argue your OOM with small result sets and huge rows is a good
thing: You encounter the problem immediately, instead of hitting it at
some random time when a match-a-lot query is issued by a user.

- Toke Eskildsen, Royal Danish Library

