lucene-java-user mailing list archives

From Adam Rosenwald <thestri...@gmail.com>
Subject Re: OOM when using Lucene 5.X's group facet collectors on unsharded index
Date Tue, 07 Jul 2015 00:24:56 GMT
Correction.  The three facet fields used to initialize the
TermGroupFacetCollectors are SortedDVs, not SortedNumericDVs.

On 07/06/2015 04:56 PM, Adam Rosenwald wrote:
> Hello all,
>
>     When using Lucene 5.X's group facet collectors (i.e. 
> *AbstractGroupFacetCollector* and the provided concrete 
> implementation, *TermGroupFacetCollector*), I repeatedly encounter OOM 
> errors after running a few search requests on an unsharded index 
> consisting of a few million documents. I first hit the issue in
> Lucene 5.0.0 and still see it in 5.2.1.
>
>     I've initialized three such collectors to accumulate values over 
> three different facet fields (all SortedNumericDV fields). The 
> collectors all look like the following:
>
> ==BEGIN CODE BLOCK==
>
>     AbstractGroupFacetCollector thisFacetCollector =
>         TermGroupFacetCollector.createTermGroupFacetCollector(
>             groupField, thisFacetField, facetFieldMultivalued,
>             facetPrefix, initialSize);
>
> ==END CODE BLOCK==
>
>     Note that facetFieldMultivalued = false, facetPrefix = null, and 
> initialSize = 128.  There are a few million unique groups indexed in 
> the group field.  The heap blows up regardless of the number of unique
> values in the facet field (e.g., one of the facet fields has fewer
> than 100 unique values).
>
>     I have confirmed that the heap ballooning occurs /only/ during
> collection (i.e. if I comment out the three TermGroupFacetCollector
> assignments, I have no OOM issues; with even one of them enabled, the
> JVM eventually runs out of heap).
>
>     Some additional system details: I'm running Lucene 5.2.1 in a
> dev environment with an ~8GB heap and 16GB of total RAM.  I am not
> using any special codecs.  I've confirmed that the indexes (incl. the
> sidecar facet indexes) are opened only once, during initialization of
> the service.  Both the index and sidecar facet index directories are
> opened as NIOFSDirectory objects.  I have also tried MMapDirectory and
> see the same problem.
>
>     After profiling the heap extensively and reading the Lucene
> group faceting source code, I suspect that the DVs (for both the
> group and facet fields) and/or the arrays used to accumulate facet
> counts remain memory-resident after each search completes.  After
> executing the same set of queries multiple times, I see heap usage
> balloon by 1-2GB at a time.  I've tried segmenting the index, but
> while that reduces heap usage for ad-hoc searches, it does not get rid
> of the OOM issue.
>
>     Any help here would be greatly appreciated.  Many thanks in advance.
>
> --A.
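The observation in the message that heap usage does not depend on how few facet values there are is consistent with how grouped facet counting has to work in general: a facet value is counted once per distinct group, so the collector must remember which (group, facet) pairs it has already counted. The Java sketch below is a conceptual model under that assumption only; it is not Lucene's TermGroupFacetCollector code, and GroupedFacetSketch, countByGroup and distinctPairs are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Conceptual model only (hypothetical names, not Lucene's code): grouped facet
// counting counts a facet value once per distinct group, not once per document.
final class GroupedFacetSketch {

    private GroupedFacetSketch() {}

    /**
     * Returns, for each facet value, the number of distinct groups containing at
     * least one matching document with that value. groups[i] and facets[i] are
     * the group and (single-valued) facet of the i-th matching document.
     */
    static Map<String, Integer> countByGroup(String[] groups, String[] facets) {
        Set<List<String>> seen = distinctPairs(groups, facets);
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> pair : seen) {
            // pair.get(1) is the facet value; each distinct (group, facet) pair counts once
            counts.merge(pair.get(1), 1, Integer::sum);
        }
        return counts;
    }

    /**
     * The (group, facet) pairs that must be remembered during one request to
     * avoid double counting. Its size tracks the number of matching groups,
     * even when the facet field has very few distinct values.
     */
    static Set<List<String>> distinctPairs(String[] groups, String[] facets) {
        if (groups.length != facets.length) {
            throw new IllegalArgumentException("groups and facets must have equal length");
        }
        Set<List<String>> seen = new HashSet<>();
        for (int i = 0; i < groups.length; i++) {
            // List equality is value-based, so duplicate pairs collapse in the set
            seen.add(Arrays.asList(groups[i], facets[i]));
        }
        return seen;
    }
}
```

Under this model, a request that matches a few million groups holds on the order of a few million pairs per collector while it runs, however small the facet field is. Whether that per-request state is released after each search is the open question the message raises; the sketch only shows why it scales with groups rather than with facet values.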


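One way to quantify the growth the message describes (heap usage ballooning by 1-2GB each time the same set of queries is re-run) is to sample used heap after every repetition of an identical workload. The Java sketch below is a hypothetical helper that uses only the JDK's java.lang.management API; HeapGrowthProbe, usedHeapAfterEachRun and growsMonotonically are illustrative names, and the Runnable is assumed to stand in for whatever code runs the query set with the facet collectors attached.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: checks whether repeating the same workload leaves
// used heap higher after each run.
final class HeapGrowthProbe {

    private HeapGrowthProbe() {}

    /**
     * Runs the workload {@code runs} times and records used heap (in bytes)
     * after each run. System.gc() is only a hint, so samples are approximate.
     */
    static List<Long> usedHeapAfterEachRun(Runnable workload, int runs) {
        if (runs < 0) {
            throw new IllegalArgumentException("runs must be >= 0");
        }
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        List<Long> samples = new ArrayList<>(runs);
        for (int i = 0; i < runs; i++) {
            workload.run();
            // Request (not force) a collection so that retained objects dominate the sample
            System.gc();
            samples.add(memory.getHeapMemoryUsage().getUsed());
        }
        return samples;
    }

    /** True if every sample is strictly larger than the one before it. */
    static boolean growsMonotonically(List<Long> samples) {
        for (int i = 1; i < samples.size(); i++) {
            if (samples.get(i) <= samples.get(i - 1)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, passing a lambda that re-runs the same queries and checking growsMonotonically on the returned samples gives a quick signal that objects survive between requests. Because the GC call is only a request, the numbers are rough; a heap dump gives a more precise picture of what is retained.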