lucene-solr-user mailing list archives

From "Bryant, Michael" <michael.bry...@kcl.ac.uk>
Subject Simulating group.facet for JSON facets, high mem usage w/ sorting on aggregation...
Date Thu, 09 Feb 2017 11:58:29 GMT
Hi all,

I'm converting my legacy facets to JSON facets and am seeing much better performance, especially
with high-cardinality facet fields. However, the one issue I can't seem to resolve is excessive
memory usage (and OOM errors) when trying to simulate the effect of "group.facet", i.e. counting
and sorting facet values by a grouping field rather than by raw document count.

My situation, slightly simplified, is:

Solr 4.6.1

  *   Doc set: ~200,000 docs
  *   Grouping by item_id, an indexed, stored, single-valued string field with ~50,000 unique
values, ~4 docs per item
  *   Faceting by person_id, an indexed, stored, multi-valued string field with ~50,000 values
(with a very skewed distribution)
  *   No docValues fields

Each document here is a description of an item, and there are several descriptions per item
in multiple languages.

With legacy facets I use group.field=item_id and group.facet=true, which gives me facet counts
with the number of items rather than descriptions, and correctly sorted by descending item
count.
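To make the difference between the two counts concrete, here is a minimal pure-Python sketch (no Solr involved) on made-up toy data: a plain facet counts matching description documents per person_id, while group.facet=true counts distinct item_id values per person_id. The documents and the facet_counts helper are hypothetical, for illustration only.

```python
from collections import defaultdict

# Hypothetical toy data: each dict is one *description* document; an item
# (item_id) has several descriptions. person_id is multi-valued.
docs = [
    {"item_id": "i1", "person_id": ["p1"]},
    {"item_id": "i1", "person_id": ["p1", "p3"]},
    {"item_id": "i2", "person_id": ["p1"]},
    {"item_id": "i3", "person_id": ["p2"]},
    {"item_id": "i3", "person_id": ["p2"]},
    {"item_id": "i3", "person_id": ["p2"]},
    {"item_id": "i3", "person_id": ["p2"]},
]

def facet_counts(docs, field, group_field=None):
    """Per-value counts for a (possibly multi-valued) field.

    Without group_field: number of matching documents (plain facet count).
    With group_field: number of distinct group values, the quantity that
    group.facet=true and unique(item_id) are meant to report.
    """
    buckets = defaultdict(set)
    for i, doc in enumerate(docs):
        # Count each document once, or each group value once.
        key = doc[group_field] if group_field else i
        for value in set(doc.get(field, [])):
            buckets[value].add(key)
    counts = {value: len(keys) for value, keys in buckets.items()}
    # Descending count; ties broken by value so the order is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

print(facet_counts(docs, "person_id"))
print(facet_counts(docs, "person_id", group_field="item_id"))
```

On this data p2 comes first by document count (4 descriptions of one item), but p1 comes first by item count (2 distinct items), which is why the sort order matters.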

With JSON facets I'm doing the equivalent like so:

&json.facet={
    "people": {
        "type": "terms",
        "field": "person_id",
        "facet": {
            "grouped_count": "unique(item_id)"
        },
        "sort": "grouped_count desc"
    }
}
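For anyone wanting to reproduce this, a sketch of building the same request from Python, with the sort exposed so the two variants can be compared side by side. The q=*:* and rows=0 parameters, the host and the core name in BASE_URL, and the helper names are all assumptions for illustration, not part of the setup described here.

```python
import json
from urllib.parse import urlencode

# Hypothetical host and core name; substitute your own.
BASE_URL = "http://localhost:8983/solr/mycore/select"

def people_facet_params(sort="grouped_count desc", q="*:*"):
    """Query parameters for the json.facet request shown above.

    `sort` is a parameter so the aggregate sort ("grouped_count desc")
    can be compared with the default ("count desc").
    """
    facet = {
        "people": {
            "type": "terms",
            "field": "person_id",
            "facet": {"grouped_count": "unique(item_id)"},
            "sort": sort,
        }
    }
    return {
        "q": q,
        "rows": 0,  # assumption: only the facet buckets are wanted
        "json.facet": json.dumps(facet),
    }

def people_facet_url(sort="grouped_count desc"):
    """Full GET URL with the parameters percent-encoded."""
    return BASE_URL + "?" + urlencode(people_facet_params(sort))

print(people_facet_url())
print(people_facet_url("count desc"))
```

Since only the sort value differs between the two URLs, any difference in heap usage between them comes from the sort itself.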

This works, and is somewhat faster than legacy faceting, but it also produces a massive spike
in memory usage when (and only when) the sort parameter is set to the aggregate (grouped_count).
A server that runs happily with a 512MB heap OOMs unless I give it a 4GB heap. With sort set to
the default, "count desc", there is no memory usage spike.

I'd be curious to know whether anyone has seen this kind of memory usage when sorting JSON facets
by stats, and whether there's anything I can do to mitigate it. I've tried reindexing with docValues
enabled on the relevant fields, and it seems to make no difference in this respect.

Many thanks,
~Mike