lucene-dev mailing list archives

From "Toke Eskildsen (JIRA)" <>
Subject [jira] [Created] (SOLR-11240) Raise UnInvertedField internal limit
Date Tue, 15 Aug 2017 10:55:00 GMT
Toke Eskildsen created SOLR-11240:

             Summary: Raise UnInvertedField internal limit
                 Key: SOLR-11240
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: faceting
    Affects Versions: 6.6, 5.5.4, master (8.0)
            Reporter: Toke Eskildsen
            Assignee: Toke Eskildsen
            Priority: Minor
             Fix For: master (8.0), 6.6, 5.5.4

{{UnInvertedField}} has, via {{DocTermOrds}}, an internal limit of 2^24 bytes for the byte-arrays
holding term ordinals. For String faceting on high-cardinality Text fields, this can trigger
an exception with the message "Too many values for UnInvertedField". A search for that phrase
shows that the exception is encountered in the wild.

The limitation is due to the packing being a combination of values and pointers: If the values
(term ordinals) for a given document-ID can fit in an integer, they are stored directly. Otherwise,
the first 8 bits of the integer are set to the value 1, signalling that the following 3 bytes (24
bits) are a pointer into a byte-array. This limits the array-size to 16M (2^24).
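
The current flag-plus-pointer layout can be sketched as follows. This is an illustration only, not the actual {{DocTermOrds}} code: the class and method names are hypothetical, the exception type is a stand-in, and the sketch assumes that the "first 8 bits" are the low-order byte of the integer.

```java
// Illustrative sketch only; not the actual DocTermOrds implementation.
class LegacyPointerSketch {
    // Largest offset that fits in the 3 pointer bytes (2^24 - 1).
    static final int MAX_POINTER = (1 << 24) - 1;

    // Assumption: the low-order byte is the "first" byte and carries the flag value 1.
    static int encodePointer(int offset) {
        if (offset < 0 || offset > MAX_POINTER) {
            // Hypothetical stand-in for the exception mentioned in the issue.
            throw new IllegalStateException("Too many values for UnInvertedField");
        }
        return (offset << 8) | 1;
    }

    static boolean isPointer(int code) {
        return (code & 0xff) == 1;
    }

    static int decodePointer(int code) {
        // Unsigned shift, since the top pointer bit may occupy bit 31.
        return code >>> 8;
    }

    public static void main(String[] args) {
        int code = encodePointer(123_456);
        System.out.println(isPointer(code) + " " + decodePointer(code)); // true 123456
    }
}
```

Any offset of 2^24 or more cannot be represented in this layout, which is where the limit comes from.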

Solution: Because the values are packed as vInts, bit 31 (the last bit) of the integer will
never be 1 if the integer contains values. This means that this bit can be used for signalling
whether the preceding bits should be parsed as values or as a pointer. The effective pointer
size is thus 2^31, which matches the array-length limit in Java. Changing the signalling mechanism
does not affect space requirements and should not affect performance.
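
A minimal sketch of the proposed signalling, under stated assumptions: the names are hypothetical, and the vInt byte order used here (low 7 bits first, high bit as continuation flag, bytes packed from the low-order end of the integer) is chosen for illustration and may differ from what {{DocTermOrds}} does. The only property the idea relies on is that the last byte of a vInt has its high bit cleared.

```java
// Illustrative sketch only; not the actual patch.
class HighBitFlagSketch {
    static final int POINTER_FLAG = 0x80000000; // bit 31

    // Packs the values as vInts into a single int, low-order byte first.
    // Throws if the encoded values need more than 4 bytes.
    static int packInline(int... values) {
        int packed = 0;
        int shift = 0;
        for (int v : values) {
            if (v < 0) throw new IllegalArgumentException("negative value");
            do {
                if (shift >= 32) throw new IllegalArgumentException("does not fit in 4 bytes");
                int b = v & 0x7f;
                v >>>= 7;
                if (v != 0) b |= 0x80; // continuation bit: more bytes follow
                packed |= b << shift;
                shift += 8;
            } while (v != 0);
        }
        // The last byte written never has its continuation bit set,
        // so bit 31 of the result is always 0.
        return packed;
    }

    static int encodePointer(int offset) {
        if (offset < 0) throw new IllegalArgumentException("negative offset");
        return offset | POINTER_FLAG;
    }

    static boolean isPointer(int code) {
        return (code & POINTER_FLAG) != 0;
    }

    static int decodePointer(int code) {
        return code & 0x7fffffff;
    }

    public static void main(String[] args) {
        int inline = packInline(5, 300, 7);
        System.out.println(isPointer(inline)); // false
        int pointer = encodePointer(1 << 30);
        System.out.println(isPointer(pointer) + " " + decodePointer(pointer)); // true 1073741824
    }
}
```

Because the flag occupies a bit that inline values never use, the pointer gets 31 bits without widening the integer.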

Note that this is only a roughly 100-fold increase (2^31 / 2^24 = 128) over the 2^24 limit, not
an elimination: Performing uninverted Text field faceting on 100M documents with 5K terms each
will still raise an exception.
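
The arithmetic behind this note can be checked with a small sketch. It makes two simplifying assumptions that are not from the issue text: every term ordinal costs at least one byte once vInt-encoded, and all ordinals are counted against a single 2^31-byte limit. The class and method names are hypothetical.

```java
// Back-of-the-envelope check; a lower bound, not a model of the real memory layout.
class LimitArithmeticSketch {
    static final long OLD_LIMIT = 1L << 24; // 16M
    static final long NEW_LIMIT = 1L << 31; // 2G

    static long growthFactor() {
        return NEW_LIMIT / OLD_LIMIT;
    }

    // Assumption: at least 1 byte per term ordinal after vInt encoding.
    static long minBytes(long docs, long termsPerDoc) {
        return docs * termsPerDoc;
    }

    static boolean exceedsNewLimit(long docs, long termsPerDoc) {
        return minBytes(docs, termsPerDoc) > NEW_LIMIT;
    }

    public static void main(String[] args) {
        System.out.println(growthFactor());                       // 128
        System.out.println(minBytes(100_000_000L, 5_000L));       // 500000000000
        System.out.println(exceedsNewLimit(100_000_000L, 5_000L)); // true
    }
}
```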
