lucene-dev mailing list archives

From "Dawid Weiss (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-11240) Raise UnInvertedField internal limit
Date Wed, 16 Aug 2017 18:44:01 GMT

    [ https://issues.apache.org/jira/browse/SOLR-11240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129240#comment-16129240 ]

Dawid Weiss commented on SOLR-11240:
------------------------------------

Just looking around casually, not verifying in-depth.

{code}
+ *   A single entry is thus either 0b0xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxxxx holding 0-4 vInts or
+ *   0b0xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxxxx holding a 31-bit pointer.
{code}
Somewhere in the above bitmasks the highest bit should be set :)

{code}
+  // TODO: Why is indexedTermsArray not part of this?
   /** Returns total bytes used. */
   public long ramBytesUsed() {
{code}

I'd piggyback that in and correct it in this issue.

{code}
+  @Slow
+  public void testTriggerUnInvertLimit() throws IOException {
{code}


> Raise UnInvertedField internal limit
> ------------------------------------
>
>                 Key: SOLR-11240
>                 URL: https://issues.apache.org/jira/browse/SOLR-11240
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: faceting
>    Affects Versions: 5.5.4, 6.6
>            Reporter: Toke Eskildsen
>            Assignee: Toke Eskildsen
>            Priority: Minor
>              Labels: easyfix
>             Fix For: master (8.0)
>
>         Attachments: SOLR-11240.patch
>
>
> {{UnInvertedField}} has, via {{DocTermOrds}}, an internal limit of 2^24 bytes for the
> byte-arrays holding term ordinals. For String faceting on high-cardinality Text fields, this
> can trigger an exception with the message "Too many values for UnInvertedField". A search for
> that phrase shows that the exception is encountered in the wild.
> The limitation is due to the packing being a combination of values and pointers: If the
> values (term ordinals) for a given document-ID can fit in an integer, they are stored directly.
> If the value of the first 8 bits in the integer is 1, it signals that the following 3 bytes
> (24 bits) are a pointer into a byte-array, limiting the array-size to 16M (2^24).
> Solution: Because the values are packed as vInts, bit 31 (the highest bit) of the integer
> will never be 1 if the integer contains values. This means that this bit can be used for
> signalling whether the remaining 31 bits should be parsed as values or as a pointer. The
> effective pointer range is thus 2^31, which matches the array-length limit in Java. Changing
> the signalling mechanism does not affect space requirements and should not affect performance.
> Note that this is only a roughly 100-fold increase (2^31 / 2^24 = 128) over the 2^24 limit,
> not an elimination: Performing uninverted Text field faceting on 100M documents with 5K terms
> each will still raise an exception.
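
The two signalling schemes described in the issue can be sketched in a few lines of Java. This is an illustration only, not the code from the attached patch: the class name PointerTagSketch and all of its methods are hypothetical, and it assumes that "the first 8 bits" means the low byte of the entry. The last helper reproduces the closing arithmetic as a rough lower bound, assuming at least one byte per stored ordinal and a single byte array.

```java
// Illustrative sketch only; names and bit layout are assumptions, not the Solr/Lucene code.
final class PointerTagSketch {
    // Old scheme: 24 bits are available for the pointer.
    static final int OLD_MAX_POINTER = (1 << 24) - 1;

    // Old scheme (assumed layout): low byte == 1 marks a pointer, upper 24 bits hold it.
    static int encodeOldPointer(int pos) {
        if (pos < 0 || pos > OLD_MAX_POINTER) {
            throw new IllegalArgumentException("pointer does not fit in 24 bits: " + pos);
        }
        return (pos << 8) | 1;
    }

    static boolean isOldPointer(int entry) {
        return (entry & 0xff) == 1;
    }

    static int decodeOldPointer(int entry) {
        return entry >>> 8; // unsigned shift, so the top pointer bit is not sign-extended
    }

    // Proposed scheme: bit 31 set marks a pointer, the low 31 bits hold it.
    static int encodeNewPointer(int pos) {
        if (pos < 0) {
            throw new IllegalArgumentException("pointer must be non-negative: " + pos);
        }
        return pos | 0x80000000;
    }

    static boolean isNewPointer(int entry) {
        return (entry & 0x80000000) != 0;
    }

    static int decodeNewPointer(int entry) {
        return entry & 0x7fffffff;
    }

    // Rough lower bound on bytes needed: one byte per ordinal (a vInt is never shorter).
    static long minBytesNeeded(long docs, long termsPerDoc) {
        return Math.multiplyExact(docs, termsPerDoc);
    }

    public static void main(String[] args) {
        int oldEntry = encodeOldPointer(OLD_MAX_POINTER);
        int newEntry = encodeNewPointer(Integer.MAX_VALUE);
        System.out.println("old max pointer: " + decodeOldPointer(oldEntry));
        System.out.println("new max pointer: " + decodeNewPointer(newEntry));
        System.out.println("increase factor: " + ((1L << 31) / (1L << 24)));
        // 100M documents with 5K terms each still exceed what a Java array can hold.
        System.out.println("still too large: "
                + (minBytesNeeded(100_000_000L, 5_000L) > Integer.MAX_VALUE));
    }
}
```

Under these assumptions the tag check in the new scheme is a single mask on the sign bit, so an entry whose highest bit is clear is always read as inline values.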



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
