lucene-dev mailing list archives

From "Laura Dietz (JIRA)" <>
Subject [jira] [Created] (LUCENE-8118) ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing
Date Fri, 05 Jan 2018 06:16:01 GMT
Laura Dietz created LUCENE-8118:

             Summary: ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing
                 Key: LUCENE-8118
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/index
    Affects Versions: 7.2
         Environment: Debian/Stretch
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
            Reporter: Laura Dietz

Indexing a large collection of about 20 million paragraph-sized documents results in an ArrayIndexOutOfBoundsException
in org.apache.lucene.index.TermsHashPerField.writeByte (full stack trace below).

The bug is possibly related to the issues described [here|]
 and in [SOLR-10936|], but I am not using Solr;
I am using Lucene Core directly.

The issue can be reproduced using code from [GitHub trec-car-tools-example|]:

- compile with `mvn compile assembly:single`
- run with `java -cp ./target/treccar-tools-example-0.1-jar-with-dependencies.jar edu.unh.cs.TrecCarBuildLuceneIndex paragraphs paragraphCorpus.cbor indexDir`

Where paragraphCorpus.cbor is contained in this [archive|]

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -65536
        at org.apache.lucene.index.TermsHashPerField.writeByte(
        at org.apache.lucene.index.TermsHashPerField.writeVInt(
        at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(
        at org.apache.lucene.index.TermsHashPerField.add(
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(
        at org.apache.lucene.index.DefaultIndexingChain.processField(
        at org.apache.lucene.index.DefaultIndexingChain.processDocument(
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(
        at org.apache.lucene.index.DocumentsWriter.updateDocuments(
        at org.apache.lucene.index.IndexWriter.updateDocuments(
        at org.apache.lucene.index.IndexWriter.addDocuments(
        at edu.unh.cs.TrecCarBuildLuceneIndex.main(
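
The cause has not been diagnosed here. As a purely illustrative sketch, and not Lucene's actual code, the following shows one way an index of exactly -65536 could arise: a signed 32-bit byte offset into a pool of fixed-size blocks wraps around once the pool passes 2 GiB. The class name, the `blockIndex` helper and the 32 KiB block size (`BLOCK_SHIFT = 15`) are assumptions made for the example.

```java
public class NegativeBlockIndexSketch {
    // Hypothetical constants: 32 KiB blocks addressed by a signed 32-bit offset.
    static final int BLOCK_SHIFT = 15;
    static final int BLOCK_SIZE = 1 << BLOCK_SHIFT;

    /** Block index for a global byte offset; >> is an arithmetic (sign-preserving) shift. */
    static int blockIndex(int globalOffset) {
        return globalOffset >> BLOCK_SHIFT;
    }

    public static void main(String[] args) {
        // Start of the last block that is still addressable with a positive int.
        int lastValidBlockStart = Integer.MAX_VALUE - BLOCK_SIZE + 1; // 2^31 - 2^15
        // Advancing by one more block overflows and wraps to Integer.MIN_VALUE.
        int wrapped = lastValidBlockStart + BLOCK_SIZE;

        System.out.println(blockIndex(lastValidBlockStart)); // 65535
        System.out.println(wrapped == Integer.MIN_VALUE);    // true
        System.out.println(blockIndex(wrapped));             // -65536
    }
}
```

Under these assumptions, using the wrapped value to index an array of blocks would fail with an ArrayIndexOutOfBoundsException reporting -65536, the same value as in the trace above. This is arithmetic that is consistent with the symptom, not a confirmed diagnosis.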
