lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-1072) NullPointerException during indexing in DocumentsWriter$ThreadState$FieldData.addPosition
Date Tue, 04 Dec 2007 10:32:43 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1072:
---------------------------------------

    Attachment: LUCENE-1072.take2.patch

OK, I added that as a test case (to TestIndexWriter), and then fixed
it.  Attached patch.  I plan to commit in 1 or 2 days.  Thanks
Michael!

This was happening during DW.abort(), which was being called on an
unhandled exception to clear all documents added since the last flush.
It was incorrectly recycling a null Posting instance.
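
To illustrate the kind of bug described above, here is a minimal sketch of a
recycling pool that skips empty slots when returning instances to its free
list. It is a toy model, not the code from the attached patch: the class
PostingPoolSketch, its Posting stand-in, and the method names are all
hypothetical, and the assumption that the null came from an unused slot in a
sparse array is mine, not something stated in this message.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a recycling pool; NOT Lucene's DocumentsWriter code. */
class PostingPoolSketch {

    /** Hypothetical stand-in for a per-term posting record. */
    static final class Posting {
        int docFreq;
    }

    // Free list of instances that are ready to be reused.
    private final Deque<Posting> free = new ArrayDeque<>();

    /** Returns a recycled instance if one is available, otherwise a new one. */
    Posting obtain() {
        Posting p = free.poll();
        return p != null ? p : new Posting();
    }

    /**
     * Recycles every non-null slot of a sparse array and clears the array.
     * Unused (null) slots are skipped, so a null can never reach the free list.
     * Returns the number of instances recycled.
     */
    int recycleAll(Posting[] slots) {
        int recycled = 0;
        for (int i = 0; i < slots.length; i++) {
            Posting p = slots[i];
            if (p == null) {
                continue; // empty slot: nothing to recycle
            }
            p.docFreq = 0;   // reset state before reuse
            free.push(p);
            slots[i] = null; // the array no longer owns the instance
            recycled++;
        }
        return recycled;
    }

    /** Number of instances currently waiting on the free list. */
    int freeCount() {
        return free.size();
    }
}
```

Without the null check, an empty slot would either poison the free list or
fail at recycle time, and the failure would surface later, far from its cause.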

I've also tightened when abort() is called to only those places that
actually require it.  A failure in the tokenization of one document
should not discard documents that were previously indexed but not yet
flushed.  So I added asserts to the test case to verify that.
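
The intended behavior can be sketched with a toy in-memory buffer: a document
that fails validation is rejected on its own, and documents buffered earlier
survive. This is only an illustration under stated assumptions. The class
BufferedWriterSketch and its methods are hypothetical, and validating the
whole document before touching the buffer is a simplification chosen for the
sketch, not a description of how the patch works. The 16383 limit is taken
from the exception message in the report below.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory document buffer; NOT Lucene's IndexWriter. */
class BufferedWriterSketch {

    // Limit taken from the exception message quoted in the issue.
    static final int MAX_TERM_LENGTH = 16383;

    // Documents added since the last (hypothetical) flush.
    private final List<List<String>> buffered = new ArrayList<>();

    /**
     * Adds a document given as a list of terms. All terms are checked
     * before the buffer is modified, so a rejected document leaves the
     * previously buffered documents untouched.
     */
    void addDocument(List<String> terms) {
        for (String term : terms) {
            if (term.length() > MAX_TERM_LENGTH) {
                throw new IllegalArgumentException(
                    "term length " + term.length()
                        + " exceeds max term length " + MAX_TERM_LENGTH);
            }
        }
        buffered.add(new ArrayList<>(terms));
    }

    /** Number of documents buffered and not yet flushed. */
    int bufferedCount() {
        return buffered.size();
    }
}
```

A caller can catch the IllegalArgumentException for the oversized document
and keep adding documents, which is the usage pattern the reporter describes.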


> NullPointerException during indexing in DocumentsWriter$ThreadState$FieldData.addPosition
> -----------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1072
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1072
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.3
>         Environment: Linux CentOS 5 x86_64 running on 2-core Pentium D, Java HotSpot(TM)
>                      64-Bit Server VM (build 1.6.0_01-b06, mixed mode), using lucene-core-2007-11-29_02-49-31
>            Reporter: Alexei Dets
>            Assignee: Michael McCandless
>             Fix For: 2.3
>
>         Attachments: LUCENE-1072.patch, LUCENE-1072.take2.patch
>
>
> In my case, documents with unusually large "words" (text-encoded images, in fact)
> sometimes appear during indexing.
> An attempt to add a document that contains a field with such a token produces java.lang.IllegalArgumentException:
> java.lang.IllegalArgumentException: term length 37944 exceeds max term length 16383
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.addPosition(DocumentsWriter.java:1492)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1321)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1247)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:972)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2202)
>         at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2186)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1432)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1411)
> This is expected; the exception is caught and ignored. The problem is that after this, the
> IndexWriter becomes somewhat corrupted and subsequent attempts to add documents to the index
> fail as well, this time with an NPE:
> java.lang.NullPointerException
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.addPosition(DocumentsWriter.java:1497)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1321)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1247)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:972)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2202)
>         at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2186)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1432)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1411)
> This is 100% reproducible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

