lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Created] (LUCENE-4127) negative offsets/deltas corruption
Date Sun, 10 Jun 2012 12:23:42 GMT
Robert Muir created LUCENE-4127:

             Summary: negative offsets/deltas corruption
                 Key: LUCENE-4127
             Project: Lucene - Java
          Issue Type: Bug
          Components: core/index
    Affects Versions: 4.0
            Reporter: Robert Muir
         Attachments: LUCENE-4127_test.patch

If offsets go negative or backwards, they can corrupt an index that uses DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS:
the offsets read back will have wrong values (different from those in the term vectors) or even crazy values
like -2147483645.
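
As an illustration of how a value like that can come out of a delta-encoded stream, here is a minimal sketch. It assumes a scheme in which the start offset is stored as a delta from the previous start offset, shifted left by one bit to leave room for a flag, and decoded with an unsigned shift. The class and method names are hypothetical and this is not the codec's actual code.

```java
public class OffsetDeltaDemo {
    // Assumed encoding: delta from the previous start offset,
    // shifted left so the low bit can carry a flag.
    public static int encode(int lastStartOffset, int startOffset) {
        return (startOffset - lastStartOffset) << 1;
    }

    // Assumed decoding: drop the flag bit with an unsigned shift,
    // then add the delta back onto the previous start offset.
    public static int decode(int lastStartOffset, int code) {
        return lastStartOffset + (code >>> 1);
    }

    public static void main(String[] args) {
        // Forward-moving offsets round-trip correctly: 3 -> 6.
        System.out.println(decode(3, encode(3, 6)));  // prints 6

        // Backwards offsets (6 -> 3) give a negative delta. The unsigned
        // shift turns it into a huge positive number, and the addition
        // overflows int.
        System.out.println(decode(6, encode(6, 3)));  // prints -2147483645
    }
}
```

Under these assumptions the reader has no way to tell that the delta was negative, so the bad value is silently returned instead of the original offset.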

The problem with this is that it's not just theoretical: it's too easy to do this with Lucene's
own analyzer chains (e.g. NGramTokenizer).
See issues such as LUCENE-3920 and some discussion on LUCENE-3738.

The question is how to fix this, e.g. should we:
# start enforcing that offsets cannot be crazy values in OffsetAttributeImpl/IndexWriter and
fix the broken analyzers
# leave offsets as a pair of opaque integers, declaring this a limitation of the current codec,
and either work around it or throw UnsupportedOperationException (UOE) from the postings writer.
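
To make the first option concrete, here is a rough sketch of the kind of check that could be enforced per token. It is standalone Java and does not use any Lucene classes; `OffsetValidator`, `check`, and `firstInvalid` are hypothetical names, and the exact rules (for example whether equal start offsets are allowed) are assumptions for illustration only.

```java
public class OffsetValidator {
    // Start offset of the previously accepted token.
    private int lastStartOffset = 0;

    /** Throws if the offsets are negative, inverted, or move backwards. */
    public void check(int startOffset, int endOffset) {
        if (startOffset < 0 || endOffset < startOffset) {
            throw new IllegalArgumentException(
                "invalid offsets: startOffset=" + startOffset
                    + ", endOffset=" + endOffset);
        }
        if (startOffset < lastStartOffset) {
            throw new IllegalArgumentException(
                "offsets went backwards: startOffset=" + startOffset
                    + ", lastStartOffset=" + lastStartOffset);
        }
        lastStartOffset = startOffset;
    }

    /** Returns the index of the first rejected token, or -1 if all pass. */
    public static int firstInvalid(int[][] offsets) {
        OffsetValidator validator = new OffsetValidator();
        for (int i = 0; i < offsets.length; i++) {
            try {
                validator.check(offsets[i][0], offsets[i][1]);
            } catch (IllegalArgumentException e) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Well-behaved token stream: every token is accepted.
        System.out.println(firstInvalid(new int[][] {{0, 3}, {3, 6}}));
        // Third token jumps back from start offset 6 to 3 and is rejected.
        System.out.println(firstInvalid(new int[][] {{0, 3}, {6, 9}, {3, 6}}));
    }
}
```

A check like this would reject the offending token at indexing time instead of letting it corrupt the postings, at the cost of breaking any analyzer that currently emits such offsets.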
