lucene-dev mailing list archives

From Przemysław Szeremiota (JIRA) <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-7791) AIOOBE on flush+sort
Date Fri, 21 Apr 2017 06:48:04 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15978174#comment-15978174 ]

Przemysław Szeremiota edited comment on LUCENE-7791 at 4/21/17 6:47 AM:
------------------------------------------------------------------------

[~jimczi] What about {{SortingNumericIterator@NormValuesWriter}}? It throws too; is a fix for it missing from LUCENE-7791.patch?



> AIOOBE on flush+sort
> --------------------
>
>                 Key: LUCENE-7791
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7791
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 6.5
>            Reporter: Przemysław Szeremiota
>              Labels: patch
>             Fix For: master (7.0), 6.6, 6.5.1
>
>         Attachments: LUCENE-7791.patch, sortflush.patch, sortflush-test.patch
>
>
> In the released 6.5.0 version, flushing a sorted index throws ArrayIndexOutOfBoundsException in NumericDocValuesWriter, NormValuesWriter and BinaryDocValuesWriter.
> The new SortedXXXIterators look up documents in FixedBitSets or PackedValues using the remapped (sorted) document ID, without checking the ranges of those BitSets/Values, which are sized according to the original document IDs. Meanwhile, a FixedBitSet can be sparse not only between documents that have the field but also after the last document (in original order) that has it, because the writer's addValue() is never called for trailing documents without a value for the field. After remapping, the range of IDs that carry useful values can therefore differ from the original one, so bounds checking should be done on the remapped ID, not the original ID.
> We were hit by this bug because our indexes are built from independent sources by partially updating fragments of documents, so there are always some documents without values in some fields.
> As I understand this bug, it shows up when:
> - maxDoc is greater than 64 (64 is the pre-allocated size of the writers' FixedBitSets)
> - some number of the last documents have no values for the field (so the FixedBitSet is never grown to maxDoc)
> Also, the range check on a field's values is currently based on the original ID (e.g. "upto < size"), so flushing can lose some values even without hitting an AIOOBE.
> I will attach a patch that resolves the issues in some of the writers. For the other writers from LUCENE-7579, I am not sure whether they have similar bugs. The patch resolved our indexing issues, but please review the changes from LUCENE-7579 to confirm that the other flush-sorting writers have no additional bugs.
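The failure mode described above can be modelled in a few lines of Java. This is a simplified, hypothetical sketch, not Lucene's code: `DocsWithField` stands in for a writer's FixedBitSet (one pre-allocated 64-bit word, grown only when `addValue()` is called), and the sort map is assumed to simply reverse document order. Under those assumptions it shows that an unchecked lookup throws only when maxDoc exceeds 64 and the trailing documents have no value, and that a range check on the looked-up ID avoids the exception while still finding every value.

```java
import java.util.Arrays;

public class SortedFlushSketch {

  // Hypothetical, simplified stand-in for a writer's docs-with-field bitset:
  // a long[] that starts with one 64-bit word and is grown only when
  // addValue() is called. Lucene's writers use FixedBitSet and may
  // over-allocate when growing; this model grows to exactly the needed word.
  static final class DocsWithField {
    long[] words = new long[1];

    void addValue(int docID) {
      int word = docID >> 6;
      if (word >= words.length) {
        words = Arrays.copyOf(words, word + 1);
      }
      words[word] |= 1L << docID; // Java masks a long shift count to its low 6 bits
    }

    // No range check: an ID past the last allocated word throws
    // ArrayIndexOutOfBoundsException.
    boolean getUnchecked(int docID) {
      return (words[docID >> 6] & (1L << docID)) != 0;
    }

    // Range-checked: IDs past the allocated words simply have no value.
    boolean getChecked(int docID) {
      int word = docID >> 6;
      return word < words.length && (words[word] & (1L << docID)) != 0;
    }
  }

  // Only docs 0..docsWithValues-1 receive a value; the segment is then walked
  // in sorted order through a hypothetical sort map that reverses doc order.
  static int countValues(int maxDoc, int docsWithValues, boolean checked) {
    DocsWithField docs = new DocsWithField();
    for (int doc = 0; doc < docsWithValues; doc++) {
      docs.addValue(doc);
    }
    int count = 0;
    for (int newID = 0; newID < maxDoc; newID++) {
      int lookupID = maxDoc - 1 - newID; // hypothetical newToOld mapping
      if (checked ? docs.getChecked(lookupID) : docs.getUnchecked(lookupID)) {
        count++;
      }
    }
    return count;
  }

  // True if walking the segment with unchecked lookups hits the AIOOBE.
  static boolean uncheckedThrows(int maxDoc, int docsWithValues) {
    try {
      countValues(maxDoc, docsWithValues, false);
      return false;
    } catch (ArrayIndexOutOfBoundsException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(uncheckedThrows(64, 10));    // false: every ID fits in the pre-allocated word
    System.out.println(uncheckedThrows(100, 100));  // false: the last doc had a value, so the bitset grew
    System.out.println(uncheckedThrows(100, 10));   // true: IDs 64..99 lie past the only word
    System.out.println(countValues(100, 10, true)); // 10: the checked lookup finds every value
  }
}
```

In this model the fix costs one comparison per lookup, and treating out-of-range IDs as "no value" is consistent with the writer never having seen a value for those documents.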



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


