lucene-dev mailing list archives

From "Shai Erera (JIRA)" <>
Subject [jira] [Updated] (LUCENE-5618) DocValues updates send wrong fieldinfos to codec producers
Date Sun, 11 May 2014 08:39:15 GMT


Shai Erera updated LUCENE-5618:

    Attachment: LUCENE-5618.patch

Patch addresses the following:

* Modifies Lucene45/42DocValuesProducer to assert that all encoded fields exist in the FieldInfos.

* Improves the readability of ReaderAndUpdates.writeFieldUpdates by breaking the updates out into separate methods.

* Each DocValues field's updates are written to separate files.

* Adds SegmentCommitInfo.docValuesGen, separate from fieldInfosGen.

* Fixes LUCENE-5636 by tracking per-field updates files, as well as fieldInfos files.
** Per-generation update files are still tracked, but deprecated; they are needed for back-compat with 4.6-4.8 indexes, and this per-generation tracking becomes empty after the segment is merged.

* Improves {{testDeleteUnusedUpdatesFiles}} to cover updates to two fields (this exposes the bug in LUCENE-5636).
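The producer-side check from the first bullet can be pictured with a small, self-contained model. This is only a sketch: `ProducerFieldCheck`, `SimpleFieldInfos`, `checkEncodedFields`, and the field numbers are hypothetical stand-ins, not Lucene's actual classes. It throws instead of using a Java `assert` so the check also runs when assertions are disabled.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a docvalues producer's field check; not Lucene's API.
public class ProducerFieldCheck {

    // Minimal stand-in for FieldInfos: field number -> field name.
    static final class SimpleFieldInfos {
        private final Map<Integer, String> byNumber = new HashMap<>();

        SimpleFieldInfos add(int number, String name) {
            byNumber.put(number, name);
            return this;
        }

        // Returns the field name, or null if the number is unknown.
        String fieldInfo(int number) {
            return byNumber.get(number);
        }
    }

    // Checks every field number read from the (hypothetical) metadata stream
    // against the FieldInfos the producer was given.
    static void checkEncodedFields(List<Integer> encodedFieldNumbers, SimpleFieldInfos infos) {
        for (int number : encodedFieldNumbers) {
            if (infos.fieldInfo(number) == null) {
                throw new IllegalStateException("invalid field number: " + number);
            }
        }
    }

    public static void main(String[] args) {
        SimpleFieldInfos infos = new SimpleFieldInfos().add(0, "price").add(1, "rank");
        checkEncodedFields(List.of(0, 1), infos); // every encoded field is known
        try {
            checkEncodedFields(List.of(0, 2), infos); // field 2 is missing from infos
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "invalid field number: 2"
        }
    }
}
```

Failing loudly at open time turns a mismatched FieldInfos into an immediate error instead of a silently wrong field lookup later.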

In terms of backwards compatibility, indexes from 4.6-4.8 will continue to reference unneeded files until the segment is merged. This is impossible to fix without breaking back-compat or introducing weird hacks that assume the default codec. This is not terrible, though, since the number of unneeded-but-referenced files is bounded by the number of DV fields the app has updated.
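A minimal model of this file bookkeeping, assuming a segment keeps one set of update files per DV field plus legacy per-generation sets read from 4.6-4.8 indexes. `UpdateFilesTracker` and the `_0_N.dvd`/`_0_N.dvm` file names are hypothetical, chosen only for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of per-field update-file tracking; names and file names are illustrative.
public class UpdateFilesTracker {
    // DV field name -> files holding that field's latest updates
    private final Map<String, Set<String>> perFieldFiles = new HashMap<>();
    // Legacy generation -> files, as recorded by 4.6-4.8 indexes; never pruned per field
    private final Map<Long, Set<String>> legacyGenFiles = new HashMap<>();

    // A new update generation for one field replaces that field's previous files.
    void writeFieldUpdate(String field, Set<String> files) {
        perFieldFiles.put(field, new TreeSet<>(files));
    }

    // Records a legacy per-generation file set read from an older index.
    void addLegacyGen(long gen, Set<String> files) {
        legacyGenFiles.put(gen, new TreeSet<>(files));
    }

    // Every file the segment still references, in sorted order.
    Set<String> referencedFiles() {
        Set<String> all = new TreeSet<>();
        perFieldFiles.values().forEach(all::addAll);
        legacyGenFiles.values().forEach(all::addAll);
        return Collections.unmodifiableSet(all);
    }

    public static void main(String[] args) {
        UpdateFilesTracker t = new UpdateFilesTracker();
        t.addLegacyGen(1, Set.of("_0_1.dvd", "_0_1.dvm"));          // written by a 4.8 index
        t.writeFieldUpdate("price", Set.of("_0_2.dvd", "_0_2.dvm"));
        t.writeFieldUpdate("price", Set.of("_0_3.dvd", "_0_3.dvm")); // gen 2 files drop out
        System.out.println(t.referencedFiles());
        // prints [_0_1.dvd, _0_1.dvm, _0_3.dvd, _0_3.dvm]
    }
}
```

In this model, rewriting `price` drops its generation-2 files from the referenced set, while the legacy generation-1 files stay referenced until a merge produces a fresh segment with no update files.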

I'd appreciate a review of this. Before I commit it, though, I want to take care of LUCENE-5619 so we're sure the back-compat logic in this patch indeed works.

> DocValues updates send wrong fieldinfos to codec producers
> ----------------------------------------------------------
>                 Key: LUCENE-5618
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Priority: Blocker
>             Fix For: 4.9
>         Attachments: LUCENE-5618.patch
> Spinoff from LUCENE-5616.
> See the example there, docvalues readers get a fieldinfos, but it doesn't contain the correct ones, so they have invalid field numbers at read time.
> This should really be fixed. Maybe a simple solution is to not write "batches" of fields in updates but just have only one field per gen?
> This removes many-many relationships and would make things easy to understand.
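The "one field per gen" suggestion can be checked against a simplified model, assuming a producer opened for generation g is handed only the fields whose latest update generation is g. `GenGrouping`, `Field`, `fieldsByGen`, and `consistent` are hypothetical names, and the model ignores everything except field numbers and generations (it uses a record, so it needs Java 16+).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of "one field per gen"; requires Java 16+ for records.
public class GenGrouping {

    // Illustrative stand-in for a FieldInfo: name, field number, latest DV update generation.
    record Field(String name, int number, long dvGen) {}

    // Groups fields by the generation of their latest docvalues update.
    static Map<Long, List<Field>> fieldsByGen(List<Field> fields) {
        Map<Long, List<Field>> byGen = new TreeMap<>();
        for (Field f : fields) {
            byGen.computeIfAbsent(f.dvGen(), g -> new ArrayList<>()).add(f);
        }
        return byGen;
    }

    // True if every field number encoded in each generation's file is among
    // the fields that generation's producer would be handed.
    static boolean consistent(Map<Long, List<Integer>> encodedByGen, List<Field> fields) {
        Map<Long, List<Field>> byGen = fieldsByGen(fields);
        for (Map.Entry<Long, List<Integer>> e : encodedByGen.entrySet()) {
            List<Field> handed = byGen.getOrDefault(e.getKey(), List.of());
            for (int num : e.getValue()) {
                if (handed.stream().noneMatch(f -> f.number() == num)) {
                    return false; // producer would see an invalid field number
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // price was last updated in gen 2, rank in gen 3.
        List<Field> fields = List.of(new Field("price", 1, 2), new Field("rank", 2, 3));
        // Batched: gen 2 wrote fields 1 and 2 together, then field 2 alone went to gen 3.
        Map<Long, List<Integer>> batched = Map.of(2L, List.of(1, 2), 3L, List.of(2));
        // One field per gen: each generation's file encodes exactly one field.
        Map<Long, List<Integer>> perField = Map.of(2L, List.of(1), 3L, List.of(2));
        System.out.println(consistent(batched, fields));  // prints false
        System.out.println(consistent(perField, fields)); // prints true
    }
}
```

In the batched case, generation 2's file still encodes field 2 even though that field's latest update moved to generation 3, so the producer for generation 2 meets a field number its FieldInfos does not contain. With one field per generation, each file encodes exactly the field grouped under it, which is the many-to-many relationship the proposal removes.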

This message was sent by Atlassian JIRA
