lucene-dev mailing list archives

From "Shai Erera (JIRA)" <>
Subject [jira] [Commented] (LUCENE-5618) DocValues updates send wrong fieldinfos to codec producers
Date Tue, 22 Apr 2014 18:06:16 GMT


Shai Erera commented on LUCENE-5618:

bq. I think it's a design flaw in how this stuff is written

Today we pass only the FIs (FieldInfo) the Codec should "care" about, rather than all the FIs it
"knows" about. In principle this lets the Codec optimize, but today we don't take advantage of
that: a DVP (DocValuesProducer) reads the metadata of every field in its file, even if the field
isn't passed in the FIS (FieldInfos) and therefore will never be asked for.
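The optimization described above (keeping metadata only for fields the producer was actually handed) can be sketched roughly as follows. This is a minimal, hypothetical model: `FieldMeta`, `SelectiveMetaReader` and `readMeta` are illustrative names, not Lucene classes, and the stored metadata is modeled as an in-memory list rather than a real codec file.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for one per-field metadata entry in a DV file (NOT a Lucene class).
record FieldMeta(int fieldNumber, long dataOffset) {}

final class SelectiveMetaReader {
    /**
     * Keeps metadata only for fields whose numbers appear in the FieldInfos
     * passed to this producer; entries for any other field are read past and
     * dropped, since that field will never be requested from this producer.
     */
    static Map<Integer, FieldMeta> readMeta(List<FieldMeta> storedEntries,
                                            Set<Integer> passedFieldNumbers) {
        Map<Integer, FieldMeta> kept = new HashMap<>();
        for (FieldMeta entry : storedEntries) {
            if (passedFieldNumbers.contains(entry.fieldNumber())) {
                kept.put(entry.fieldNumber(), entry);
            }
            // else: not passed in the FIS, so don't keep it in memory
        }
        return kept;
    }
}
```

Since the data itself is loaded lazily, the saving here is limited to the in-memory metadata, which matches the point made in the comment.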

If we write each field in its own gen, then, since we don't allow adding new fields through
dvUpdates, for gen=-1 we can just pass all known dvFieldInfos, and for gen > 0 we pass only
a single FI. The Codec therefore always receives FIs it knows about, even though for gen=-1
it is also given some FIs it shouldn't care about (fields whose current values live in a later
gen). Our Codecs only read metadata into memory and load the actual data lazily, so optimizing
them is perhaps less important at the moment.
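A rough sketch of the one-field-per-gen assignment: gen=-1 receives every DV field, and each gen > 0 receives exactly one. The `DvField` record, the `PerGenFieldInfos` class and the `fieldsPerGen` method are hypothetical names for illustration; they are not Lucene's `SegmentReader` or `FieldInfos` code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model (NOT a Lucene class): a DV field, its number, and the gen
// holding its current values (-1 = never updated).
record DvField(String name, int number, long dvGen) {}

final class PerGenFieldInfos {
    /** Returns, for each gen that needs a producer, the fields passed to it. */
    static Map<Long, List<DvField>> fieldsPerGen(List<DvField> dvFields) {
        Map<Long, List<DvField>> byGen = new TreeMap<>();
        // gen=-1: no new fields can be added through dvUpdates, so the original
        // segment's producer wrote every DV field; pass them all.
        byGen.put(-1L, new ArrayList<>(dvFields));
        for (DvField f : dvFields) {
            if (f.dvGen() > 0) {
                // One field per gen: this gen's producer gets exactly this field.
                List<DvField> previous = byGen.put(f.dvGen(), List.of(f));
                if (previous != null) {
                    throw new IllegalStateException(
                        "gen " + f.dvGen() + " holds more than one field");
                }
            }
        }
        return byGen;
    }
}
```

The mapping from gen to fields becomes one-to-one for every gen > 0, which is what removes the many-to-many bookkeeping.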

I wish our code could be more flexible, though. It feels odd to me that each field would be
written in its own gen just because we cannot add a FIS.exists() check in the Codec. If we
always passed all the DV FIs to every DVP, each could do the exists() check, but a DVP would
then see fields it doesn't know about. Is that bad? The check would still cover the corruption
case of a bad field number being encoded in the first place...
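The alternative (pass all DV FIs to every DVP and validate each decoded field number) might look roughly like this. `ExistsCheck`, `readMeta` and `CorruptMetaException` are hypothetical names used only for illustration; `FIS.exists()` is modeled here as a plain `Set.contains` over field numbers, and Lucene's real corruption handling is not reproduced.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (NOT Lucene code): every producer is given ALL DV field
// numbers, and each field number decoded from its metadata must exist among them.
final class ExistsCheck {
    static final class CorruptMetaException extends RuntimeException {
        CorruptMetaException(String msg) { super(msg); }
    }

    static Map<Integer, Long> readMeta(int[] decodedFieldNumbers, long[] offsets,
                                       Set<Integer> allDvFieldNumbers) {
        if (decodedFieldNumbers.length != offsets.length) {
            throw new IllegalArgumentException("one offset per field number expected");
        }
        Map<Integer, Long> entries = new HashMap<>();
        for (int i = 0; i < decodedFieldNumbers.length; i++) {
            int number = decodedFieldNumbers[i];
            // The "exists()" check: a bad field number was encoded in the first place.
            if (!allDvFieldNumbers.contains(number)) {
                throw new CorruptMetaException("invalid field number: " + number);
            }
            entries.put(number, offsets[i]);
        }
        // allDvFieldNumbers may also contain fields this producer never wrote;
        // they simply get no entry here, which is the "sees fields it doesn't know about" case.
        return entries;
    }
}
```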

> DocValues updates send wrong fieldinfos to codec producers
> ----------------------------------------------------------
>                 Key: LUCENE-5618
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
> Spinoff from LUCENE-5616.
> See the example there: docvalues readers get a fieldinfos, but it doesn't contain the
> correct entries, so they have invalid field numbers at read time.
> This should really be fixed. Maybe a simple solution is to not write "batches" of fields
> in updates but just have only one field per gen?
> This removes many-to-many relationships and would make things easy to understand.

This message was sent by Atlassian JIRA
