lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4539) DocValues impls should read all headers up-front instead of per-directsource
Date Tue, 06 Nov 2012 13:52:12 GMT


Robert Muir commented on LUCENE-4539:

I agree with you: it's bogus how it writes its header.

But I see a downside (I hope we can come up with an idea to deal with it rather than keeping
the header!)

One advantage of PackedInts writing its versioning (like FSTs) is that lots of things nest
them in their own file.

The problem with these two things is that they are themselves changing and versioned: they
aren't like readVInt(), which is pretty much fixed in what it does.

So having them write their own versions etc. today to some extent makes back compat management
of file formats easier: today it's just DocValues and term dictionaries using these things;
tomorrow (4.1) it's also the postings lists (documents, frequencies, and positions), and maybe
in the future even stored fields (LUCENE-4527).

Who is keeping up with all the places that must be managed when a packed ints version change
needs to happen? Today the header encapsulates this in one place: if I break backwards
compatibility in FSTs and that breaks a few suggester impls, I know anyone using those
suggesters will get IndexFormatTooOldException without me doing anything. So that's
very convenient.
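
The encapsulation argument above can be sketched in plain Java. This is only an illustration under stated assumptions, not Lucene code: SelfVersionedBlock, FormatTooOldException, and the byte layout are all hypothetical, and java.nio.ByteBuffer stands in for Lucene's input/output classes. The point it tries to show is that the enclosing format never looks at the nested block's version, yet an outdated block is still rejected when read.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a nested structure (think packed ints or an FST)
// that writes and checks its own version, independent of the enclosing file.
final class SelfVersionedBlock {
    static final int MIN_SUPPORTED_VERSION = 1;
    static final int CURRENT_VERSION = 2;

    // Hypothetical analogue of IndexFormatTooOldException for this sketch.
    static class FormatTooOldException extends RuntimeException {
        FormatTooOldException(int found) {
            super("block version " + found + " is older than " + MIN_SUPPORTED_VERSION);
        }
    }

    // The block writes its own version before its payload.
    static void writeBlock(ByteBuffer out, int version, int[] values) {
        out.putInt(version);
        out.putInt(values.length);
        for (int v : values) {
            out.putInt(v);
        }
    }

    // The block checks its own version; callers do not have to.
    static int[] readBlock(ByteBuffer in) {
        int version = in.getInt();
        if (version < MIN_SUPPORTED_VERSION) {
            throw new FormatTooOldException(version);
        }
        int[] values = new int[in.getInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = in.getInt();
        }
        return values;
    }

    // An enclosing format: one field of its own, then the nested block.
    static ByteBuffer writeEnclosingFile(int blockVersion, int docCount, int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(12 + 4 * values.length);
        buf.putInt(docCount);
        writeBlock(buf, blockVersion, values);
        buf.flip();
        return buf;
    }

    // The enclosing reader knows nothing about block versions.
    static int[] readEnclosingFile(ByteBuffer in) {
        in.getInt(); // docCount, unused in this sketch
        return readBlock(in);
    }

    public static void main(String[] args) {
        int[] ok = readEnclosingFile(writeEnclosingFile(CURRENT_VERSION, 3, new int[] {7, 8, 9}));
        System.out.println("read " + ok.length + " values");
        try {
            readEnclosingFile(writeEnclosingFile(0, 1, new int[] {7}));
        } catch (FormatTooOldException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If the nested header were dropped, every enclosing format would need its own version bump and check whenever the block encoding changed, which is the bookkeeping concern raised above.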

> DocValues impls should read all headers up-front instead of per-directsource
> ----------------------------------------------------------------------------
>                 Key: LUCENE-4539
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>            Reporter: Robert Muir
>         Attachments: LUCENE-4539.patch
> Currently, when DocValues opens, it just opens files. It doesn't read codec headers etc.
> Instead we read these every single time a directsource opens.
> I think it should work like PostingsReaders: e.g. the PackedInts impl would read its
> versioning info and codec headers, and creating a new Direct impl should be an
> IndexInput.clone() + getDirectReaderNoHeader().
> Today it's much more costly.
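
As a rough illustration of the proposal in the quoted issue, the following sketch reads and validates a header once when the values are opened, then hands out cheap header-free views per source. It is not the attached patch: UpFrontHeaderValues, DirectSource, the headerReads counter, and the file layout are hypothetical, and ByteBuffer.duplicate() merely plays the role that IndexInput.clone() plays in the issue text.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: parse the header once at open time, then let each
// direct source be a cheap duplicate of a header-free view of the data.
final class UpFrontHeaderValues {
    static final int VERSION = 1;

    private final ByteBuffer data;  // view that starts just past the header
    private final int valueCount;
    int headerReads = 0;            // only here to make the cost visible

    UpFrontHeaderValues(ByteBuffer file) {
        ByteBuffer in = file.duplicate();
        int version = in.getInt();
        headerReads++;
        if (version != VERSION) {
            throw new IllegalStateException("unexpected version " + version);
        }
        this.valueCount = in.getInt();
        this.data = in.slice();     // header is excluded from this view
    }

    // Opening a source does no header parsing, only a duplicate of the view.
    DirectSource openDirectSource() {
        return new DirectSource(data.duplicate(), valueCount);
    }

    static final class DirectSource {
        private final ByteBuffer in;
        private final int count;

        DirectSource(ByteBuffer in, int count) {
            this.in = in;
            this.count = count;
        }

        int get(int index) {
            if (index < 0 || index >= count) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            return in.getInt(index * 4); // absolute read, no shared position
        }
    }

    // Writes a tiny file: version, value count, then fixed-width values.
    static ByteBuffer writeFile(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(8 + 4 * values.length);
        buf.putInt(VERSION);
        buf.putInt(values.length);
        for (int v : values) {
            buf.putInt(v);
        }
        buf.flip();
        return buf;
    }
}
```

Opening any number of sources from one UpFrontHeaderValues instance leaves headerReads at 1, which is the cost difference the issue describes.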
