lucene-dev mailing list archives

From "Michael McCandless (JIRA)"
Subject [jira] Commented: (LUCENE-2373) Create a Codec to work with streaming and append-only filesystems
Date Thu, 01 Jul 2010 09:42:49 GMT


Michael McCandless commented on LUCENE-2373:

This looks great, Andrzej!  It gives codecs full control over reading and writing of
SegmentInfo/SegmentInfos, which lets a Codec store any per-segment info it needs (e.g., hasProx,
which is now a hardwired bit in SegmentInfo, is really a codec-level detail).  The codec could
probably return a (private to it) subclass of SegmentInfo to hold such extra info...

Maybe we should provide default impls for CodecProvider.getSegmentInfosReader/Writer?  (I.e.,
returning the Default impls.)

Also, should we factor out the "leave space for index pointer" (out.writeLong(0)) to the subclass
(and likewise the reading of that dirOffset)?  Those 8 bytes are wasted now for the appending codec...
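
To make the factoring idea concrete, here is a rough sketch, not the patch itself. The class and method names (PointerHookSketch, SeekingLayout, AppendingLayout, reservePointer, storePointer) are made up for illustration, and a plain java.nio.ByteBuffer stands in for the real output (the codec header is omitted). The base layout reserves the slot and patches it later, while the appending layout reserves nothing and writes the pointer last:

```java
import java.nio.ByteBuffer;

public class PointerHookSketch {
    /** Hypothetical default layout: reserve a slot up front, patch it at close (needs seek). */
    static class SeekingLayout {
        void reservePointer(ByteBuffer out) {
            out.putLong(0L);               // leave space for the index pointer
        }
        void storePointer(ByteBuffer out, long dirStart) {
            out.putLong(0, dirStart);      // absolute put: "seek back" and overwrite the slot
        }
    }

    /** Hypothetical appending layout: reserve nothing, append the pointer at the end. */
    static class AppendingLayout extends SeekingLayout {
        @Override void reservePointer(ByteBuffer out) {
            // no placeholder, so no bytes are wasted at the start
        }
        @Override void storePointer(ByteBuffer out, long dirStart) {
            out.putLong(dirStart);         // relative put: strictly append-only
        }
    }

    /** Shared write path; only the two hooks differ between layouts. */
    static byte[] write(SeekingLayout layout, byte[] body) {
        ByteBuffer out = ByteBuffer.allocate(body.length + 16);
        layout.reservePointer(out);
        out.put(body);
        long dirStart = out.position();    // where the index would start
        layout.storePointer(out, dirStart);
        byte[] file = new byte[out.position()];
        out.flip();
        out.get(file);
        return file;
    }
}
```

With hooks like these the shared writer never calls out.writeLong(0) itself, so an appending subclass does not pay for a slot it never patches.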

> Create a Codec to work with streaming and append-only filesystems
> -----------------------------------------------------------------
>                 Key: LUCENE-2373
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Andrzej Bialecki 
>             Fix For: 4.0
>         Attachments: appending.patch
> Since early 2.x times Lucene has used a skip/seek/write trick to patch the length of the
> terms dict into a place near the start of the output data file. This, however, made it impossible
> to use Lucene with append-only filesystems such as HDFS.
> In the post-flex trunk the following code in StandardTermsDictWriter initiates this:
> {code}
>     // Count indexed fields up front
>     CodecUtil.writeHeader(out, CODEC_NAME, VERSION_CURRENT); 
>     out.writeLong(0);                             // leave space for end index pointer
> {code}
> and completes this in close():
> {code}
>       out.writeLong(dirStart);
> {code}
> I propose to change this layout so that this pointer is stored simply at the end of the
> file. It's always 8 bytes long, and we know the final length of the file from Directory,
> so it's a single additional seek(length - 8) to read it, which is not much considering the
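
For reference, the trailing-pointer layout proposed in the issue can be sketched in plain Java as below. This is only an illustration under assumptions: TrailerPointerSketch, writeAppendOnly and readDirStart are hypothetical names, java.io streams and a byte array stand in for Lucene's IndexOutput/IndexInput, and the codec header is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

public class TrailerPointerSketch {
    /** Writes terms data, then index data, then the 8-byte pointer, never seeking back. */
    static byte[] writeAppendOnly(byte[] termsData, byte[] indexData) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.write(termsData);
        long dirStart = out.size();        // offset where the index section begins
        out.write(indexData);
        out.writeLong(dirStart);           // pointer goes last, big-endian, 8 bytes
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads the pointer back: the equivalent of seek(length - 8) followed by readLong(). */
    static long readDirStart(byte[] file) {
        return ByteBuffer.wrap(file).getLong(file.length - 8);
    }
}
```

The reader only needs the total file length, which the Directory already knows, to locate the pointer.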

