lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <>
Subject [jira] Created: (LUCENE-2789) Let codec decide to use compound file system or not
Date Thu, 02 Dec 2010 11:28:11 GMT
Let codec decide to use compound file system or not

                 Key: LUCENE-2789
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Codecs, Index
            Reporter: Simon Willnauer

While working on LUCENE-2186, and in the context of recent mails about consolidating
MergePolicy and LogMergePolicy, I want to propose a rather big change to how Compound Files
are created and handled in IW. Since Codecs were introduced, we have several somewhat different
ways of writing data to the index. The Sep codec, for instance, writes different files for
index data, and DocValues will write one file per field and segment. Eventually codecs need
more control over how files are written, i.e. whether CFS should be used or not is, IMO,
really a matter of the codec used for writing.
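To make the proposal concrete, here is a rough sketch of what "the codec decides" could look like. Everything in it is hypothetical: CfsAwareCodec, FlushedSegment, describeFlush and the two example codecs are illustrative stand-ins, not existing Lucene classes, and the file names are just examples. The point is that the flush path only asks the codec and no longer consults IW or the MergePolicy.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only: none of these types are real Lucene APIs.
public class CodecCfsSketch {

    /** Minimal stand-in for what a codec knows about a segment it just wrote. */
    record FlushedSegment(String name, List<String> files) {}

    /** Hypothetical codec contract: the codec, not IW or the MergePolicy, decides on CFS. */
    interface CfsAwareCodec {
        /** Files this codec wrote for the segment (same idea as Codec#files on the read side). */
        Set<String> files(FlushedSegment segment);

        /** Whether the codec wants its files for this segment packed into a compound file. */
        boolean useCompoundFile(FlushedSegment segment);
    }

    /** A codec that always packs everything, like the classic CFS default. */
    static final class AlwaysCfsCodec implements CfsAwareCodec {
        public Set<String> files(FlushedSegment s) { return new LinkedHashSet<>(s.files()); }
        public boolean useCompoundFile(FlushedSegment s) { return true; }
    }

    /** A codec writing many small files (e.g. one per field) that opts out of CFS. */
    static final class PerFieldFilesCodec implements CfsAwareCodec {
        public Set<String> files(FlushedSegment s) { return new LinkedHashSet<>(s.files()); }
        public boolean useCompoundFile(FlushedSegment s) { return false; }
    }

    /** What the flush path could shrink to: ask the codec and nothing else. */
    static String describeFlush(CfsAwareCodec codec, FlushedSegment segment) {
        if (codec.useCompoundFile(segment)) {
            return segment.name() + ".cfs";
        }
        // Keep the codec's own files, in the order it reported them.
        return String.join(",", codec.files(segment));
    }

    public static void main(String[] args) {
        FlushedSegment seg = new FlushedSegment("_0", List.of("_0.frq", "_0.prx", "_0.tis"));
        System.out.println(describeFlush(new AlwaysCfsCodec(), seg));     // prints _0.cfs
        System.out.println(describeFlush(new PerFieldFilesCodec(), seg)); // prints _0.frq,_0.prx,_0.tis
    }
}
```

With this shape, the CFS flag in SegmentWriteState and the MergePolicy's involvement at flush time would simply disappear, because the decision is local to whichever codec wrote the files.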

On the other hand, when you look at IW internals, CFS really pollutes the indexing code and
relies on information from inside a codec (see SegmentWriteState.flushedFiles). Actually, this
differentiation spreads across many classes related to indexing, including LogMergePolicy.
IMO, how newly flushed segments are written has nothing to do with the MP in the first place,
yet the MP currently chooses whether a newly flushed segment is CFS or not (correct me if I am
wrong). Pushing all this logic down to codecs would make lots of code much simpler and cleaner.

As Mike said, this would also reduce the API footprint if we make it private to the codec.
I can imagine some situations where you really want control over certain fields being stored
as non-CFS and others being stored as CFS. Codecs might need more information about other
segments during a merge to decide whether or not to use CFS based on segment size, but we can
easily change that API. From a reading point of view we already have Codec#files, which can
decide case by case which files belong to this codec.
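A possible shape for the merge-time decision, again only a sketch under assumptions: MergeContext and SizeAwareCfsPolicy are hypothetical names, and the "merged segment is at most some fraction of the index" rule with a 0.1 ratio is just one plausible size heuristic, not something Lucene defines. It combines a segment-level size rule with per-field overrides, covering both cases mentioned above.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: MergeContext, SizeAwareCfsPolicy and the ratio rule are illustrative assumptions.
public class MergeCfsSketch {

    /** Extra information a codec might need during a merge: sizes of the segments involved. */
    record MergeContext(List<Long> mergingSegmentBytes, long totalIndexBytes) {
        long mergedBytes() {
            long sum = 0;
            for (long b : mergingSegmentBytes) sum += b;
            return sum;
        }
    }

    /** Codec-private policy combining a size rule with per-field overrides. */
    static final class SizeAwareCfsPolicy {
        private final double maxCfsRatio;            // pack only merged segments this small relative to the index
        private final Set<String> fieldsOutsideCfs;  // fields whose files always stay separate

        SizeAwareCfsPolicy(double maxCfsRatio, Set<String> fieldsOutsideCfs) {
            this.maxCfsRatio = maxCfsRatio;
            this.fieldsOutsideCfs = fieldsOutsideCfs;
        }

        /** Segment-level decision: use CFS if the merged segment is small relative to the whole index. */
        boolean useCompoundFile(MergeContext ctx) {
            if (ctx.totalIndexBytes() <= 0) return true; // empty or unknown index size: treat as small
            return ctx.mergedBytes() <= maxCfsRatio * ctx.totalIndexBytes();
        }

        /** Field-level decision: a field opted out of CFS is never packed, regardless of size. */
        boolean packField(String field, MergeContext ctx) {
            return !fieldsOutsideCfs.contains(field) && useCompoundFile(ctx);
        }
    }

    public static void main(String[] args) {
        SizeAwareCfsPolicy policy = new SizeAwareCfsPolicy(0.1, Set.of("body_docvalues"));
        MergeContext small = new MergeContext(List.of(10L, 20L), 1_000L);   // 30 bytes merged
        MergeContext large = new MergeContext(List.of(400L, 300L), 1_000L); // 700 bytes merged
        System.out.println(policy.useCompoundFile(small));             // true  (30 <= 100)
        System.out.println(policy.useCompoundFile(large));             // false (700 > 100)
        System.out.println(policy.packField("title", small));          // true
        System.out.println(policy.packField("body_docvalues", small)); // false
    }
}
```

Keeping this logic private to the codec means the MergeContext shape can change freely later, which is the API flexibility argued for above.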

Let me know your thoughts.

