lucene-dev mailing list archives

From "Shai Erera (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3126) IndexWriter.addIndexes can make any incoming segment into CFS if it isn't already
Date Mon, 23 May 2011 15:07:47 GMT


Shai Erera commented on LUCENE-3126:

The patch does not handle all files well (a few tests fail). Apparently, the .del file should not
be rolled into the .cfs. SegmentMerger.createCompoundFile does this by default; however, it's
only called from code that ensures no deletions exist. It would have been nice if this method
documented that :).

Also, I think *.s<num> should not be rolled into .cfs (those are the separate norms
files). I don't know how to create such files in the first place (I thought they were an old
format, but 3.1 indexes have them too), and TestBackCompat fails. Is there a way to identify
those files? Is it safe to check whether the file extension starts w/ IndexFileNames.SEPARATE_NORMS_EXTENSION?
Feels hacky to me.
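
To make the file-name check discussed above concrete, here is a minimal standalone sketch. It does not use Lucene: the class CfsFileFilter and its methods are hypothetical names, and the "del" and "s" literals are only assumed to match the deletions and separate-norms extensions (real code would take them from IndexFileNames). It requires digits after the "s", which is a little stricter than a plain startsWith check.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Lucene.
class CfsFileFilter {
  // Assumed extension values; real code should read them from IndexFileNames.
  private static final String DELETES_EXTENSION = "del";
  private static final String SEPARATE_NORMS_EXTENSION = "s";

  // Matches extensions of the form s<num>, e.g. "s0" or "s12".
  private static final Pattern SEPARATE_NORMS =
      Pattern.compile(Pattern.quote(SEPARATE_NORMS_EXTENSION) + "\\d+");

  // Returns the text after the last '.', or "" if there is no '.'.
  static String extension(String fileName) {
    int dot = fileName.lastIndexOf('.');
    return dot < 0 ? "" : fileName.substring(dot + 1);
  }

  // True if the file name looks like a separate norms file (*.s<num>).
  static boolean isSeparateNorms(String fileName) {
    return SEPARATE_NORMS.matcher(extension(fileName)).matches();
  }

  // True if the file may be rolled into the .cfs: neither deletions nor separate norms.
  static boolean shouldCopyIntoCfs(String fileName) {
    return !extension(fileName).equals(DELETES_EXTENSION) && !isSeparateNorms(fileName);
  }

  public static void main(String[] args) {
    for (String f : new String[] {"_0.fdt", "_0_1.del", "_0_1.s0", "_0.tis"}) {
      System.out.println(f + " -> " + shouldCopyIntoCfs(f));
    }
  }
}
```

Matching on s<num> rather than on the bare prefix avoids catching any other extension that merely starts with "s", which is the part that feels hacky.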

Another thing: since this is only an optimization, I think the code should copy into CFS only
if the segment version is on or after 3.1 (that is, StringHelper.getVersionComparator().compare(info.getVersion(),
"3.1") >= 0). That avoids having to deal with shared doc stores (and whatever other old-format
stuff).

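As an illustration of the version gate described above, here is a minimal standalone sketch. It does not call StringHelper.getVersionComparator(); the class VersionCheck, the DOTTED comparator and onOrAfter31 are hypothetical stand-ins, and treating a null or unparseable version string as "too old" is an assumption, not confirmed Lucene behavior.

```java
import java.util.Comparator;

// Hypothetical stand-in for illustration only; not Lucene's implementation.
class VersionCheck {
  // Compares dotted numeric versions component by component; missing components count as 0.
  static final Comparator<String> DOTTED = new Comparator<String>() {
    public int compare(String a, String b) {
      String[] as = a.split("\\.");
      String[] bs = b.split("\\.");
      int n = Math.max(as.length, bs.length);
      for (int i = 0; i < n; i++) {
        int x = i < as.length ? Integer.parseInt(as[i]) : 0;
        int y = i < bs.length ? Integer.parseInt(bs[i]) : 0;
        if (x != y) {
          return x < y ? -1 : 1;
        }
      }
      return 0;
    }
  };

  // True only if the segment version parses and is >= 3.1.
  // Assumption: null or non-numeric versions are treated as old-format segments.
  static boolean onOrAfter31(String segmentVersion) {
    if (segmentVersion == null) {
      return false;
    }
    try {
      return DOTTED.compare(segmentVersion, "3.1") >= 0;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    for (String v : new String[] {"3.0", "3.1", "3.1.0", "4.0"}) {
      System.out.println(v + " -> " + onOrAfter31(v));
    }
  }
}
```

Comparing numerically per component, rather than as plain strings, keeps a version such as 3.10 ordered after 3.9.
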
I think I'm close to finishing it; I just need to figure out the separate norms thing.

> IndexWriter.addIndexes can make any incoming segment into CFS if it isn't already
> ---------------------------------------------------------------------------------
>                 Key: LUCENE-3126
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.2, 4.0
>         Attachments: LUCENE-3126.patch
> Today, IW.addIndexes(Directory) does not modify the CFS-mode of the incoming segments.
> However, if IndexWriter's MP wants to create CFS (in general), there's no reason not to turn
> the incoming non-CFS segments into CFS. We anyway copy them, and if MP is not against CFS,
> we should create a CFS out of them.
> Will need to use CFW, not sure it's ready for that w/ current API (I'll need to check),
> but luckily we're allowed to change it (@lucene.internal).
> This should be done, IMO, even if the incoming segment is large (i.e., passes MP.noCFSRatio)
> b/c like I wrote above, we anyway copy it. However, if you think otherwise, speak up :).
> I'll take a look at this in the next few days.
