lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter
Date Tue, 07 Aug 2007 14:41:59 GMT


Michael McCandless commented on LUCENE-847:

This looks great Steve!

More specific feedback soon, but in thinking about concurrency
(and from reading your comments about it in LogDocMergePolicy), I
think we would ideally like concurrency to be fully independent of the
merge policy.

I.e., just like you can take any shell command and choose to run it in
the background by sticking an "&" on the end, I should be able to take
my favorite MergePolicy instance X and "wrap" it inside a "concurrent
merge policy wrapper".  E.g., instantiate ConcurrentMergePolicy(X), and
then ConcurrentMergePolicy would take the merges requested by X and do
them in the background.

I think with one change to your MergePolicy API & control flow, we
could make this work very well: instead of requiring the MergePolicy
to call IndexWriter.merge, and do the cascading, it should just return
the one MergeSpecification that should be done right now.  This would
mean the "MergePolicy.merge" method would return null if no merge is
necessary right now, and would return a MergeSpecification if a merge
is required.
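
To make the proposed contract concrete, here is a minimal sketch in
Java. Only the names MergePolicy, MergeSpecification and the "merge"
method come from this discussion; SegmentSketch, FixedCountMergePolicy
and every field and signature below are hypothetical stand-ins for
illustration, not the API in the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the per-segment metadata held in SegmentInfos.
final class SegmentSketch {
    final String name;
    final int docCount;

    SegmentSketch(String name, int docCount) {
        this.name = name;
        this.docCount = docCount;
    }
}

// Assumed shape: the set of segments that should be merged into one.
final class MergeSpecification {
    final List<SegmentSketch> segments;

    MergeSpecification(List<SegmentSketch> segments) {
        // Copy, so later changes to the caller's list do not affect the spec.
        this.segments = new ArrayList<>(segments);
    }
}

interface MergePolicy {
    // Returns the one merge that should be done right now,
    // or null if no merge is necessary.
    MergeSpecification merge(List<SegmentSketch> segmentInfos);
}

// Toy policy (an assumption, not LogDocMergePolicy): merge the first
// mergeFactor segments as soon as that many exist.
final class FixedCountMergePolicy implements MergePolicy {
    private final int mergeFactor;

    FixedCountMergePolicy(int mergeFactor) {
        this.mergeFactor = mergeFactor;
    }

    @Override
    public MergeSpecification merge(List<SegmentSketch> segmentInfos) {
        if (segmentInfos.size() < mergeFactor) {
            return null; // nothing to do right now
        }
        return new MergeSpecification(segmentInfos.subList(0, mergeFactor));
    }
}
```

Note that the policy only selects; it never executes a merge and never
calls back into the writer.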

This way, it is IndexWriter that would execute the merge.  When the
merge is done, IndexWriter would then call the MergePolicy again to
give it a chance to "cascade".  This simplifies the locking because
IndexWriter can synchronize on SegmentInfos when it calls
"MergePolicy.merge" and so MergePolicy no longer has to deal with this
complexity (that SegmentInfos changes during a merge).
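
The control flow described above could be sketched roughly as follows.
This is a toy model under stated assumptions: WriterSketch stands in
for IndexWriter, a segment is reduced to its doc count, and
SameSizeMergePolicy is a made-up policy chosen only because it makes
cascading visible. None of these names are from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed shape: the segments (represented only by doc count) to merge.
final class MergeSpecification {
    final List<Integer> segmentSizes;

    MergeSpecification(List<Integer> segmentSizes) {
        this.segmentSizes = new ArrayList<>(segmentSizes);
    }
}

interface MergePolicy {
    // Returns the one merge to do right now, or null if none is needed.
    MergeSpecification merge(List<Integer> segmentInfos);
}

// Hypothetical policy: merge the first mergeFactor segments of equal size.
final class SameSizeMergePolicy implements MergePolicy {
    private final int mergeFactor;

    SameSizeMergePolicy(int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        this.mergeFactor = mergeFactor;
    }

    @Override
    public MergeSpecification merge(List<Integer> segmentInfos) {
        for (Integer size : segmentInfos) {
            List<Integer> same = new ArrayList<>();
            for (Integer other : segmentInfos) {
                if (other.equals(size)) same.add(other);
                if (same.size() == mergeFactor) return new MergeSpecification(same);
            }
        }
        return null;
    }
}

// Stand-in for IndexWriter: it, not the policy, executes merges.
final class WriterSketch {
    private final List<Integer> segmentInfos = new ArrayList<>();
    private final MergePolicy policy;
    private int mergesExecuted = 0;

    WriterSketch(MergePolicy policy) {
        this.policy = policy;
    }

    void addSegment(int docCount) {
        synchronized (segmentInfos) {
            segmentInfos.add(docCount);
        }
        maybeMerge();
    }

    // Ask the policy, run the merge, then ask again so it can cascade.
    private void maybeMerge() {
        while (true) {
            MergeSpecification spec;
            // The policy sees a stable view: the lock is held while it decides.
            synchronized (segmentInfos) {
                spec = policy.merge(segmentInfos);
            }
            if (spec == null) return;
            executeMerge(spec);
        }
    }

    private void executeMerge(MergeSpecification spec) {
        int total = 0;
        for (int size : spec.segmentSizes) total += size;
        synchronized (segmentInfos) {
            // remove(Object) drops one matching entry per merged segment.
            for (Integer size : spec.segmentSizes) segmentInfos.remove(size);
            segmentInfos.add(total);
            mergesExecuted++;
        }
    }

    List<Integer> segments() {
        synchronized (segmentInfos) {
            return new ArrayList<>(segmentInfos);
        }
    }

    int mergesExecuted() {
        synchronized (segmentInfos) {
            return mergesExecuted;
        }
    }
}
```

With a merge factor of 2, adding four one-document segments ends with
a single four-document segment: the fourth add triggers one merge, and
the follow-up call to the policy cascades into a second.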

Then, with this change, we could make a ConcurrentMergePolicy that
could (I think) easily "wrap" itself around another MergePolicy X,
intercepting the calls to "merge".  When the inner MergePolicy wants
to do a merge, the ConcurrentMergePolicy would in turn kick off that
merge in the BG but then return null to the IndexWriter allowing
IndexWriter to return to its caller, etc.
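
A rough sketch of such a wrapper, under several assumptions that are
not settled here: how the wrapper actually runs a merge is modelled
as a hypothetical mergeRunner callback, the background is any
java.util.concurrent.Executor, and segments are plain names. It does
not address cascading after a background merge finishes, or keeping
the inner policy from re-selecting segments that are already being
merged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Assumed shape: the segments (by name) that should be merged into one.
final class MergeSpecification {
    final List<String> segmentNames;

    MergeSpecification(List<String> segmentNames) {
        this.segmentNames = new ArrayList<>(segmentNames);
    }
}

interface MergePolicy {
    // Returns the one merge to do right now, or null if none is needed.
    MergeSpecification merge(List<String> segmentInfos);
}

// Wraps any MergePolicy and moves the merges it requests to the background.
final class ConcurrentMergePolicy implements MergePolicy {
    private final MergePolicy inner;
    private final Executor background;
    // Hypothetical hook: whatever actually performs a merge.
    private final Consumer<MergeSpecification> mergeRunner;

    ConcurrentMergePolicy(MergePolicy inner,
                          Executor background,
                          Consumer<MergeSpecification> mergeRunner) {
        this.inner = inner;
        this.background = background;
        this.mergeRunner = mergeRunner;
    }

    @Override
    public MergeSpecification merge(List<String> segmentInfos) {
        MergeSpecification spec = inner.merge(segmentInfos);
        if (spec != null) {
            // Hand the inner policy's choice to the background executor.
            background.execute(() -> mergeRunner.accept(spec));
        }
        // Always null: the caller has no foreground merge to run.
        return null;
    }
}
```

Any ExecutorService (for example one from
Executors.newSingleThreadExecutor()) can be passed as the Executor, and
the inner policy needs no changes to be wrapped.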

This also simplifies MergePolicy implementations because you no
longer have to deal with thread safety issues around driving your own
merges, cascading merges, dealing with sneaky SegmentInfos changing
while doing the merge, etc.

> Factor merge policy out of IndexWriter
> --------------------------------------
>                 Key: LUCENE-847
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Steven Parkes
>            Assignee: Steven Parkes
>         Attachments: concurrentMerge.patch, LUCENE-847.patch.txt, LUCENE-847.txt
> If we factor the merge policy out of IndexWriter, we can make it pluggable, making it
> possible for apps to choose a custom merge policy and for easier experimenting with merge
> policy variants.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
