lucene-dev mailing list archives

From "Ning Li" <>
Subject Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)
Date Tue, 05 Sep 2006 22:28:35 GMT
> What about an invariant that says the number of main index segments
> with the same level (f(n)) should be less than M.

That is exactly what the second property says:
"Less than M number of segments whose doc count n satisfies B*(M^c) <=
n < B*(M^(c+1)) for any c >= 0."

In other words, fewer than M segments with the same f(n).
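
To make the second property concrete, here is a minimal sketch in Python (not code from the patch). The names level and satisfies_second_property are made up for illustration, and returning -1 for segments with fewer than B docs is an assumption, since the property as quoted only covers c >= 0.

```python
from collections import Counter

def level(n, B, M):
    """Return f(n): the c >= 0 with B*M**c <= n < B*M**(c+1).

    Assumption: segments with fewer than B docs get level -1,
    because the quoted property only speaks about c >= 0.
    """
    if n < B:
        return -1
    c = 0
    # Integer arithmetic avoids floating-point log rounding at boundaries.
    while n >= B * M ** (c + 1):
        c += 1
    return c

def satisfies_second_property(doc_counts, B, M):
    """True if every level c >= 0 holds fewer than M segments."""
    per_level = Counter(level(n, B, M) for n in doc_counts if n >= B)
    return all(count < M for count in per_level.values())
```

With M=10 and B=1000, nine segments of 1000 docs satisfy the property, while ten of them do not and would have to be merged.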

> I am concerned about corner cases causing tons of segments and slowing
> search or causing errors due to file descriptor exhaustion.
> When merging, maybe we should count the number of segments at a
> particular index level f(n), rather than adding up the number of
> documents.  In the presence of deletions, this should lead to faster
> indexing (due to less frequent merges) I think.

Given M, B and an index which has L (0 < L < M) segments with fewer
than B docs each, how many RAM docs should be accumulated before a
merge is triggered? B is not good. B-sum(L) is the old strategy, which
has problems. So somewhere between B-sum(L) and B? Once there are M
segments with fewer than B docs, they'll be merged. But what if L=0?
Should B RAM docs be accumulated before being flushed in that case?

In any case, if flushing RAM docs in close() causes the number of
segments with fewer than B docs to reach M, a merge of those segments
should be triggered.
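
A rough sketch of that close() rule, in Python rather than the patch's Java. The function flush_on_close is hypothetical, segments are modelled as plain doc counts, and the sketch assumes the small segments are the most recent ones; it ignores deletions and any cascading merges.

```python
def flush_on_close(segments, ram_docs, B, M):
    """Flush buffered RAM docs, then merge the small segments if needed.

    segments: doc counts of on-disk segments, oldest first (illustrative model).
    ram_docs: number of buffered docs to flush as one new segment.
    Returns the resulting list of doc counts.
    """
    result = list(segments)
    if ram_docs > 0:
        result.append(ram_docs)  # the flush creates one new segment

    small = [n for n in result if n < B]
    if len(small) >= M:
        # M segments with fewer than B docs: merge them into one segment.
        # Assumption: the merged segment takes the place of the small ones
        # at the end of the index.
        result = [n for n in result if n >= B] + [sum(small)]
    return result
```

For example, with M=10 and B=1000, flushing 5 RAM docs into an index of [1000, 5, 5, 5, 5, 5, 5, 5, 5, 5] gives ten small segments, so they are merged and the result is [1000, 50].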

> What is the behavior of your patch under the current scenario:
> M=10, B=1000
> open writer, add 3 docs, close writer
> open writer, add 1000 docs, close writer
> Do you avoid the situation of having segments with docs=3 and 1000
> (hence f(n) increases as you increase segment numbers... a no-no)?

Currently, it does result in segments with docs=3 and 1000. I'll
modify the patch so that it completely complies with all the index
invariants once an agreement is reached.
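
For reference, the ordering problem in that scenario can be checked with a short Python sketch. The helper names are made up, and treating segments with fewer than B docs as level -1 is an assumption.

```python
def level(n, B, M):
    """f(n): -1 below B (assumption), else c with B*M**c <= n < B*M**(c+1)."""
    if n < B:
        return -1
    c = 0
    while n >= B * M ** (c + 1):
        c += 1
    return c

def levels_non_increasing(doc_counts, B, M):
    """True if f(n) never increases from older to newer segments."""
    levels = [level(n, B, M) for n in doc_counts]
    return all(older >= newer for older, newer in zip(levels, levels[1:]))
```

With M=10 and B=1000, the segment list [3, 1000] has levels [-1, 0], so the check fails; [1000, 3] would pass.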

