lucene-dev mailing list archives

From Earwin Burrfoot <>
Subject Re: MergePolicy Thresholds
Date Mon, 02 May 2011 13:42:25 GMT
Have you checked BalancedSegmentMergePolicy? It has some more knobs :)

On Mon, May 2, 2011 at 17:03, Shai Erera <> wrote:
> Hi
> Today, LogMP allows you to set different thresholds for segment sizes,
> thereby allowing you to control the largest segment that will be
> considered for merge + the largest segment your index will hold (=~
> threshold * mergeFactor).
> So, if you want to end up w/ say 20GB segments, you can set
> maxMergeMB(ForOptimize) to 2GB and mergeFactor=10.
> However, this often does not achieve your desired goal -- if the index
> contains 5 and 7 GB segments, they will never be merged b/c they are
> bigger than the threshold. I am willing to spend the CPU and IO resources
> to end up w/ 20 GB segments, whether I'm merging 10 segments together or
> only 2. After I reach a 20GB segment, it can rest peacefully, at least
> until I increase the threshold.
> So I wonder, first, if this threshold (i.e., largest segment size you
> would like to end up with) is more natural to set than the current
> thresholds, from the application level? I.e., wouldn't it be a simpler
> threshold to set instead of doing weird calculus that depends on
> maxMergeMB(ForOptimize) and mergeFactor?
> Second, should this be an addition to LogMP, or a different
> type of MP? One that adheres to only those two factors (perhaps the
> segSize threshold should be allowed to be set differently for optimize
> and regular merges). It can pick segments for merge such that it
> maximizes the result segment size (i.e., it doesn't necessarily merge in
> sequential order), but never merges more than mergeFactor segments.
> I guess, if we think that maxResultSegmentSizeMB is more intuitive than
> the current thresholds, application-wise, then this change should go
> into LogMP. Otherwise, it feels like a different MP is needed, because
> LogMP is already complicated and another threshold would confuse things.
> What do you think of this? Am I trying to optimize too much? :)
> Shai
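
For reference, the selection rule proposed above (maximize the result segment size, never exceed maxResultSegmentSizeMB, never merge more than mergeFactor segments) could be sketched roughly as below. This is only an illustration, not Lucene code: the class and method names are hypothetical, segments are reduced to plain sizes in MB, and a simple largest-first greedy pass stands in for whatever a real MergePolicy would do (greedy is not guaranteed to find the best-filling combination).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: picks segment sizes to merge so the
// merged segment gets as close as a greedy pass allows to a target size cap.
class MaxResultSizeSelector {

    // Returns the sizes (MB) of the segments chosen for one merge, or an empty
    // list if fewer than two segments fit under the cap.
    static List<Long> pick(long[] segmentSizesMB, long maxResultSegmentSizeMB, int mergeFactor) {
        long[] sorted = segmentSizesMB.clone();
        Arrays.sort(sorted); // ascending; we walk it from the largest down

        List<Long> picked = new ArrayList<>();
        long total = 0;
        for (int i = sorted.length - 1; i >= 0 && picked.size() < mergeFactor; i--) {
            long size = sorted[i];
            // Take the segment only if the merged result stays within the cap.
            if (total + size <= maxResultSegmentSizeMB) {
                picked.add(size);
                total += size;
            }
        }
        // A "merge" of zero or one segment is pointless, so report nothing.
        if (picked.size() < 2) {
            picked.clear();
        }
        return picked;
    }
}
```

With a 20GB cap (20480 MB) and mergeFactor=10, the 5GB and 7GB segments from the example would be picked together, since their sum is under the cap, whereas a per-segment maxMergeMB of 2GB excludes both of them.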

Kirill Zakharenko/Кирилл Захаренко
Phone: +7 (495) 683-567-4
ICQ: 104465785
