lucene-dev mailing list archives

From "Shawn Heisey (JIRA)"
Subject [jira] [Commented] (LUCENE-7976) Add a parameter to TieredMergePolicy to merge segments that have more than X percent deleted documents
Date Fri, 20 Oct 2017 20:24:01 GMT


Shawn Heisey commented on LUCENE-7976:

Very interesting discussion and problem.

If we ignore for a moment what TMP actually does, and back up to the design intent when the
policy was made ... what would the designer have wanted to happen in the case of a segment
that's considerably larger than the configured max size?  Took me a while to find the right
issue, which is LUCENE-854, work by [~mikemccand].

I suspect that the current behavior, where a segment that's 20 times larger than the configured
max segment size is ineligible for automatic merging until 97.5 percent deleted docs, was
not actually what was desired.  Indexes with a segment like that might not have even been considered
when TMP was new.  I don't see anything in LUCENE-854 that mentions it.  I haven't checked
all the later issues where changes to TMP were made.
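
The 97.5 percent figure can be checked with a little arithmetic. The sketch below is not Lucene code; it assumes, based on the numbers quoted in this issue, that a segment only becomes eligible for automatic merging once its live data drops below half the configured max segment size (2.5G for the 5G default). The class and method names are made up for illustration.

```java
// Back-of-the-envelope model, NOT Lucene code.
// Assumption: a segment is eligible for automatic merging only when its
// live (non-deleted) size is below half the configured max segment size.
class MergeEligibility {

    /** Percent of a segment that must be deleted before it becomes eligible. */
    static double requiredDeletedPct(double segmentGb, double maxSegmentGb) {
        double liveThresholdGb = maxSegmentGb / 2.0;
        if (segmentGb <= liveThresholdGb) {
            return 0.0; // already small enough, no deletes required
        }
        return 100.0 * (segmentGb - liveThresholdGb) / segmentGb;
    }

    public static void main(String[] args) {
        // A 100G segment with a 5G max segment size (20 times larger).
        System.out.println(requiredDeletedPct(100.0, 5.0)); // prints 97.5
    }
}
```

Under that assumption, the larger the oversized segment, the closer the required percentage gets to 100.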

So, how do we deal with this problem?  I see three options.  We can design an entirely new
policy, and if its behavior becomes preferred, consider changing the default at a later date.
 We can change TMP so it behaves better with very large segments with no change in user code
or config.  We can add Erick's suggested option.  For any of these options, improved documentation
is a must.

The second option (and the latter half of the first option) carries one risk factor I can
think of -- users complaining about new behavior in a similar manner to what I've heard about
when the default directory was changed to MMAP.

> Add a parameter to TieredMergePolicy to merge segments that have more than X percent deleted documents
> ------------------------------------------------------------------------------------------------------
>                 Key: LUCENE-7976
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Erick Erickson
> We're seeing situations "in the wild" where there are very large indexes (on disk) handled
quite easily in a single Lucene index. This is particularly true as features like docValues
move data into MMapDirectory space. The current TMP algorithm allows on the order of 50% deleted
documents as per a dev list conversation with Mike McCandless (and his blog here:
> Especially in the current era of very large indexes in aggregate (think many TB), solutions
like "you need to distribute your collection over more shards" become very costly. Additionally,
the tempting "optimize" button exacerbates the issue since once you form, say, a 100G segment
(by optimizing/forceMerging) it is not eligible for merging until 97.5G of the docs in it
are deleted (current default 5G max segment size).
> The proposal here would be to add a new parameter to TMP, something like <maxAllowedPctDeletedInBigSegments>
(no, that's not a serious name, suggestions welcome) which would default to 100 (that is, the same
behavior we have now).
> So if I set this parameter to, say, 20%, and the max segment size stays at 5G, the following
would happen when segments were selected for merging:
> > Any segment with > 20% deleted documents would be merged or rewritten NO MATTER HOW LARGE. There are two cases:
> >> The segment has < 5G "live" docs. In that case it would be merged with smaller segments to bring the resulting segment up to 5G. If no smaller segments exist, it would just be rewritten.
> >> The segment has > 5G "live" docs (the result of a forceMerge or optimize). It would be rewritten into a single segment, removing all deleted docs, no matter how big it is to start. The 100G example above would be rewritten to an 80G segment, for instance.
> Of course this would lead to potentially much more I/O which is why the default would
be the same behavior we see now. As it stands now, though, there's no way to recover from
an optimize/forceMerge except to re-index from scratch. We routinely see 200G-300G Lucene
indexes at this point "in the wild" with 10s of shards replicated 3 or more times. And that
doesn't even include having these over HDFS.
> Alternatives welcome! Something like the above seems minimally invasive. A new merge
policy is certainly an alternative.
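
To make the two cases in the proposal concrete, here is a minimal sketch of the decision it describes. It is not TieredMergePolicy code and not an existing Lucene API: the class, the enum, and the `maxAllowedPctDeleted` parameter are hypothetical stand-ins for the tentatively named setting above, and it assumes a segment's size shrinks in proportion to its deleted percentage.

```java
// Illustrative sketch of the proposed rule, NOT an existing Lucene API.
// Assumption: live size is the segment size pro-rated by the deleted percentage.
class DeletePctSketch {

    enum Action { DEFER_TO_EXISTING_RULES, MERGE_WITH_SMALLER, REWRITE_ALONE }

    /** Size of the segment once deleted docs are removed. */
    static double liveSizeGb(double segmentGb, double deletedPct) {
        return segmentGb * (100.0 - deletedPct) / 100.0;
    }

    static Action decide(double segmentGb, double deletedPct,
                         double maxAllowedPctDeleted, double maxSegmentGb) {
        // With the proposed default of 100 this branch is always taken,
        // so behavior stays as it is today.
        if (deletedPct <= maxAllowedPctDeleted) {
            return Action.DEFER_TO_EXISTING_RULES;
        }
        double liveGb = liveSizeGb(segmentGb, deletedPct);
        // Under the max size: top up with smaller segments (or just rewrite
        // the segment if none exist). Over the max size: rewrite it on its own.
        return liveGb < maxSegmentGb ? Action.MERGE_WITH_SMALLER : Action.REWRITE_ALONE;
    }

    public static void main(String[] args) {
        System.out.println(decide(100.0, 25.0, 20.0, 5.0));  // prints REWRITE_ALONE
        System.out.println(decide(6.0, 30.0, 20.0, 5.0));    // prints MERGE_WITH_SMALLER
        System.out.println(decide(100.0, 50.0, 100.0, 5.0)); // prints DEFER_TO_EXISTING_RULES
        System.out.println(liveSizeGb(100.0, 20.0));         // prints 80.0
    }
}
```

The last line mirrors the example in the description, where the 100G segment with 20% deletes is rewritten to 80G.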
