lucene-dev mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: merge policy vs commit rates
Date Tue, 01 Aug 2017 15:00:32 GMT
Optimizing for frequent changes sounds like a caching strategy, maybe “LRU merging”. Perhaps
prefer merging segments that have not changed in a while?
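
A toy sketch of that LRU-style selection heuristic. SegmentStats and its fields are hypothetical bookkeeping, not Lucene's API; a real implementation would live inside a MergePolicy and work on SegmentCommitInfo:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    public class LruMergeSelection {

      // Hypothetical per-segment bookkeeping; a real policy would derive this
      // from SegmentCommitInfo inside a MergePolicy implementation.
      static class SegmentStats {
        final String name;
        final long lastChangeMillis;  // when deletes/updates last touched the segment

        SegmentStats(String name, long lastChangeMillis) {
          this.name = name;
          this.lastChangeMillis = lastChangeMillis;
        }

        @Override
        public String toString() {
          return name;
        }
      }

      // Pick up to maxSegments merge candidates, least-recently-changed first:
      // "cold" segments are the least likely to receive further deletes soon,
      // so merging them wastes the least work under heavy update traffic.
      static List<SegmentStats> pickLruCandidates(List<SegmentStats> segments, int maxSegments) {
        List<SegmentStats> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingLong(s -> s.lastChangeMillis));
        return sorted.subList(0, Math.min(maxSegments, sorted.size()));
      }

      public static void main(String[] args) {
        List<SegmentStats> segs = new ArrayList<>();
        segs.add(new SegmentStats("_a", 1_000L));
        segs.add(new SegmentStats("_b", 9_000L));
        segs.add(new SegmentStats("_c", 3_000L));
        System.out.println(pickLruCandidates(segs, 2)); // [_a, _c]: stable longest
      }
    }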

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Aug 1, 2017, at 5:50 AM, Tommaso Teofili <tommaso.teofili@gmail.com> wrote:
> 
> On Tue, Aug 1, 2017 at 14:04 Adrien Grand <jpountz@gmail.com> wrote:
> The trade-off does not sound simple to me. This approach could lead to having more
> segments overall, making search requests and updates potentially slower and more
> I/O-intensive, since they would have to iterate over more segments. I'm not saying this
> is a bad idea, but it could have unexpected side effects.
> 
> Yes, that's my concern as well.
>  
> 
> Do you actually have a high commit rate or a high reopen rate (DirectoryReader.open(IndexWriter))?
> 
> In my scenario both, but the commit rate far exceeds the reopen rate.
>  
> Maybe reopening instead of committing (and still committing, but less frequently) would
> decrease the I/O load, since NRT segments might never need to actually be written to disk
> if they are merged before the next commit happens and you give enough memory to the
> filesystem cache.
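
A minimal sketch of the NRT reopen pattern Adrien describes, using the stock Lucene API (the index path is a placeholder and error handling is omitted):

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class NrtReopenSketch {
      public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(Paths.get("/tmp/index"));  // placeholder path
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));

        // An NRT reader sees indexed documents without requiring a commit.
        DirectoryReader reader = DirectoryReader.open(writer);

        // ... index documents with writer.addDocument(...) / updateDocument(...) ...

        // Cheap refresh: returns null if nothing changed since the last open.
        DirectoryReader newer = DirectoryReader.openIfChanged(reader);
        if (newer != null) {
          reader.close();
          reader = newer;
        }

        // Commit only occasionally (e.g. on a timer) for durability; small NRT
        // segments that were merged away in the meantime may never hit disk.
        writer.commit();

        reader.close();
        writer.close();
        dir.close();
      }
    }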
> 
> Makes sense in general; however, I am a bit constrained in how much I can avoid
> committing (states in an MVCC system are tied to commits, so it's trickier).
> 
> In general I was wondering if we could have the merge policy look at both the commit
> rate and the number of segments, and decide whether to merge based on both: if segment
> growth stays within a threshold, we could save some merges under high commit rates,
> although, as you say, we may then have to do bigger merges (see the sketch just below).
> I can imagine this making more sense when a lot of tiny changes are made to the index
> rather than a few big ones (then the bigger-merges problem should be less significant).
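
A hypothetical sketch of that idea as a thin extension of TieredMergePolicy. The class name, thresholds, and one-minute sampling window are invented for illustration, and the findMerges signature follows the Lucene 6.x API, so it may differ in other versions:

    import java.io.IOException;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.MergeTrigger;
    import org.apache.lucene.index.SegmentInfos;
    import org.apache.lucene.index.TieredMergePolicy;

    public class CommitRateAwareMergePolicy extends TieredMergePolicy {

      private final double maxCommitsPerSecond; // above this, defer natural merges
      private final int maxSegments;            // ...but never beyond this many segments

      private long windowStartMillis = System.currentTimeMillis();
      private long commitsInWindow = 0;

      public CommitRateAwareMergePolicy(double maxCommitsPerSecond, int maxSegments) {
        this.maxCommitsPerSecond = maxCommitsPerSecond;
        this.maxSegments = maxSegments;
      }

      // The application calls this on every commit to feed the rate estimate.
      public synchronized void onCommit() {
        commitsInWindow++;
      }

      private synchronized double currentCommitRate() {
        long now = System.currentTimeMillis();
        double seconds = Math.max(1e-3, (now - windowStartMillis) / 1000.0);
        double rate = commitsInWindow / seconds;
        if (seconds > 60) {                     // reset the sampling window every minute
          windowStartMillis = now;
          commitsInWindow = 0;
        }
        return rate;
      }

      @Override
      public MergeSpecification findMerges(MergeTrigger trigger, SegmentInfos infos,
                                           IndexWriter writer) throws IOException {
        boolean hotCommitRate = currentCommitRate() > maxCommitsPerSecond;
        boolean tooManySegments = infos.size() > maxSegments;
        if (hotCommitRate && !tooManySegments) {
          return null;  // defer merging: commit rate is high, segment growth within bounds
        }
        return super.findMerges(trigger, infos, writer);
      }
    }

The design simply defers natural merges while the observed commit rate is hot, unless segment growth has passed a hard cap, at which point it falls back to the tiered policy.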
> 
> Beyond my specific scenario, I am thinking that we could take another look at the
> current merge policy algorithm and see if we can improve it, or make it more flexible
> with respect to how the "sneaky opponent" (Mike's ™ [1]) behaves.
> 
> My 2 cents,
> Tommaso
> 
> [1] http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>  
> 
> On Tue, Aug 1, 2017 at 10:59, Tommaso Teofili <tommaso.teofili@gmail.com> wrote:
> Hi all,
> 
> Lately I have been looking a bit more closely at merge policies, particularly the tiered
> one, and I was wondering if we can mitigate the number of possibly avoidable merges in
> high-commit-rate scenarios, especially when a high percentage of the commits touches the
> same docs.
> I've observed how merges evolve in several such scenarios, and it seemed to me that the
> merge policy was too aggressive, causing a large I/O overhead.
> I then tried the same workload with a merge policy that tentatively looked at the commit
> rate and skipped merges whenever that rate exceeded a threshold; this seemed to give
> slightly better results in reducing the unneeded I/O caused by avoidable merges (a rough
> sketch of how such a policy could be plugged into an IndexWriter follows below).
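
For illustration, wiring a policy like the hypothetical CommitRateAwareMergePolicy sketched earlier in the thread into a writer could look like this (path and thresholds are placeholders):

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    public class CommitRateAwarePolicyUsage {
      public static void main(String[] args) throws Exception {
        // Hypothetical thresholds: defer merges above 5 commits/sec,
        // unless the index already has more than 50 segments.
        CommitRateAwareMergePolicy mp = new CommitRateAwareMergePolicy(5.0, 50);

        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        config.setMergePolicy(mp);
        IndexWriter writer = new IndexWriter(FSDirectory.open(Paths.get("/tmp/index")), config);

        // ... index updates ...
        writer.commit();
        mp.onCommit();   // keep the policy's commit-rate window up to date

        writer.close();
      }
    }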
> 
> I know this is a bit abstract, but I would like to know if anyone has ideas or plans
> for mitigating the merge overhead in general and/or in similar cases.
> 
> Regards,
> Tommaso