lucene-dev mailing list archives

From Tommaso Teofili <tommaso.teof...@gmail.com>
Subject Re: merge policy vs commit rates
Date Tue, 01 Aug 2017 12:50:08 GMT
On Tue, Aug 1, 2017 at 14:04, Adrien Grand <jpountz@gmail.com> wrote:

> The trade-off does not sound simple to me. This approach could lead to
> having more segments overall, making search requests and updates
> potentially slower and more I/O-intensive since they have to iterate over
> more segments? I'm not saying this is a bad idea, but it could have
> unexpected side-effects.
>

Yes, I share that concern.


>
> Do you actually have a high commit rate or a high reopen rate
> (DirectoryReader.open(IndexWriter))?
>

In my scenario both, but the commit rate is much higher than the reopen rate.


> Maybe reopening instead of committing (and still committing, but less
> frequently) would decrease the I/O load since NRT segments might never need
> to be actually written to disk if they are merged before the next commit
> happens and you give enough memory to the filesystem cache.
>

Makes sense in general; however, I am a bit constrained in how much I can
avoid committing (states from an MVCC system are tied to commits, so it's
trickier).

In general I was wondering if we could have the merge policy look at both
the commit rate and the number of segments and decide whether to merge
based on both, so that if segment growth stays within a threshold we can
skip some merges when the commit rate is high, although, as you say, we
may then have to do bigger merges later.
I can imagine this making more sense when a lot of tiny changes are made
to the index rather than a few big ones (the bigger-merges problem should
then be less significant).
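
To make the idea a bit more concrete, here is a minimal sketch of the
decision logic only, in plain Java with no Lucene dependency. Everything in
it is an assumption for illustration: the class name CommitRateGate, the
sliding-window rate estimate, and the three thresholds are hypothetical and
are not part of Lucene or of the experimental policy mentioned in this thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper (not a Lucene class): tracks recent commits in a
 * sliding time window and says whether merges could be deferred.
 */
final class CommitRateGate {
    private final Deque<Long> commitTimesMs = new ArrayDeque<>();
    private final long windowMs;           // length of the sliding window
    private final double maxCommitsPerSec; // above this rate, merges may be skipped
    private final int maxSegments;         // above this count, always allow merging

    CommitRateGate(long windowMs, double maxCommitsPerSec, int maxSegments) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        this.windowMs = windowMs;
        this.maxCommitsPerSec = maxCommitsPerSec;
        this.maxSegments = maxSegments;
    }

    /** Record a commit observed at the given time (milliseconds). */
    void onCommit(long nowMs) {
        commitTimesMs.addLast(nowMs);
        evictOld(nowMs);
    }

    /** Commits per second, averaged over the sliding window ending at nowMs. */
    double commitsPerSec(long nowMs) {
        evictOld(nowMs);
        return commitTimesMs.size() * 1000.0 / windowMs;
    }

    /**
     * Defer merging only while the commit rate is high AND the segment
     * count is still within the allowed threshold.
     */
    boolean shouldDeferMerges(long nowMs, int segmentCount) {
        if (segmentCount > maxSegments) {
            return false; // too many segments: let the normal policy merge
        }
        return commitsPerSec(nowMs) > maxCommitsPerSec;
    }

    /** Drop commit timestamps that have fallen out of the window. */
    private void evictOld(long nowMs) {
        while (!commitTimesMs.isEmpty()
                && nowMs - commitTimesMs.peekFirst() >= windowMs) {
            commitTimesMs.removeFirst();
        }
    }
}
```

One way this might be wired in is a wrapper around the tiered policy that
returns no merges from findMerges while shouldDeferMerges is true and
delegates otherwise; that part is left out here since it depends on the
Lucene version in use.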

Beyond my specific scenario, I am thinking that we could take another look
at the current MP algorithm and see if we can improve it, or make it adapt
better to the way the "sneaky opponent" (Mike's ™ [1]) behaves.

My 2 cents,
Tommaso

[1] :
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html


>
> On Tue, Aug 1, 2017 at 10:59, Tommaso Teofili <tommaso.teofili@gmail.com>
> wrote:
>
>> Hi all,
>>
>> Lately I have been looking a bit closer at merge policies, particularly
>> the tiered one, and I was wondering if we can reduce the number of
>> possibly avoidable merges in high-commit-rate scenarios, especially when
>> a high percentage of the commits touch the same docs.
>> I've observed how merges evolve in several such scenarios, and it seemed
>> to me the merge policy was too aggressive in merging, causing a large
>> I/O overhead.
>> I've then tried the same with an experimental merge policy that looks at
>> the commit rate and skips merges if that rate is higher than a
>> threshold; this seemed to give slightly better results in reducing the
>> unneeded I/O caused by avoidable merges.
>>
>> I know this is a bit abstract but I would like to know if anyone has any
>> ideas or plans about mitigating the merge overhead in general and / or in
>> similar cases.
>>
>> Regards,
>> Tommaso
>>