lucene-dev mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: merge policy vs commit rates
Date Tue, 01 Aug 2017 17:39:58 GMT
IIUC, segments are first written to disk when ramBufferSizeMB is
exceeded. If you can afford the memory, you might increase that number.
NOTE: I'm going from memory here, so you should check.

That doesn't really address merging segments with deleted docs, though.
I do wonder what happens if you bump the segments per tier. My guess:
less frequent but more intense merges, so the overall effect is
unclear.
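
To make that guess a bit more concrete, here is a rough toy model, not
Lucene code. It assumes every flush produces a segment of size 1 and
that a level is merged into one bigger segment as soon as it holds
"segmentsPerTier" segments. The class and method names are made up for
illustration, and TieredMergePolicy's real selection logic (deletes,
segment size caps, scoring) is ignored entirely.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; this is NOT how TieredMergePolicy selects merges. */
class TierToyModel {

    /** Returns {number of merges, total units rewritten by merges, segments left}. */
    static long[] simulate(int flushes, int segmentsPerTier) {
        List<Integer> countAtLevel = new ArrayList<>(); // segments currently at each level
        long merges = 0;
        long unitsRewritten = 0;
        for (int i = 0; i < flushes; i++) {
            int level = 0;
            long size = 1; // assumption: a freshly flushed segment has size 1
            increment(countAtLevel, level);
            // Cascade: a full level is merged into one segment on the next level.
            while (countAtLevel.get(level) == segmentsPerTier) {
                countAtLevel.set(level, 0);
                size *= segmentsPerTier; // the merged segment is segmentsPerTier times larger
                merges++;
                unitsRewritten += size;
                level++;
                increment(countAtLevel, level);
            }
        }
        long segmentsLeft = 0;
        for (int c : countAtLevel) {
            segmentsLeft += c;
        }
        return new long[] {merges, unitsRewritten, segmentsLeft};
    }

    private static void increment(List<Integer> counts, int level) {
        if (level == counts.size()) {
            counts.add(0);
        }
        counts.set(level, counts.get(level) + 1);
    }

    public static void main(String[] args) {
        for (int perTier : new int[] {10, 20}) {
            long[] r = simulate(1000, perTier);
            System.out.printf("segmentsPerTier=%d merges=%d rewritten=%d left=%d%n",
                    perTier, r[0], r[1], r[2]);
        }
    }
}
```

In this toy model a wider tier gives fewer merges and a larger average
merge, but also leaves more segments behind for searches to visit. It
says nothing about deleted docs or real I/O, so it only illustrates the
shape of the trade-off.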

Best,
Erick

On Tue, Aug 1, 2017 at 8:00 AM, Walter Underwood <wunder@wunderwood.org> wrote:
> Optimizing for frequent changes sounds like a caching strategy, maybe “LRU
> merging”. Perhaps prefer merging segments that have not changed in a while?
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> On Aug 1, 2017, at 5:50 AM, Tommaso Teofili <tommaso.teofili@gmail.com>
> wrote:
>
>
>
> On Tue, Aug 1, 2017 at 2:04 PM, Adrien Grand <jpountz@gmail.com> wrote:
>>
>> The trade-off does not sound simple to me. This approach could lead to
>> having more segments overall, making search requests and updates potentially
>> slower and more I/O-intensive since they have to iterate over more segments?
>> I'm not saying this is a bad idea, but it could have unexpected
>> side-effects.
>
>
> yes, I have the same concern.
>
>>
>>
>> Do you actually have a high commit rate or a high reopen rate
>> (DirectoryReader.open(IndexWriter))?
>
>
> in my scenario both, but the commit rate is much higher than the reopen rate.
>
>>
>> Maybe reopening instead of committing (and still committing, but less
>> frequently) would decrease the I/O load since NRT segments might never need
>> to be actually written to disk if they are merged before the next commit
>> happens and you give enough memory to the filesystem cache.
>
>
> makes sense in general, however I am a bit constrained in how much I can
> avoid committing (states from an MVCC system are tied to commits, so it's
> trickier).
>
> In general I was wondering if we could have the merge policy look at both
> the commit rate and the number of segments and decide whether to merge
> based on both, so that if segment growth stays within a threshold we can
> save some merges when commit rates are high, but as you say we may have to
> do bigger merges then.
> I can imagine this making more sense when a lot of tiny changes are made to
> the index rather than a few big ones (then the bigger-merges problem should
> be less significant).
>
> Other than my specific scenario, I am thinking that we can look again at the
> current MP algorithm and see if we can improve it, or make it more flexible
> to the way the "sneaky opponent" (Mike's ™ [1]) behaves.
>
> My 2 cents,
> Tommaso
>
> [1] :
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
>>
>>
>> On Tue, Aug 1, 2017 at 10:59 AM, Tommaso Teofili <tommaso.teofili@gmail.com>
>> wrote:
>>>
>>> Hi all,
>>>
>>> lately I have been looking a bit closer at merge policies, particularly
>>> the tiered one, and I was wondering if we can reduce the number of
>>> possibly avoidable merges in high-commit-rate scenarios, especially when
>>> a high percentage of the commits happen on the same docs.
>>> I've observed how merges evolve in several such scenarios and it seemed
>>> to me the merge policy was too aggressive in merging, causing a large
>>> IO overhead.
>>> I've then tried the same with a merge policy that tentatively looks at
>>> commit rates and skips merges if the rate is higher than a threshold,
>>> which seemed to give slightly better results in reducing the unneeded IO
>>> caused by avoidable merges.
>>>
>>> I know this is a bit abstract but I would like to know if anyone has any
>>> ideas or plans about mitigating the merge overhead in general and / or in
>>> similar cases.
>>>
>>> Regards,
>>> Tommaso
>>>
>>>
>>>
>
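
For what it's worth, the idea quoted above (decide on merging from both
the commit rate and the number of segments) could be sketched roughly
like this. It is plain Java and does not use or extend any Lucene
class. CommitRateGate, its thresholds, and the sliding-window rate
estimate are all assumptions made for illustration, not something
proposed in this thread or present in Lucene.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper, not part of Lucene: decides whether merging may be skipped for now. */
class CommitRateGate {
    private final Deque<Long> commitTimesMs = new ArrayDeque<>(); // recent commit timestamps
    private final long windowMs;           // length of the sliding window
    private final int maxCommitsInWindow;  // more commits than this counts as a "high" rate
    private final int maxSegments;         // never defer merges beyond this segment count

    CommitRateGate(long windowMs, int maxCommitsInWindow, int maxSegments) {
        this.windowMs = windowMs;
        this.maxCommitsInWindow = maxCommitsInWindow;
        this.maxSegments = maxSegments;
    }

    /** Call once per commit with the current time in milliseconds. */
    void recordCommit(long nowMs) {
        commitTimesMs.addLast(nowMs);
        evictOld(nowMs);
    }

    /** Returns false only while commits are frequent AND segment growth is still acceptable. */
    boolean shouldMerge(int segmentCount, long nowMs) {
        evictOld(nowMs);
        boolean highCommitRate = commitTimesMs.size() > maxCommitsInWindow;
        boolean segmentGrowthOk = segmentCount <= maxSegments;
        return !(highCommitRate && segmentGrowthOk);
    }

    /** Drops commit timestamps that have fallen out of the sliding window. */
    private void evictOld(long nowMs) {
        while (!commitTimesMs.isEmpty() && nowMs - commitTimesMs.peekFirst() > windowMs) {
            commitTimesMs.removeFirst();
        }
    }
}
```

The segment-count cap is there because of the concern raised earlier in
the thread: skipping merges indefinitely would leave more segments for
searches and updates to iterate over.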