ignite-dev mailing list archives

From Stanislav Lukyanov <stanlukya...@gmail.com>
Subject Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance
Date Thu, 13 May 2021 15:37:37 GMT
What I mean by degradation when archive size < min is that, for example, historical rebalance
is available for a smaller timespan than expected by the system design.
It may not be an issue of course, especially for a new cluster. If "degradation" is the wrong
word we can call it "non-steady state" :) 
In any case, I think we're on the same page.
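
To make the terminology concrete, here is a minimal sketch of the three states this thread talks about. The class, enum and method names are hypothetical and are not Ignite API; only the idea of a min and a max bound comes from the discussion.

```java
/** Illustrative only: classifies the archive size against the min/max bounds. */
class WalArchiveStateSketch {
    /** Hypothetical state names, not part of Ignite. */
    enum State { BELOW_MIN, STEADY, ABOVE_MAX }

    static State classify(long archiveSize, long minSize, long maxSize) {
        if (minSize < 0 || maxSize < minSize)
            throw new IllegalArgumentException("Expected 0 <= min <= max");

        if (archiveSize < minSize)
            return State.BELOW_MIN; // "non-steady state", e.g. a brand new cluster

        if (archiveSize > maxSize)
            return State.ABOVE_MAX; // the case this thread tries to prevent

        return State.STEADY;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;

        System.out.println(classify(0, gb, 2 * gb));          // new cluster
        System.out.println(classify(gb + gb / 2, gb, 2 * gb)); // between min and max
        System.out.println(classify(3 * gb, gb, 2 * gb));     // over the limit
    }
}
```

In this reading BELOW_MIN is not an error: historical rebalance simply covers a shorter history than the configuration aims for.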


> On 11 May 2021, at 13:18, Andrey Gura <agura@apache.org> wrote:
> 
> Stan
> 
>> If archive size is less than min or more than max then the system functionality can
>> degrade (e.g. historical rebalance may not work as expected).
> 
> Why does the condition "archive size is less than min" lead to system
> degradation? Actually, the described case is a normal situation for
> brand new clusters.
> 
> I'm okay with the proposed minWalArchiveSize property. It looks like
> a relatively understandable property.
> 
> On Sun, May 9, 2021 at 7:12 PM Stanislav Lukyanov
> <stanlukyanov@gmail.com> wrote:
>> 
>> Discussed this with Kirill verbally.
>> 
>> Kirill showed me that having the min threshold doesn't quite work.
>> It doesn't work because we no longer know how much WAL we should remove if we reach
>> getMaxWalArchiveSize.
>> 
>> For example, say we have minWalArchiveTimespan=2 hours and maxWalArchiveSize=2GB.
>> Say, under normal load on stable topology 2 hours of WAL use 1 GB of space.
>> Now, say we're doing historical rebalance and reserve the WAL archive.
>> The WAL archive starts growing and soon it occupies 2 GB.
>> Now what?
>> We're supposed to give up WAL reservations and start aggressively removing WAL archive.
>> But it is not clear when we can stop removing WAL archive: since the last 2 hours of
>> WAL are larger than our maxWalArchiveSize,
>> there is no meaningful point the system can use as a "minimum" WAL size.
>> 
>> I understand the description above is a bit messy but I believe that whoever is interested
>> in this will understand it after drawing this on paper.
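
The arithmetic behind the example above can be sketched as follows. The 2 hour and 2 GB figures and the "1 GB per 2 hours under normal load" figure come from the message; the 64 MB segment size and the heavy-load write rate (three times the normal one) are assumptions picked only to make the numbers round. All names are hypothetical, not Ignite API.

```java
/** Illustrative only: why a time-based minimum can conflict with a size-based maximum. */
class TimeBasedMinimumSketch {
    static final long MB = 1024L * 1024;
    static final long GB = 1024 * MB;

    /** Bytes of WAL written during the timespan if one segment is filled every secondsPerSegment. */
    static long bytesWithinTimespan(long segmentSize, long secondsPerSegment, long timespanSeconds) {
        return (timespanSeconds / secondsPerSegment) * segmentSize;
    }

    /** True if the archive can cover the whole timespan and still respect the size limit. */
    static boolean fits(long bytesToKeep, long maxArchiveSize) {
        return bytesToKeep <= maxArchiveSize;
    }

    public static void main(String[] args) {
        long segmentSize = 64 * MB;     // assumed segment size
        long minTimespan = 2 * 60 * 60; // minWalArchiveTimespan = 2 hours, in seconds
        long maxSize = 2 * GB;          // maxWalArchiveSize = 2 GB

        // Normal load from the message: 2 hours of WAL take 1 GB (one segment per 450 s).
        long normal = bytesWithinTimespan(segmentSize, 450, minTimespan);

        // Assumed load during rebalance: three times the normal write rate.
        long heavy = bytesWithinTimespan(segmentSize, 150, minTimespan);

        System.out.println("normal: " + normal / MB + " MB, fits=" + fits(normal, maxSize));
        System.out.println("heavy:  " + heavy / MB + " MB, fits=" + fits(heavy, maxSize));
    }
}
```

Under the assumed heavy load the last two hours alone need 3 GB, so no amount of cleanup can satisfy both the 2 hour minimum and the 2 GB maximum at once.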
>> 
>> 
>> I'm giving up on my latest suggestion about time-based minimum. Let's keep it simple.
>> 
>> I suggest the minWalArchiveSize and maxWalArchiveSize properties as the solution,
>> with the behavior as initially described by Kirill.
>> 
>> Stan
>> 
>> 
>>> On 7 May 2021, at 15:09, ткаленко кирилл <tkalkirill@yandex.ru> wrote:
>>> 
>>> Stas hello!
>>> 
>>> I didn't quite get your last idea.
>>> What will we do if we reach getMaxWalArchiveSize? Shall we not delete the segment
>>> until minWalArchiveTimespan?
>>> 
>>> 06.05.2021, 20:00, "Stanislav Lukyanov" <stanlukyanov@gmail.com>:
>>>> An interesting suggestion I heard today.
>>>> 
>>>> The minWalArchiveSize property might actually be minWalArchiveTimespan -
>>>> i.e. be a number of seconds instead of a number of bytes!
>>>> 
>>>> I think this makes perfect sense from the user point of view.
>>>> "I want to have WAL archive for at least N hours but I have a limit of M
>>>> gigabytes to store it".
>>>> 
>>>> Do we have checkpoint timestamp stored anywhere? (cp start markers?)
>>>> Perhaps we can actually implement this?
>>>> 
>>>> Thanks,
>>>> Stan
>>>> 
>>>>> On 6 May 2021, at 14:13, Stanislav Lukyanov <stanlukyanov@gmail.com> wrote:
>>>>> 
>>>>> +1 to cancel WAL reservation on reaching getMaxWalArchiveSize
>>>>> +1 to add a public property to replace IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>>>> 
>>>>> I don't like the name getWalArchiveSize - I think it's a bit confusing
>>>>> (is it the current size? the minimal size? the target size?)
>>>>> I suggest naming the property getMinWalArchiveSize. I think that this
>>>>> is exactly what it is - the minimal size of the archive that we want to have.
>>>>> The archive size at all times should be between min and max.
>>>>> If archive size is less than min or more than max then the system functionality
>>>>> can degrade (e.g. historical rebalance may not work as expected).
>>>>> I think these rules are intuitively understood from the "min" and "max"
>>>>> names.
>>>>> 
>>>>> Ilya's suggestion about throttling is great although I'd do this in a
>>>>> different ticket.
>>>>> 
>>>>> Thanks,
>>>>> Stan
>>>>> 
>>>>>> On 5 May 2021, at 19:25, Maxim Muzafarov <mmuzaf@apache.org> wrote:
>>>>>> 
>>>>>> Hello, Kirill
>>>>>> 
>>>>>> +1 for this change, however, there are too many configuration settings
>>>>>> that exist for the user to configure Ignite cluster. It is better to
>>>>>> keep the options that we already have and fix the behaviour of the
>>>>>> rebalance process as you suggested.
>>>>>> 
>>>>>> On Tue, 4 May 2021 at 19:01, ткаленко кирилл <tkalkirill@yandex.ru> wrote:
>>>>>>> Hi Ilya!
>>>>>>> 
>>>>>>> Then we may greatly reduce the user load on the cluster until
>>>>>>> the rebalance is over, which can be critical for the user.
>>>>>>> 
>>>>>>> 04.05.2021, 18:43, "Ilya Kasnacheev" <ilya.kasnacheev@gmail.com>:
>>>>>>>> Hello!
>>>>>>>> 
>>>>>>>> Maybe we can have a mechanic here similar (or equal) to checkpoint based
>>>>>>>> write throttling?
>>>>>>>> 
>>>>>>>> So we will be throttling for both checkpoint page buffer
>>>>>>>> and WAL limit.
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> --
>>>>>>>> Ilya Kasnacheev
>>>>>>>> 
>>>>>>>> Tue, 4 May 2021 at 11:29, ткаленко кирилл <tkalkirill@yandex.ru>:
>>>>>>>> 
>>>>>>>>> Hello everybody!
>>>>>>>>> 
>>>>>>>>> At the moment, if there are partitions for the rebalance for which the
>>>>>>>>> historical rebalance will be used, then we reserve segments in the WAL
>>>>>>>>> archive (we do not allow cleaning the WAL archive) until the rebalance for
>>>>>>>>> all cache groups is over.
>>>>>>>>> 
>>>>>>>>> If a cluster is under load during the rebalance, WAL archive size may
>>>>>>>>> significantly exceed limits set in
>>>>>>>>> DataStorageConfiguration#getMaxWalArchiveSize until the process is
>>>>>>>>> complete. This may lead to user issues and nodes may crash with the "No
>>>>>>>>> space left on device" error.
>>>>>>>>> 
>>>>>>>>> We have a system property IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>>>>>>>> (0.5 by default), which sets the threshold (multiplied by
>>>>>>>>> getMaxWalArchiveSize) down to which the WAL archive will be cleared,
>>>>>>>>> i.e. it sets the size of the WAL archive that will always be kept on
>>>>>>>>> the node. I propose to replace this system property with
>>>>>>>>> DataStorageConfiguration#getWalArchiveSize in bytes; the default is
>>>>>>>>> (getMaxWalArchiveSize * 0.5) as it is now.
>>>>>>>>> 
>>>>>>>>> Main proposal:
>>>>>>>>> When DataStorageConfiguration#getMaxWalArchiveSize is reached, cancel
>>>>>>>>> the reservation of the WAL segments and do not grant new reservations
>>>>>>>>> until we reach DataStorageConfiguration#getWalArchiveSize. In this case,
>>>>>>>>> if there is no segment for historical rebalance, we will automatically
>>>>>>>>> switch to full rebalance.
>> 
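
The main proposal from the first message in this thread can be sketched as a toy model. This is only an illustration under simplifying assumptions: cleanup runs synchronously when a segment is added, segments are identified by a running index, and a single flag stands in for the reservation. Every class and method name is hypothetical; none of this is Ignite code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of "cancel the reservation at max, clean down to min"; not Ignite API. */
class WalArchiveCleanupSketch {
    private final long minSize;
    private final long maxSize;
    private final Deque<Long> segmentSizes = new ArrayDeque<>(); // oldest first
    private long firstIdx;    // index of the oldest segment still in the archive
    private long totalSize;
    private boolean reserved; // true while historical rebalance holds the archive

    WalArchiveCleanupSketch(long minSize, long maxSize) {
        if (minSize < 0 || maxSize < minSize)
            throw new IllegalArgumentException("Expected 0 <= min <= max");
        this.minSize = minSize;
        this.maxSize = maxSize;
    }

    /** Mirrors the default from the proposal: min = max * 0.5. */
    WalArchiveCleanupSketch(long maxSize) {
        this((long)(maxSize * 0.5), maxSize);
    }

    void reserve() { reserved = true; }
    boolean isReserved() { return reserved; }
    long totalSize() { return totalSize; }

    /** Historical rebalance needs every segment starting from fromIdx to still exist. */
    boolean canRebalanceHistorically(long fromIdx) { return fromIdx >= firstIdx; }

    void addSegment(long size) {
        segmentSizes.addLast(size);
        totalSize += size;

        if (totalSize < maxSize)
            return; // below the limit: nothing to clean, reservation is kept

        reserved = false; // limit reached: give up the reservation

        // Drop the oldest segments, but never go below the minimum size.
        while (!segmentSizes.isEmpty() && totalSize - segmentSizes.peekFirst() >= minSize) {
            totalSize -= segmentSizes.removeFirst();
            firstIdx++;
        }
    }

    public static void main(String[] args) {
        WalArchiveCleanupSketch archive = new WalArchiveCleanupSketch(8); // min defaults to 4
        archive.reserve();

        for (int i = 0; i < 8; i++)
            archive.addSegment(1);

        System.out.println("reserved=" + archive.isReserved() + ", size=" + archive.totalSize());
        System.out.println(archive.canRebalanceHistorically(0) ? "historical rebalance" : "full rebalance");
    }
}
```

In this model a node that needed history from segment 0 finds it gone after the cleanup and would have to fall back to full rebalance, which is the trade-off the thread accepts in exchange for a bounded archive.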

