ignite-dev mailing list archives

From Stanislav Lukyanov <stanlukya...@gmail.com>
Subject Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance
Date Thu, 06 May 2021 17:00:10 GMT
An interesting suggestion I heard today.

The minWalArchiveSize property might actually be minWalArchiveTimespan - i.e. be a number
of seconds instead of a number of bytes!

I think this makes perfect sense from the user point of view.
"I want to have WAL archive for at least N hours but I have a limit of M gigabytes to store it."

Do we have checkpoint timestamp stored anywhere? (cp start markers?)
Perhaps we can actually implement this?
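
To make the idea concrete, here is a minimal sketch of a timespan-based cleanup rule. It is not Ignite code: the class `WalArchiveTimespanPolicy`, its `Segment` type and the `segmentsToDelete` method are hypothetical names, and it assumes each archived segment can be given a timestamp (for example from a checkpoint start marker, which is exactly the open question above). It also assumes that segments older than the minimum timespan may be removed eagerly, while the size cap is always enforced.

```java
import java.util.List;

/** Hypothetical sketch, not Ignite code: keep at least N ms of WAL archive, but never more than M bytes. */
final class WalArchiveTimespanPolicy {
    /** One archived WAL segment. The timestamp is assumed to be derivable from a checkpoint marker. */
    static final class Segment {
        final long sizeBytes;
        final long timestampMillis;

        Segment(long sizeBytes, long timestampMillis) {
            this.sizeBytes = sizeBytes;
            this.timestampMillis = timestampMillis;
        }
    }

    /**
     * Returns how many of the oldest segments can be deleted.
     * A segment is deleted if it is older than the minimum timespan,
     * or if the archive is still above the size cap.
     */
    static int segmentsToDelete(List<Segment> oldestFirst, long nowMillis,
                                long minTimespanMillis, long maxArchiveBytes) {
        long total = 0;
        for (Segment s : oldestFirst)
            total += s.sizeBytes;

        int toDelete = 0;
        for (Segment s : oldestFirst) {
            boolean overCap = total > maxArchiveBytes;
            boolean expired = nowMillis - s.timestampMillis > minTimespanMillis;

            // Stop at the first segment that is both recent enough and within the cap.
            if (!overCap && !expired)
                break;

            total -= s.sizeBytes;
            toDelete++;
        }
        return toDelete;
    }
}
```

With this rule the size limit wins over the timespan: if the archive is above the cap, even segments younger than N hours are dropped, which matches the "at least N hours, but at most M gigabytes" wording.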


> On 6 May 2021, at 14:13, Stanislav Lukyanov <stanlukyanov@gmail.com> wrote:
> +1 to cancel WAL reservation on reaching getMaxWalArchiveSize
> +1 to add a public property to replace IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
> I don't like the name getWalArchiveSize - I think it's a bit confusing (is it the current size? the minimal size? the target size?)
> I suggest naming the property getMinWalArchiveSize. I think that this is exactly what it is - the minimal size of the archive that we want to have.
> The archive size at all times should be between min and max.
> If the archive size is less than min or more than max then the system functionality can degrade (e.g. historical rebalance may not work as expected).
> I think these rules are intuitively understood from the "min" and "max" names.
> Ilya's suggestion about throttling is great although I'd do this in a different ticket.
> Thanks,
> Stan
>> On 5 May 2021, at 19:25, Maxim Muzafarov <mmuzaf@apache.org> wrote:
>> Hello, Kirill
>> +1 for this change. However, there are already too many configuration
>> settings that the user has to deal with to configure an Ignite cluster.
>> It is better to keep the options that we already have and fix the
>> behaviour of the rebalance process as you suggested.
>> On Tue, 4 May 2021 at 19:01, ткаленко кирилл <tkalkirill@yandex.ru> wrote:
>>> Hi Ilya!
>>> That way we could greatly reduce the user load on the cluster until the rebalance is over, which can be critical for the user.
>>> 04.05.2021, 18:43, "Ilya Kasnacheev" <ilya.kasnacheev@gmail.com>:
>>>> Hello!
>>>> Maybe we can have a mechanic here similar (or equal) to checkpoint based
>>>> write throttling?
>>>> So we will be throttling for both checkpoint page buffer and WAL limit.
>>>> Regards,
>>>> --
>>>> Ilya Kasnacheev
>>>> Tue, 4 May 2021 at 11:29, ткаленко кирилл <tkalkirill@yandex.ru>:
>>>>> Hello everybody!
>>>>> At the moment, if there are partitions to rebalance for which
>>>>> historical rebalance will be used, we reserve segments in the WAL
>>>>> archive (we do not allow cleaning the WAL archive) until the rebalance
>>>>> of all cache groups is over.
>>>>> If a cluster is under load during the rebalance, WAL archive size may
>>>>> significantly exceed limits set in
>>>>> DataStorageConfiguration#getMaxWalArchiveSize until the process is
>>>>> complete. This may lead to user issues and nodes may crash with the "No
>>>>> space left on device" error.
>>>>> There is a system property IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE,
>>>>> by default 0.5, which sets the threshold (multiplied by
>>>>> getMaxWalArchiveSize) down to which the WAL archive will be cleared,
>>>>> i.e. it sets the size of the WAL archive that will always be kept on
>>>>> the node. I propose to replace this system property with
>>>>> DataStorageConfiguration#getWalArchiveSize in bytes; the default is
>>>>> (getMaxWalArchiveSize * 0.5), as it is now.
>>>>> Main proposal:
>>>>> When DataStorageConfiguration#getMaxWalArchiveSize is reached, cancel
>>>>> the existing reservations of WAL segments and do not grant new ones
>>>>> until the archive shrinks to DataStorageConfiguration#getWalArchiveSize.
>>>>> In this case, if there is no segment for historical rebalance, we will
>>>>> automatically switch to full rebalance.
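
The main proposal amounts to a gate with hysteresis between the min and max sizes. The sketch below is only an illustration under assumptions, not the Ignite implementation: `WalReservationGate`, `onArchiveSizeChanged` and `chooseMode` are hypothetical names, the default minimum of half the maximum mirrors the 0.5 threshold mentioned in the thread, and falling back to full rebalance whenever reservations are blocked is an assumption of this sketch.

```java
/** Hypothetical sketch, not Ignite code: block WAL reservations between reaching max and shrinking to min. */
final class WalReservationGate {
    enum RebalanceMode { HISTORICAL, FULL }

    private final long minBytes;
    private final long maxBytes;
    private boolean reservationsAllowed = true;

    /** Default minimum is half of the maximum, mirroring the 0.5 threshold from the thread. */
    WalReservationGate(long maxBytes) {
        this(maxBytes / 2, maxBytes);
    }

    WalReservationGate(long minBytes, long maxBytes) {
        if (minBytes < 0 || minBytes > maxBytes)
            throw new IllegalArgumentException("Expected 0 <= min <= max");
        this.minBytes = minBytes;
        this.maxBytes = maxBytes;
    }

    long minBytes() {
        return minBytes;
    }

    /** Called whenever the archive size changes (segment archived or cleaned). */
    void onArchiveSizeChanged(long archiveBytes) {
        if (archiveBytes >= maxBytes)
            reservationsAllowed = false; // Max reached: cancel and stop granting reservations.
        else if (archiveBytes <= minBytes)
            reservationsAllowed = true;  // Cleaned down to min: reservations may resume.
        // Between min and max the previous state is kept (hysteresis).
    }

    boolean reservationsAllowed() {
        return reservationsAllowed;
    }

    /** Historical rebalance needs both a grantable reservation and the segment still in the archive. */
    RebalanceMode chooseMode(boolean segmentStillInArchive) {
        return reservationsAllowed && segmentStillInArchive
            ? RebalanceMode.HISTORICAL
            : RebalanceMode.FULL;
    }
}
```

The hysteresis keeps the gate from flapping: once the maximum is hit, reservations stay blocked until cleanup has actually brought the archive down to the minimum, not merely one segment below the maximum.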
