ignite-dev mailing list archives

From ткаленко кирилл <tkalkir...@yandex.ru>
Subject Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance
Date Fri, 07 May 2021 12:09:10 GMT
Hello, Stas!

I didn't quite get your last idea.
What will we do if we reach getMaxWalArchiveSize? Should we still refrain from deleting a segment until it is older than minWalArchiveTimespan?
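To make the question concrete, here is a minimal Java sketch of one possible cleanup policy. It rests on assumptions that are not part of Ignite: `Segment` is a hypothetical record holding each archived segment's index, size and archive time, `segmentsToDelete` is an invented helper, and the archive list is sorted oldest first. Cleanup starts only when the archive exceeds the max size. It then drops every segment older than minWalArchiveTimespan (the time-based "min"), and if the archive is still over the max, the size limit wins and younger segments are dropped too.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical retention policy combining minWalArchiveTimespan with a max archive size. */
public class TimespanRetentionSketch {
    /** Hypothetical archived segment: index, size in bytes, time it was archived (ms). */
    public record Segment(long index, long sizeBytes, long archivedAtMillis) {}

    /**
     * Returns the segments to delete, oldest first.
     * Assumes the archive list is sorted from oldest to newest.
     */
    public static List<Segment> segmentsToDelete(List<Segment> archive, long nowMillis,
                                                 long minTimespanMillis, long maxSizeBytes) {
        long total = 0;
        for (Segment s : archive)
            total += s.sizeBytes();

        List<Segment> toDelete = new ArrayList<>();
        if (total <= maxSizeBytes)
            return toDelete; // Within the limit: cleanup is not triggered.

        for (Segment s : archive) {
            boolean olderThanTimespan = nowMillis - s.archivedAtMillis() > minTimespanMillis;
            // Older segments are always dropped once cleanup starts (clean down to the "min").
            // Younger ones are dropped only while the archive is still over the max.
            if (!olderThanTimespan && total <= maxSizeBytes)
                break; // Every remaining segment is even younger, and the archive fits.
            toDelete.add(s);
            total -= s.sizeBytes();
        }
        return toDelete;
    }
}
```

Under this interpretation the answer would be "no": on reaching getMaxWalArchiveSize, segments younger than minWalArchiveTimespan may still be deleted. Giving priority to the timespan instead is equally possible; it would simply let the archive grow past the max.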

06.05.2021, 20:00, "Stanislav Lukyanov" <stanlukyanov@gmail.com>:
> An interesting suggestion I heard today.
>
> The minWalArchiveSize property might actually be minWalArchiveTimespan, i.e. be a number of seconds instead of a number of bytes!
>
> I think this makes perfect sense from the user point of view.
> "I want to have a WAL archive for at least N hours but I have a limit of M gigabytes to store it".
>
> Do we have checkpoint timestamp stored anywhere? (cp start markers?)
> Perhaps we can actually implement this?
>
> Thanks,
> Stan
>
>>  On 6 May 2021, at 14:13, Stanislav Lukyanov <stanlukyanov@gmail.com> wrote:
>>
>>  +1 to cancel WAL reservation on reaching getMaxWalArchiveSize
>>  +1 to add a public property to replace IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>
>>  I don't like the name getWalArchiveSize: I think it's a bit confusing (is it the current size? the minimal size? the target size?)
>>  I suggest naming the property getMinWalArchiveSize. I think that this is exactly what it is: the minimal size of the archive that we want to have.
>>  The archive size at all times should be between min and max.
>>  If the archive size is less than min or more than max, the system functionality can degrade (e.g. historical rebalance may not work as expected).
>>  I think these rules are intuitively understood from the "min" and "max" names.
>>
>>  Ilya's suggestion about throttling is great, although I'd do this in a different ticket.
>>
>>  Thanks,
>>  Stan
>>
>>>  On 5 May 2021, at 19:25, Maxim Muzafarov <mmuzaf@apache.org> wrote:
>>>
>>>  Hello, Kirill
>>>
>>>  +1 for this change. However, there are already too many configuration
>>>  settings a user has to deal with when configuring an Ignite cluster. It is
>>>  better to keep the options that we already have and fix the behaviour of
>>>  the rebalance process as you suggested.
>>>
>>>  On Tue, 4 May 2021 at 19:01, ткаленко кирилл <tkalkirill@yandex.ru> wrote:
>>>>  Hi Ilya!
>>>>
>>>>  Then user operations on the cluster may be heavily throttled until the rebalance is over, which can be critical for the user.
>>>>
>>>>  04.05.2021, 18:43, "Ilya Kasnacheev" <ilya.kasnacheev@gmail.com>:
>>>>>  Hello!
>>>>>
>>>>>  Maybe we can have a mechanism here similar (or equal) to checkpoint-based
>>>>>  write throttling?
>>>>>
>>>>>  So we will be throttling for both the checkpoint page buffer and the WAL limit.
>>>>>
>>>>>  Regards,
>>>>>  --
>>>>>  Ilya Kasnacheev
>>>>>
>>>>>  On Tue, 4 May 2021 at 11:29, ткаленко кирилл <tkalkirill@yandex.ru> wrote:
>>>>>
>>>>>>  Hello everybody!
>>>>>>
>>>>>>  Currently, if some partitions are going to be rebalanced historically,
>>>>>>  we reserve segments in the WAL archive (i.e. we do not allow the WAL
>>>>>>  archive to be cleaned) until the rebalance is over for all cache groups.
>>>>>>
>>>>>>  If the cluster is under load during the rebalance, the WAL archive size
>>>>>>  may significantly exceed the limit set by
>>>>>>  DataStorageConfiguration#getMaxWalArchiveSize until the process is
>>>>>>  complete. This may cause problems for users: nodes may crash with the
>>>>>>  "No space left on device" error.
>>>>>>
>>>>>>  We have the system property IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>>>>>  (0.5 by default) which, multiplied by getMaxWalArchiveSize, sets the
>>>>>>  threshold down to which the WAL archive is cleaned, i.e. the size of the
>>>>>>  WAL archive that is always kept on the node. I propose to replace this
>>>>>>  system property with DataStorageConfiguration#getWalArchiveSize, set in
>>>>>>  bytes, with a default of getMaxWalArchiveSize * 0.5 as it is now.
>>>>>>
>>>>>>  Main proposal:
>>>>>>  When DataStorageConfiguration#getMaxWalArchiveSize is reached, cancel
>>>>>>  the existing WAL segment reservations and do not grant new ones until
>>>>>>  the archive has been cleaned down to
>>>>>>  DataStorageConfiguration#getWalArchiveSize. In this case, if a segment
>>>>>>  needed for historical rebalance is missing, we will automatically switch
>>>>>>  to full rebalance.
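The main proposal can be sketched as a minimal, single-threaded Java model. This is not Ignite code: `WalReservationSketch`, `startRebalance`, `cleanup` and the other names are hypothetical, a reservation is modeled as "rebalance id to first needed segment", and cleanup is a separate call so that the window between reaching the max and finishing cleanup is observable. The min size defaults to getMaxWalArchiveSize * 0.5, mirroring the current IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE default.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of the proposal; none of these names exist in Ignite. */
public class WalReservationSketch {
    public enum RebalanceMode { HISTORICAL, FULL }

    private final long maxWalArchiveSize;
    private final long walArchiveSize; // proposed size to clean the archive down to
    private final TreeMap<Long, Long> archive = new TreeMap<>(); // segment index -> size in bytes
    private final Map<String, Long> reservations = new HashMap<>(); // rebalance id -> first needed segment
    private long totalSize;
    private boolean reservationsBlocked; // set on reaching the max, cleared by cleanup()

    public WalReservationSketch(long maxWalArchiveSize, long walArchiveSize) {
        this.maxWalArchiveSize = maxWalArchiveSize;
        this.walArchiveSize = walArchiveSize;
    }

    /** Default min size: half of the max, like the current 0.5 threshold. */
    public WalReservationSketch(long maxWalArchiveSize) {
        this(maxWalArchiveSize, maxWalArchiveSize / 2);
    }

    public void onSegmentArchived(long segmentIdx, long sizeBytes) {
        archive.put(segmentIdx, sizeBytes);
        totalSize += sizeBytes;
        if (totalSize >= maxWalArchiveSize) {
            // Max reached: cancel all reservations; their rebalances must fall back to full.
            reservations.clear();
            reservationsBlocked = true;
        }
    }

    /** Deletes the oldest segments until the archive is no larger than walArchiveSize. */
    public void cleanup() {
        if (!reservationsBlocked)
            return; // Only the max-size trigger is modeled here.
        // Cleanup runs only after reservations were cancelled, so a live
        // reservation in this sketch always protects its segments.
        while (totalSize > walArchiveSize && !archive.isEmpty())
            totalSize -= archive.pollFirstEntry().getValue();
        reservationsBlocked = false; // Reservations may be granted again.
    }

    /** Reserves history from fromSegment if possible, otherwise reports a full rebalance. */
    public RebalanceMode startRebalance(String rebalanceId, long fromSegment) {
        // The sketch only looks at the archive when deciding whether history is available.
        boolean segmentMissing = archive.isEmpty() || archive.firstKey() > fromSegment;
        if (reservationsBlocked || segmentMissing)
            return RebalanceMode.FULL;
        reservations.put(rebalanceId, fromSegment);
        return RebalanceMode.HISTORICAL;
    }

    public boolean isReserved(String rebalanceId) {
        return reservations.containsKey(rebalanceId);
    }
}
```

For example, with a max of 1000 bytes and 100-byte segments, archiving the tenth segment cancels an existing historical reservation, and any rebalance started before `cleanup()` gets FULL. After cleanup the archive holds segments 5 to 9, so a rebalance that needs segment 3 falls back to FULL while one that starts from segment 6 can still be HISTORICAL.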
