ignite-dev mailing list archives

From ткаленко кирилл <tkalkir...@yandex.ru>
Subject Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance
Date Tue, 04 May 2021 08:29:37 GMT
Hello everybody!

At the moment, if there are partitions that will be rebalanced using historical rebalance,
we reserve the required segments in the WAL archive (that is, we do not allow the WAL
archive to be cleaned) until the rebalance for all cache groups is over.

If a cluster is under load during the rebalance, the WAL archive size may significantly exceed
the limit set in DataStorageConfiguration#getMaxWalArchiveSize until the process is complete.
This may cause problems for users, and nodes may crash with the "No space left on device" error.

We have a system property, IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE (0.5 by default), which
sets the threshold (as a fraction of getMaxWalArchiveSize) at which cleaning of the WAL archive
starts and down to which it is cleaned, i.e. it sets the size of the WAL archive that will
always be kept on the node. I propose to replace this system property with
DataStorageConfiguration#getWalArchiveSize, specified in bytes, with a default of
(getMaxWalArchiveSize * 0.5), as it is now.
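
To make the proposed default concrete, here is a minimal Java sketch of how the value could be
resolved. It does not use any Ignite classes: the class WalArchiveSizes, the method
minWalArchiveSize and the UNSET marker are hypothetical names chosen only for illustration.

```java
// Hypothetical helper, not part of Ignite: resolves the proposed
// DataStorageConfiguration#getWalArchiveSize value in bytes.
final class WalArchiveSizes {
    /** Fraction used today via IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE. */
    static final double DEFAULT_THRESHOLD_FRACTION = 0.5;

    /** Marker meaning "the user did not configure a value" (an assumption of this sketch). */
    static final long UNSET = -1L;

    private WalArchiveSizes() {
        // Static helper only.
    }

    /**
     * @param maxWalArchiveSize Upper limit of the WAL archive, in bytes.
     * @param configuredMin Explicitly configured size in bytes, or UNSET.
     * @return Size in bytes down to which the WAL archive is cleaned.
     */
    static long minWalArchiveSize(long maxWalArchiveSize, long configuredMin) {
        if (maxWalArchiveSize <= 0)
            throw new IllegalArgumentException("maxWalArchiveSize must be positive");

        // Default: the same value the system property yields today.
        if (configuredMin == UNSET)
            return (long)(maxWalArchiveSize * DEFAULT_THRESHOLD_FRACTION);

        if (configuredMin < 0 || configuredMin > maxWalArchiveSize)
            throw new IllegalArgumentException("configuredMin must be in [0, maxWalArchiveSize]");

        return configuredMin;
    }
}
```

For example, with a 1 GiB maximum and no explicit value, the helper returns 512 MiB, which is
what the current property gives with its default of 0.5.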

Main proposal:
When DataStorageConfiguration#getMaxWalArchiveSize is reached, cancel the existing reservations
of WAL segments and do not grant new ones until the archive size drops to
DataStorageConfiguration#getWalArchiveSize. In this case, if a segment needed for historical
rebalance is no longer available, we will automatically switch to full rebalance.
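
A rough Java sketch of this behaviour, under simplifying assumptions: WalReservationGate and all
of its members are hypothetical names, not existing Ignite code, and a single boolean stands in
for the real per-segment reservations. It only shows the intended switch between the two limits.

```java
// Hypothetical model of the proposal, not Ignite code.
final class WalReservationGate {
    enum RebalanceMode { HISTORICAL, FULL }

    private final long maxArchiveSize; // plays the role of getMaxWalArchiveSize
    private final long minArchiveSize; // plays the role of the proposed getWalArchiveSize

    private boolean reservationsAllowed = true;
    private boolean reserved;

    WalReservationGate(long maxArchiveSize, long minArchiveSize) {
        if (minArchiveSize < 0 || minArchiveSize > maxArchiveSize)
            throw new IllegalArgumentException("expected 0 <= min <= max");

        this.maxArchiveSize = maxArchiveSize;
        this.minArchiveSize = minArchiveSize;
    }

    /** Called whenever the WAL archive size changes. */
    void onArchiveSize(long archiveSize) {
        if (archiveSize >= maxArchiveSize) {
            // Limit reached: cancel the reservation and stop granting new ones.
            reservationsAllowed = false;
            reserved = false;
        }
        else if (archiveSize <= minArchiveSize) {
            // Archive has been cleaned down to the lower bound: allow reservations again.
            reservationsAllowed = true;
        }
        // Between the two bounds the previous state is kept.
    }

    /** Tries to reserve WAL segments for historical rebalance. */
    boolean tryReserve() {
        if (!reservationsAllowed)
            return false;

        reserved = true;

        return true;
    }

    boolean isReserved() {
        return reserved;
    }

    /** Falls back to full rebalance when the history cannot be reserved. */
    RebalanceMode chooseMode() {
        return tryReserve() ? RebalanceMode.HISTORICAL : RebalanceMode.FULL;
    }
}
```

In this sketch, once the archive size reaches the maximum, chooseMode() returns FULL until the
size drops to the lower bound, after which historical rebalance becomes possible again.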
