ignite-dev mailing list archives

From Ivan Rakov <ivan.glu...@gmail.com>
Subject Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC
Date Thu, 15 Mar 2018 23:23:52 GMT
Igniters and especially Native Persistence experts,

We decided to change the default WAL mode from DEFAULT (FSYNC) to LOG_ONLY 
in the 2.4 release. That was a difficult decision: we sacrificed power 
loss / OS crash tolerance but gained a significant performance boost. From 
my perspective, LOG_ONLY is the right choice, but it still lacks some 
critical features that a default mode should have.

Let's focus on the exact guarantees each mode provides. The documentation 
explains them in a pretty simple manner: LOG_ONLY - writes survive process 
crash, FSYNC - writes survive power loss scenarios. I have to note that 
the documentation doesn't describe what exactly can happen to a node in 
LOG_ONLY mode in case of power loss / OS crash. Basically, there are two 
possible negative outcomes: loss of the last several updates (exactly 
what can happen in BACKGROUND mode in case of process crash) and total 
storage corruption (not only the last updates, but all data will be 
lost). I did some quick research on this and came to the conclusion that 
power loss in LOG_ONLY mode can lead to storage corruption. 
There are several reasons for this:
1) IgniteWriteAheadLogManager#fsync is kind of broken - it doesn't 
perform an actual fsync unless the current WAL mode is FSYNC. We call 
this method when we write a checkpoint marker to the WAL. Since the part 
of the WAL before the checkpoint marker may not be synced, the 
"physical" records that are necessary for crash recovery in the "Node 
stopped in the middle of checkpoint" scenario may be corrupted after 
power loss. If that happens, we won't be able to recover internal data 
structures, which means loss of all data.
2) We don't fsync WAL archive files unless the current WAL mode is FSYNC. 
The WAL archive can contain necessary "physical" records as well, which 
leads us to the case described above.
3) We do perform fsync on rollover (switch of the current WAL segment) in 
all modes, but only when there's enough space to write the switch segment 
record - see FileWriteHandle#close. So there's a small chance that 
we'll skip the fsync and run into the same case.
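
To illustrate what an unconditional sync at the checkpoint marker could 
look like, here is a minimal sketch. It is not Ignite's actual WAL code: 
the class WalFsyncSketch and its append method are hypothetical names 
made up for this example, and the only real API involved is the JDK's 
FileChannel#force.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch; not Ignite's FileWriteHandle or WAL manager. */
public class WalFsyncSketch {
    /**
     * Appends a record to the segment. If sync is true, blocks until the OS
     * reports file content and metadata as written to the storage device.
     * Returns the segment size after the write.
     */
    static long append(FileChannel ch, byte[] rec, boolean sync) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(rec);

        while (buf.hasRemaining())
            ch.write(buf);

        if (sync)
            ch.force(true);

        return ch.size();
    }

    public static void main(String[] args) throws IOException {
        Path seg = Files.createTempFile("wal-segment", ".bin");

        try (FileChannel ch = FileChannel.open(seg, StandardOpenOption.WRITE)) {
            // Ordinary record: written without an explicit sync.
            append(ch, new byte[] {1, 2, 3}, false);

            // Checkpoint marker: sync regardless of the configured mode, so
            // everything written to this file before the marker is forced too.
            long size = append(ch, new byte[] {42}, true);

            System.out.println("Segment size after checkpoint marker: " + size);
        }
        finally {
            Files.deleteIfExists(seg);
        }
    }
}
```

The same forced sync would apply to archived segments and to every 
rollover, which covers cases 2) and 3).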

Enforcing fsync in those three situations will give us a guarantee that 
LOG_ONLY will survive power loss scenarios, with the possibility of 
losing the last several updates. There can still be a total binary mess 
in the last part of the WAL, but as long as we perform a CRC check during 
WAL replay, we'll detect the start of that mess. Extra fsyncs may cause 
slight performance degradation - all writes will have to wait for one 
fsync on every rollover and checkpoint. It's still much faster than fsync 
on every write to the WAL - I expect a drop of a few percent (0-5%) 
compared to the current LOG_ONLY. But degradation is degradation, and 
LOG_ONLY mode without extra fsyncs makes sense as well - that's why we 
need to introduce "LOG_ONLY + extra fsyncs" as a separate WAL mode. I 
think we should make it the default - it provides a significant 
durability bonus for the cost of one extra fsync for each WAL segment 
written.
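
The CRC argument can be sketched as follows. This is only an 
illustration under assumed conditions: the record layout (4-byte length, 
payload, 4-byte CRC32) and the class WalReplaySketch are hypothetical and 
do not reflect Ignite's real record format. Replay stops at the first 
record that is truncated or fails its checksum, so a garbled tail costs 
only the last updates.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical sketch of CRC-guarded WAL replay; not Ignite's record format. */
public class WalReplaySketch {
    /** Encodes a payload as [int length][payload][int crc32(payload)]. */
    static byte[] encode(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);

        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length + 4);
        buf.putInt(payload.length).put(payload).putInt((int) crc.getValue());

        return buf.array();
    }

    /** Returns the payloads of all records that precede the first invalid one. */
    static List<byte[]> replay(byte[] wal) {
        List<byte[]> res = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(wal);

        while (buf.remaining() >= 8) {
            int len = buf.getInt();

            // A non-positive or oversized length means a zero-filled or garbled tail.
            if (len <= 0 || len > buf.remaining() - 4)
                break;

            byte[] payload = new byte[len];
            buf.get(payload);
            int stored = buf.getInt();

            CRC32 crc = new CRC32();
            crc.update(payload, 0, len);

            // Checksum mismatch marks the start of the corrupted part.
            if ((int) crc.getValue() != stored)
                break;

            res.add(payload);
        }

        return res;
    }

    public static void main(String[] args) {
        byte[] a = encode(new byte[] {'a'});
        byte[] b = encode(new byte[] {'b'});

        byte[] wal = new byte[a.length + b.length];
        System.arraycopy(a, 0, wal, 0, a.length);
        System.arraycopy(b, 0, wal, a.length, b.length);

        System.out.println("Intact WAL, records replayed: " + replay(wal).size());

        wal[a.length + 4] ^= (byte) 0xFF; // Damage the payload of the second record.

        System.out.println("Damaged WAL, records replayed: " + replay(wal).size());
    }
}
```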

To sum it up, I propose a new set of possible WAL modes:
NONE - both process crash and power loss can lead to corruption
BACKGROUND - process crash can lead to loss of the last updates, power 
loss can lead to corruption
LOG_ONLY - writes survive process crash, power loss can lead to corruption
LOG_ONLY_SAFE (default) - writes survive process crash, power loss can 
lead to loss of the last updates
FSYNC - writes survive both process crash and power loss
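
The same table written as a standalone Java enum, purely for 
illustration. The names WalModesSketch, ProposedWalMode and Outcome are 
hypothetical; this is not Ignite's WALMode enum, and LOG_ONLY_SAFE is 
only the name proposed in this message.

```java
/** Hypothetical sketch of the proposed guarantees; not Ignite's WALMode enum. */
public class WalModesSketch {
    /** Worst-case result of a failure for a given mode. */
    enum Outcome { WRITES_SURVIVE, LAST_UPDATES_LOST, CORRUPTION }

    enum ProposedWalMode {
        NONE(Outcome.CORRUPTION, Outcome.CORRUPTION),
        BACKGROUND(Outcome.LAST_UPDATES_LOST, Outcome.CORRUPTION),
        LOG_ONLY(Outcome.WRITES_SURVIVE, Outcome.CORRUPTION),
        LOG_ONLY_SAFE(Outcome.WRITES_SURVIVE, Outcome.LAST_UPDATES_LOST), // Proposed default.
        FSYNC(Outcome.WRITES_SURVIVE, Outcome.WRITES_SURVIVE);

        /** Worst case after a process crash. */
        final Outcome onProcessCrash;

        /** Worst case after power loss or an OS crash. */
        final Outcome onPowerLoss;

        ProposedWalMode(Outcome onProcessCrash, Outcome onPowerLoss) {
            this.onProcessCrash = onProcessCrash;
            this.onPowerLoss = onPowerLoss;
        }
    }

    public static void main(String[] args) {
        for (ProposedWalMode m : ProposedWalMode.values())
            System.out.println(m + ": process crash -> " + m.onProcessCrash
                + ", power loss -> " + m.onPowerLoss);
    }
}
```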

Thoughts?


Best Regards,
Ivan Rakov

