ignite-dev mailing list archives

From Ivan Rakov <ivan.glu...@gmail.com>
Subject Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC
Date Mon, 26 Mar 2018 08:50:10 GMT
Val,

There's no sense in using WalMode.NONE in a production environment; it's 
kept for testing and debugging purposes (including possible user 
activities like capacity planning).
We already print a warning at node start in case WalMode.NONE is set:

> U.quietAndWarn(log, "Started write-ahead log manager in NONE mode, persisted data may be lost in " +
>     "a case of unexpected node failure. Make sure to deactivate the cluster before shutdown.");
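
For illustration only, here is a minimal, self-contained sketch of such a check. The Mode enum and the startupWarning helper are hypothetical stand-ins, not Ignite's actual classes. The per-mode flag encodes only what was said in this thread: every WAL mode except NONE is corruption-safe in case of a process crash.

```java
import java.util.Optional;

final class WalModeWarningSketch {
    /** Hypothetical stand-in for the WAL modes named in this thread; not Ignite's own enum. */
    enum Mode {
        FSYNC(true),
        LOG_ONLY(true),
        BACKGROUND(true),
        NONE(false);

        /** Whether storage stays uncorrupted after a process crash (kill -9, JVM crash). */
        final boolean safeOnProcessCrash;

        Mode(boolean safeOnProcessCrash) {
            this.safeOnProcessCrash = safeOnProcessCrash;
        }
    }

    /** Returns the startup warning for a mode without process-crash protection, if any. */
    static Optional<String> startupWarning(Mode mode) {
        if (mode.safeOnProcessCrash)
            return Optional.empty();

        return Optional.of("Started write-ahead log manager in " + mode
            + " mode, persisted data may be lost in a case of unexpected node failure."
            + " Make sure to deactivate the cluster before shutdown.");
    }

    public static void main(String[] args) {
        // Only NONE produces a warning; the other modes start silently.
        for (Mode mode : Mode.values())
            startupWarning(mode).ifPresent(msg -> System.err.println("WARN: " + msg));
    }
}
```

The sketch deliberately says nothing about power loss / OS crash, since that guarantee is exactly what is being fixed for LOG_ONLY and BACKGROUND.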

Best Regards,
Ivan Rakov

On 24.03.2018 1:40, Valentin Kulichenko wrote:
> Dmitry,
>
> Thanks for clarification. So it sounds like if we fix all other modes as we
> discuss here, NONE would be the only one allowing corruption. I also don't
> see much sense in this and I think we should clearly state this in the doc,
> as well as print out a warning if NONE mode is used. Eventually, if it's
> confirmed that there are no reasonable use cases for it, we can deprecate
> it.
>
> -Val
>
> On Fri, Mar 23, 2018 at 3:26 PM, Dmitry Pavlov <dpavlov.spb@gmail.com>
> wrote:
>
>> Hi Val,
>>
>> NONE means that the WAL is disabled and not written at all. Use of the
>> mode is at your own risk. It is possible that restoring state after a crash
>> in the middle of a checkpoint will not succeed. I do not see much sense in
>> it, especially in production.
>>
>> BACKGROUND is a fully functional WAL mode, but allows some delay before
>> flushing to disk.
>>
>> Sincerely,
>> Dmitriy Pavlov
>>
>> Sat, 24 Mar 2018 at 1:07, Valentin Kulichenko <
>> valentin.kulichenko@gmail.com>:
>>
>>> I agree. In my view, any possibility to get a corrupted storage is a bug
>>> which needs to be fixed.
>>>
>>> BTW, can someone explain semantics of NONE mode? What is the difference
>>> from BACKGROUND from user's perspective? Is there any particular use case
>>> where it can be used?
>>>
>>> -Val
>>>
>>> On Fri, Mar 23, 2018 at 2:49 AM, Dmitry Pavlov <dpavlov.spb@gmail.com>
>>> wrote:
>>>
>>>> Hi Ivan,
>>>>
>>>> IMO we have to add extra FSYNCS for BACKGROUND WAL. Agree?
>>>>
>>>> Sincerely,
>>>> Dmitriy Pavlov
>>>>
>>>> Fri, 23 Mar 2018 at 12:23, Ivan Rakov <ivan.glukos@gmail.com>:
>>>>
>>>>> Igniters, there's another important question about this matter.
>>>>> Do we want to add extra FSYNCS for BACKGROUND WAL mode? I think that we
>>>>> have to do it: it will cause similar performance drop, but if we
>>>>> consider LOG_ONLY broken without these fixes, BACKGROUND is broken as
>>>>> well.
>>>>>
>>>>> Best Regards,
>>>>> Ivan Rakov
>>>>>
>>>>> On 23.03.2018 10:27, Ivan Rakov wrote:
>>>>>> Fixes are quite simple.
>>>>>> I expect them to be merged into master within a week in the worst case.
>>>>>>
>>>>>> Best Regards,
>>>>>> Ivan Rakov
>>>>>>
>>>>>> On 22.03.2018 17:49, Denis Magda wrote:
>>>>>>> Ivan,
>>>>>>>
>>>>>>> How quickly are you going to merge the fix into master? Many
>>>>>>> persistence-related optimizations have already stacked up. We can
>>>>>>> probably release them sooner if the community agrees.
>>>>>>>
>>>>>>> --
>>>>>>> Denis
>>>>>>>
>>>>>>> On Thu, Mar 22, 2018 at 5:22 AM, Ivan Rakov <ivan.glukos@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks all!
>>>>>>>> We seem to have reached a consensus on this issue. I'll just add
>>>>>>>> necessary fsyncs under IGNITE-7754.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Ivan Rakov
>>>>>>>>
>>>>>>>>
>>>>>>>> On 22.03.2018 15:13, Ilya Lantukh wrote:
>>>>>>>>
>>>>>>>>> +1 for fixing LOG_ONLY. If the current implementation doesn't protect
>>>>>>>>> from data corruption, it doesn't make sense.
>>>>>>>>>
>>>>>>>>> On Wed, Mar 21, 2018 at 10:38 PM, Denis Magda <dmagda@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> +1 for the fix of LOG_ONLY
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk <
>>>>>>>>>> alexey.goncharuk@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 for fixing LOG_ONLY to enforce corruption safety given the
>>>>>>>>>>> provided performance results.
>>>>>>>>>>>
>>>>>>>>>>> 2018-03-21 18:20 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:
>>>>>>>>>>>
>>>>>>>>>>>> +1 for accepting drop in LOG_ONLY. 7% is not that much and not a
>>>>>>>>>>>> drop at all, provided that we are fixing a bug. I.e. had we
>>>>>>>>>>>> implemented it correctly in the first place, we would never have
>>>>>>>>>>>> noticed any "drop".
>>>>>>>>>>>>
>>>>>>>>>>>> I do not understand why someone would like to use the current
>>>>>>>>>>>> broken mode.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 21, 2018 at 6:11 PM, Dmitry Pavlov
>>>>>>>>>>>> <dpavlov.spb@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi, I think option 1 is better. As Val said, any mode that allows
>>>>>>>>>>>>> corruption does not make much sense.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What Ivan mentioned here as a drop is, in relation to the old mode
>>>>>>>>>>>>> DEFAULT (FSYNC now), still a significant performance boost.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>> Dmitriy Pavlov
>>>>>>>>>>>>>
>>>>>>>>>>>>> Wed, 21 Mar 2018 at 17:56, Ivan Rakov <ivan.glukos@gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've attached benchmark results to the JIRA ticket.
>>>>>>>>>>>>>> We observe ~7% drop in "fair" LOG_ONLY_SAFE mode, independent of
>>>>>>>>>>>>>> WAL compaction enabled flag. It's pretty significant drop: WAL
>>>>>>>>>>>>>> compaction itself gives only ~3% drop.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I see two options here:
>>>>>>>>>>>>>> 1) Change LOG_ONLY behavior. That implies that we'll be ready to
>>>>>>>>>>>>>> release AI 2.5 with 7% drop.
>>>>>>>>>>>>>> 2) Introduce LOG_ONLY_SAFE, make it default, add release note to AI
>>>>>>>>>>>>>> 2.5 that we added power loss durability in default mode, but user
>>>>>>>>>>>>>> may fallback to previous LOG_ONLY in order to retain performance.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>> Ivan Rakov
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 20.03.2018 16:00, Ivan Rakov wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Val,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If a storage is in corrupted state, does it mean that it needs to
>>>>>>>>>>>>>>>> be completely removed and cluster needs to be restarted without
>>>>>>>>>>>>>>>> data?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, there's a chance that in LOG_ONLY all local data will be lost,
>>>>>>>>>>>>>>> but only in *power loss / OS crash* case.
>>>>>>>>>>>>>>> kill -9, JVM crash, death of critical system thread and all other
>>>>>>>>>>>>>>> cases that usually take place are variations of *process crash*.
>>>>>>>>>>>>>>> All WAL modes (except NONE, of course) ensure corruption-safety in
>>>>>>>>>>>>>>> case of process crash.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If so, I'm not sure any mode that allows corruption makes much
>>>>>>>>>>>>>>>> sense to me.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It depends on performance impact of enforcing power-loss corruption
>>>>>>>>>>>>>>> safety. Price of full protection from power loss is high - FSYNC is
>>>>>>>>>>>>>>> way slower (2-10 times) than other WAL modes. The question is
>>>>>>>>>>>>>>> whether ensuring weaker guarantees (corruption can't happen, but
>>>>>>>>>>>>>>> loss of last updates can) will affect performance as badly as
>>>>>>>>>>>>>>> strong guarantees.
>>>>>>>>>>>>>>> I'll share benchmark results soon.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>> Ivan Rakov
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 20.03.2018 5:09, Valentin Kulichenko wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What do we understand under "data corruption" here? If a storage is
>>>>>>>>>>>>>>>> in corrupted state, does it mean that it needs to be completely
>>>>>>>>>>>>>>>> removed and cluster needs to be restarted without data? If so, I'm
>>>>>>>>>>>>>>>> not sure any mode that allows corruption makes much sense to me.
>>>>>>>>>>>>>>>> How am I supposed to use a database, if virtually any failure can
>>>>>>>>>>>>>>>> end with complete loss of data?
>>>>>>>>>>>>>>>> In any case, this definitely should not be a default behavior. If
>>>>>>>>>>>>>>>> user ever switches to corruption-unsafe mode, there should be a
>>>>>>>>>>>>>>>> clear warning about this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Val
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Mar 16, 2018 at 1:06 AM, Ivan Rakov <ivan.glukos@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ticket to track changes:
>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-7754
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>> Ivan Rakov
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 16.03.2018 10:58, Dmitriy Setrakyan wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov <ivan.glukos@gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Vladimir,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Unlike BACKGROUND, LOG_ONLY provides strict write guarantees
>>>>>>>>>>>>>>>>>>> unless power loss has happened.
>>>>>>>>>>>>>>>>>>> Seems like we need to measure performance difference to decide
>>>>>>>>>>>>>>>>>>> whether we need a separate WAL mode. If it will be invisible,
>>>>>>>>>>>>>>>>>>> we'll just fix these bugs without introducing new mode; if it
>>>>>>>>>>>>>>>>>>> will be perceptible, we'll continue the discussion about
>>>>>>>>>>>>>>>>>>> introducing LOG_ONLY_SAFE.
>>>>>>>>>>>>>>>>>>> Makes sense?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, this sounds like the right approach.

