ignite-dev mailing list archives

From Valentin Kulichenko <valentin.kuliche...@gmail.com>
Subject Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC
Date Mon, 26 Mar 2018 20:45:23 GMT
Ivan,

It's all good then :) Thanks!

-Val

On Mon, Mar 26, 2018 at 1:50 AM, Ivan Rakov <ivan.glukos@gmail.com> wrote:

> Val,
>
> There is no point in using WalMode.NONE in a production environment; it's
> kept for testing and debugging purposes (including possible user activities
> like capacity planning).
> We already print a warning at node start in case WalMode.NONE is set:
>
> U.quietAndWarn(log, "Started write-ahead log manager in NONE mode, persisted data may be lost in " +
>     "a case of unexpected node failure. Make sure to deactivate the cluster before shutdown.");
>
> Best Regards,
> Ivan Rakov
>
>
> On 24.03.2018 1:40, Valentin Kulichenko wrote:
>
>> Dmitry,
>>
>> Thanks for the clarification. So it sounds like if we fix all other modes as
>> we discuss here, NONE would be the only one allowing corruption. I also don't
>> see much sense in this, and I think we should clearly state this in the doc,
>> as well as print out a warning if NONE mode is used. Eventually, if it's
>> confirmed that there are no reasonable use cases for it, we can deprecate it.
>>
>> -Val
>>
>> On Fri, Mar 23, 2018 at 3:26 PM, Dmitry Pavlov <dpavlov.spb@gmail.com>
>> wrote:
>>
>>> Hi Val,
>>>
>>> NONE means that the WAL is disabled and not written at all. Use this mode
>>> at your own risk: restoring state after a crash in the middle of a
>>> checkpoint may not succeed. I do not see much sense in it, especially in
>>> production.
>>>
>>> BACKGROUND is a fully functional WAL mode, but allows some delay before
>>> flushing to disk.
>>>
>>> Sincerely,
>>> Dmitriy Pavlov
>>>
>>> On Sat, Mar 24, 2018 at 1:07, Valentin Kulichenko <
>>> valentin.kulichenko@gmail.com> wrote:
>>>
>>>> I agree. In my view, any possibility of getting corrupted storage is a bug
>>>> that needs to be fixed.
>>>>
>>>> BTW, can someone explain the semantics of NONE mode? What is the difference
>>>> from BACKGROUND from the user's perspective? Is there any particular use
>>>> case where it can be used?
>>>>
>>>> -Val
>>>>
>>>> On Fri, Mar 23, 2018 at 2:49 AM, Dmitry Pavlov <dpavlov.spb@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Ivan,
>>>>>
>>>>> IMO we have to add extra FSYNCS for BACKGROUND WAL. Agree?
>>>>>
>>>>> Sincerely,
>>>>> Dmitriy Pavlov
>>>>>
>>>>> On Fri, Mar 23, 2018 at 12:23, Ivan Rakov <ivan.glukos@gmail.com> wrote:
>>>>>
>>>>>> Igniters, there's another important question about this matter.
>>>>>> Do we want to add extra FSYNCS for BACKGROUND WAL mode? I think that we
>>>>>> have to do it: it will cause a similar performance drop, but if we
>>>>>> consider LOG_ONLY broken without these fixes, BACKGROUND is broken as
>>>>>> well.
>>>>>
>>>>>> Best Regards,
>>>>>> Ivan Rakov
>>>>>>
>>>>>> On 23.03.2018 10:27, Ivan Rakov wrote:
>>>>>>
>>>>>>> Fixes are quite simple.
>>>>>>> I expect them to be merged into master within a week in the worst case.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Ivan Rakov
>>>>>>>
>>>>>>> On 22.03.2018 17:49, Denis Magda wrote:
>>>>>>>
>>>>>>>> Ivan,
>>>>>>>>
>>>>>>>> How quickly are you going to merge the fix into master? Many
>>>>>>>> persistence-related optimizations have already stacked up. We can
>>>>>>>> probably release them sooner if the community agrees.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Denis
>>>>>>>>
>>>>>>>> On Thu, Mar 22, 2018 at 5:22 AM, Ivan Rakov <ivan.glukos@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks all!
>>>>>>>>> We seem to have reached a consensus on this issue. I'll just add the
>>>>>>>>> necessary fsyncs under IGNITE-7754.
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Ivan Rakov
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 22.03.2018 15:13, Ilya Lantukh wrote:
>>>>>>>>>
>>>>>>>>>> +1 for fixing LOG_ONLY. If the current implementation doesn't protect
>>>>>>>>>> from data corruption, it doesn't make sense.
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 21, 2018 at 10:38 PM, Denis Magda <dmagda@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 for the fix of LOG_ONLY
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk <
>>>>>>>>>>> alexey.goncharuk@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1 for fixing LOG_ONLY to enforce corruption safety given the provided
>>>>>>>>>>>> performance results.
>>>>>>>>>>>>
>>>>>>>>>>>> 2018-03-21 18:20 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> +1 for accepting the drop in LOG_ONLY. 7% is not that much, and not a
>>>>>>>>>>>>> drop at all, given that we are fixing a bug. I.e., had we implemented
>>>>>>>>>>>>> it correctly in the first place, we would never have noticed any "drop".
>>>>>>>>>>>>> I do not understand why someone would want to use the current broken
>>>>>>>>>>>>> mode.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Mar 21, 2018 at 6:11 PM, Dmitry Pavlov <dpavlov.spb@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi, I think option 1 is better. As Val said, any mode that allows
>>>>>>>>>>>>>> corruption does not make much sense.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What Ivan mentioned here as a drop is still a significant performance
>>>>>>>>>>>>>> boost relative to the old DEFAULT mode (now FSYNC).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>>> Dmitriy Pavlov
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Mar 21, 2018 at 17:56, Ivan Rakov <ivan.glukos@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I've attached benchmark results to the JIRA ticket.
>>>>>>>>>>>>>>> We observe a ~7% drop in the "fair" LOG_ONLY_SAFE mode, independent of
>>>>>>>>>>>>>>> the WAL compaction enabled flag. It's a pretty significant drop: WAL
>>>>>>>>>>>>>>> compaction itself gives only a ~3% drop.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I see two options here:
>>>>>>>>>>>>>>> 1) Change LOG_ONLY behavior. That implies that we'll be ready to
>>>>>>>>>>>>>>> release AI 2.5 with a 7% drop.
>>>>>>>>>>>>>>> 2) Introduce LOG_ONLY_SAFE, make it the default, and add a release note
>>>>>>>>>>>>>>> to AI 2.5 that we added power loss durability in the default mode, but
>>>>>>>>>>>>>>> the user may fall back to the previous LOG_ONLY in order to retain
>>>>>>>>>>>>>>> performance.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>> Ivan Rakov
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 20.03.2018 16:00, Ivan Rakov wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Val,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If a storage is in a corrupted state, does it mean that it needs to be
>>>>>>>>>>>>>>>>> completely removed and the cluster needs to be restarted without data?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, there's a chance that in LOG_ONLY all local data will be lost, but
>>>>>>>>>>>>>>>> only in the *power loss / OS crash* case.
>>>>>>>>>>>>>>>> kill -9, JVM crash, death of a critical system thread and all other
>>>>>>>>>>>>>>>> cases that usually take place are variations of *process crash*. All
>>>>>>>>>>>>>>>> WAL modes (except NONE, of course) ensure corruption safety in case of
>>>>>>>>>>>>>>>> process crash.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If so, I'm not sure any mode that allows corruption makes much sense
>>>>>>>>>>>>>>>>> to me.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It depends on the performance impact of enforcing power-loss corruption
>>>>>>>>>>>>>>>> safety. The price of full protection from power loss is high: FSYNC is
>>>>>>>>>>>>>>>> way slower (2-10 times) than other WAL modes. The question is whether
>>>>>>>>>>>>>>>> ensuring weaker guarantees (corruption can't happen, but loss of the
>>>>>>>>>>>>>>>> last updates can) will affect performance as badly as strong
>>>>>>>>>>>>>>>> guarantees.
>>>>>>>>>>>>>>>> I'll share benchmark results soon.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>> Ivan Rakov
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 20.03.2018 5:09, Valentin Kulichenko wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> What do we mean by "data corruption" here? If a storage is in a
>>>>>>>>>>>>>>>>> corrupted state, does it mean that it needs to be completely removed
>>>>>>>>>>>>>>>>> and the cluster needs to be restarted without data? If so, I'm not
>>>>>>>>>>>>>>>>> sure any mode that allows corruption makes much sense to me. How am I
>>>>>>>>>>>>>>>>> supposed to use a database if virtually any failure can end with
>>>>>>>>>>>>>>>>> complete loss of data?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In any case, this definitely should not be the default behavior. If a
>>>>>>>>>>>>>>>>> user ever switches to a corruption-unsafe mode, there should be a
>>>>>>>>>>>>>>>>> clear warning about this.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Val
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Mar 16, 2018 at 1:06 AM, Ivan Rakov <ivan.glukos@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ticket to track changes:
>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-7754
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>> Ivan Rakov
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 16.03.2018 10:58, Dmitriy Setrakyan wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov <ivan.glukos@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Vladimir,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Unlike BACKGROUND, LOG_ONLY provides strict write guarantees unless
>>>>>>>>>>>>>>>>>>>> power loss has happened.
>>>>>>>>>>>>>>>>>>>> Seems like we need to measure the performance difference to decide
>>>>>>>>>>>>>>>>>>>> whether we need a separate WAL mode. If it is invisible, we'll just
>>>>>>>>>>>>>>>>>>>> fix these bugs without introducing a new mode; if it is perceptible,
>>>>>>>>>>>>>>>>>>>> we'll continue the discussion about introducing LOG_ONLY_SAFE.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Makes sense?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, this sounds like the right approach.
>
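
The thread distinguishes *process crash* from *power loss / OS crash*. At the file level that comes down to whether written bytes have only been handed to the operating system or have also been forced to the storage device. The sketch below is a minimal illustration of that distinction using only the JDK. It is not Ignite code, and it does not try to reproduce where IGNITE-7754 places its fsyncs, which the thread does not spell out. The names WalSyncSketch, SyncPolicy and append are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration of write-without-sync vs. write-with-sync; not Ignite code. */
public class WalSyncSketch {
    /** Hypothetical policies; they only loosely mirror the guarantees discussed in the thread. */
    enum SyncPolicy {
        /** Bytes are handed to the OS only: they normally survive a process crash, not a power loss. */
        NO_SYNC,
        /** The channel is forced after every record: the OS is asked to flush to the device. */
        SYNC_EVERY_RECORD
    }

    /**
     * Appends each record as one UTF-8 line to the log file.
     *
     * @return the size of the log file in bytes after the append.
     */
    static long append(Path file, String[] records, SyncPolicy policy) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            for (String rec : records) {
                ByteBuffer buf = ByteBuffer.wrap((rec + "\n").getBytes(StandardCharsets.UTF_8));

                // write() may be partial, so loop until the buffer is drained.
                while (buf.hasRemaining())
                    ch.write(buf);

                // force(false) asks for file content (not necessarily metadata) to be flushed.
                if (policy == SyncPolicy.SYNC_EVERY_RECORD)
                    ch.force(false);
            }

            return ch.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("wal-sketch", ".log");
        String[] records = {"put k1=v1", "put k2=v2"};

        append(file, records, SyncPolicy.NO_SYNC);
        long size = append(file, records, SyncPolicy.SYNC_EVERY_RECORD);

        System.out.println("bytes in log: " + size);
        Files.delete(file);
    }
}
```

Forcing after every record is the expensive end of the spectrum, which is consistent with the thread's remark that FSYNC is several times slower than the other modes. A mode that forces only at selected points would sit between the two policies shown here.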
