ignite-dev mailing list archives

From Ivan Rakov <ivan.glu...@gmail.com>
Subject Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC
Date Fri, 23 Mar 2018 07:27:00 GMT
The fixes are quite simple.
I expect them to be merged into master within a week in the worst case.

Best Regards,
Ivan Rakov

On 22.03.2018 17:49, Denis Magda wrote:
> Ivan,
>
> How quickly are you going to merge the fix into master? Many persistence-related
> optimizations have already stacked up. Perhaps we can release them sooner if
> the community agrees.
>
> --
> Denis
>
> On Thu, Mar 22, 2018 at 5:22 AM, Ivan Rakov <ivan.glukos@gmail.com> wrote:
>
>> Thanks all!
>> We seem to have reached a consensus on this issue. I'll just add the necessary
>> fsyncs under IGNITE-7754.
>>
>> Best Regards,
>> Ivan Rakov
>>
>>
>> On 22.03.2018 15:13, Ilya Lantukh wrote:
>>
>>> +1 for fixing LOG_ONLY. If the current implementation doesn't protect from
>>> data corruption, it doesn't make sense.
>>>
>>> On Wed, Mar 21, 2018 at 10:38 PM, Denis Magda <dmagda@apache.org> wrote:
>>>
>>>> +1 for the fix of LOG_ONLY
>>>> On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk <
>>>> alexey.goncharuk@gmail.com> wrote:
>>>>
>>>>> +1 for fixing LOG_ONLY to enforce corruption safety given the provided
>>>>> performance results.
>>>>>
>>>>> 2018-03-21 18:20 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:
>>>>>
>>>>>> +1 for accepting the drop in LOG_ONLY. 7% is not that much, and not a drop
>>>>>> at all, given that we are fixing a bug. I.e., had we implemented it
>>>>>> correctly in the first place, we would never have noticed any "drop".
>>>>>> I do not understand why anyone would want to use the current broken mode.
>>>>>>
>>>>>> On Wed, Mar 21, 2018 at 6:11 PM, Dmitry Pavlov <dpavlov.spb@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi, I think option 1 is better. As Val said, any mode that allows
>>>>>>> corruption does not make much sense.
>>>>>>>
>>>>>>> Even with the drop Ivan mentioned, the fixed LOG_ONLY is still a
>>>>>>> significant performance boost compared to the old DEFAULT mode (now FSYNC).
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Dmitriy Pavlov
>>>>>>>
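Dmitriy's point can be checked with rough arithmetic. Assume (our assumption; the thread does not say the two figures share a baseline or workload) that the ~7% drop and the 2-10x FSYNC slowdown quoted in this thread are both relative to the throughput T of the current LOG_ONLY mode, and let T_fixed be the throughput of LOG_ONLY after the fix:

```latex
\begin{aligned}
T_{\mathrm{fixed}} &\approx (1 - 0.07)\,T = 0.93\,T,
\qquad \frac{T}{10} \le T_{\mathrm{FSYNC}} \le \frac{T}{2},\\
\frac{T_{\mathrm{fixed}}}{T_{\mathrm{FSYNC}}} &\in
\left[\frac{0.93\,T}{T/2},\ \frac{0.93\,T}{T/10}\right] = [1.86,\ 9.3].
\end{aligned}
```

Under those assumptions, the fixed LOG_ONLY would still be roughly 1.9 to 9.3 times faster than FSYNC.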
>>>>>>> On Wed, 21 Mar 2018 at 17:56, Ivan Rakov <ivan.glukos@gmail.com> wrote:
>>>>>>>
>>>>>>>> I've attached benchmark results to the JIRA ticket.
>>>>>>>> We observe a ~7% drop in the "fair" LOG_ONLY_SAFE mode, regardless of
>>>>>>>> whether WAL compaction is enabled. That's a pretty significant drop:
>>>>>>>> WAL compaction itself costs only ~3%.
>>>>>>>> I see two options here:
>>>>>>>> 1) Change LOG_ONLY behavior. That implies we'll be ready to release
>>>>>>>> AI 2.5 with a 7% drop.
>>>>>>>> 2) Introduce LOG_ONLY_SAFE, make it the default, and add a release note
>>>>>>>> to AI 2.5 saying that we added power-loss durability to the default mode,
>>>>>>>> but users may fall back to the previous LOG_ONLY to retain performance.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Ivan Rakov
>>>>>>>>
>>>>>>>> On 20.03.2018 16:00, Ivan Rakov wrote:
>>>>>>>>
>>>>>>>>> Val,
>>>>>>>>>
>>>>>>>>>> If a storage is in a corrupted state, does it mean that it needs to be
>>>>>>>>>> completely removed and the cluster needs to be restarted without data?
>>>>>>>>>
>>>>>>>>> Yes, there's a chance that in LOG_ONLY all local data will be lost, but
>>>>>>>>> only in the *power loss / OS crash* case.
>>>>>>>>> kill -9, a JVM crash, the death of a critical system thread and all other
>>>>>>>>> cases that usually take place are variations of a *process crash*. All
>>>>>>>>> WAL modes (except NONE, of course) ensure corruption safety in case of a
>>>>>>>>> process crash.
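The process-crash vs. power-loss distinction maps onto the difference between writing to a file and forcing it to the device. Below is a minimal standalone Java sketch using only java.nio; it is not Ignite's WAL implementation, and the class name, record format and `fsync` flag are hypothetical. `FileChannel.write` hands bytes to the OS page cache, which outlives a killed JVM, while `FileChannel.force` blocks until the content has been written to the storage device, which is what survives a power loss or OS crash.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical toy WAL writer for illustration only, not Ignite code.
final class WalAppendSketch {
    // Appends one newline-terminated record to the log.
    static void append(FileChannel ch, String record, boolean fsync) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            ch.write(buf); // bytes reach the OS page cache: survives kill -9 / JVM crash
        }
        if (fsync) {
            // Returns only after the file content is on the device: survives power loss.
            // It also covers every earlier unsynced record written through this channel.
            // 'false' = content only; 'true' would also force file metadata.
            ch.force(false);
        }
    }

    // Writes one LOG_ONLY-style record (no fsync) and one FSYNC-style record,
    // then reads the file back and deletes it.
    static List<String> demo() {
        try {
            Path wal = Files.createTempFile("toy-wal", ".log");
            try {
                try (FileChannel ch = FileChannel.open(wal,
                        StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
                    append(ch, "put k1=v1", false);
                    append(ch, "put k2=v2", true);
                }
                return Files.readAllLines(wal, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(wal);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [put k1=v1, put k2=v2]
    }
}
```

Both records read back because reads are served from the same page cache; the difference between the two calls only shows up after a real power loss, which this demo cannot simulate. The per-record `force` call is where an FSYNC-style mode pays its latency cost.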
>>>>>>>>>
>>>>>>>>>> If so, I'm not sure any mode that allows corruption makes much sense
>>>>>>>>>> to me.
>>>>>>>>>
>>>>>>>>> It depends on the performance impact of enforcing power-loss corruption
>>>>>>>>> safety. The price of full protection from power loss is high: FSYNC is
>>>>>>>>> way slower (2-10 times) than the other WAL modes. The question is whether
>>>>>>>>> ensuring weaker guarantees (corruption can't happen, but loss of the last
>>>>>>>>> updates can) will hurt performance as badly as the strong guarantees do.
>>>>>>>>> I'll share benchmark results soon.
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Ivan Rakov
>>>>>>>>>
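The weaker guarantee Ivan describes can be illustrated with a toy simulation in plain Java; it is not Ignite code, and `Device`, `crash` and the single-checkpoint workflow are hypothetical simplifications. It assumes, as our own reading rather than something stated in this message, that the missing fsync is one that makes the WAL durable before a checkpoint's page writes (the classic write-ahead rule). The result is {last update reflected in durable pages, last update in the durable WAL}; pages ahead of the log means recovery has no records to justify them, i.e. corruption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy simulation for illustration only; all names are hypothetical.
final class ToyCheckpoint {
    // A "device" whose writes sit in a volatile cache until fsync().
    static final class Device {
        private final List<Long> durable = new ArrayList<>();
        private final List<Long> cache = new ArrayList<>();

        void write(long value) { cache.add(value); }

        void fsync() { durable.addAll(cache); cache.clear(); }

        // kill -9 / JVM crash: the OS keeps its page cache and flushes it later.
        void processCrash() { fsync(); }

        // Power loss / OS crash: anything not fsynced is gone.
        void powerLoss() { cache.clear(); }

        // Highest durable value, or 0 if nothing is durable.
        long lastDurable() { return durable.isEmpty() ? 0L : durable.get(durable.size() - 1); }
    }

    // Logs updates 1..n to the WAL without per-record fsync, checkpoints pages
    // up to update n, logs two more updates, then crashes.
    // Returns {last update in durable pages, last update in durable WAL}.
    static long[] crash(int n, boolean fsyncWalBeforeCheckpoint, boolean powerLoss) {
        Device wal = new Device();
        Device pages = new Device();
        for (long id = 1; id <= n; id++) {
            wal.write(id);
        }
        if (fsyncWalBeforeCheckpoint) {
            wal.fsync(); // assumed extra fsync: the log becomes durable before the pages
        }
        pages.write(n); // checkpoint writes pages reflecting updates 1..n
        pages.fsync();  // and makes them durable
        wal.write(n + 1L); // updates after the checkpoint, not yet fsynced
        wal.write(n + 2L);
        if (powerLoss) {
            wal.powerLoss();
            pages.powerLoss();
        } else {
            wal.processCrash();
            pages.processCrash();
        }
        return new long[] {pages.lastDurable(), wal.lastDurable()};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(crash(5, false, false))); // [5, 7]
        System.out.println(Arrays.toString(crash(5, false, true)));  // [5, 0]
        System.out.println(Arrays.toString(crash(5, true, true)));   // [5, 5]
    }
}
```

With the extra fsync, a power loss yields [5, 5]: the store is consistent and only updates 6 and 7 are lost. Without it, [5, 0] leaves pages with no log records behind them, while a process crash ([5, 7]) is harmless either way, matching Ivan's explanation.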
>>>>>>>>> On 20.03.2018 5:09, Valentin Kulichenko wrote:
>>>>>>>>>
>>>>>>>>>> Guys,
>>>>>>>>>>
>>>>>>>>>> What do we mean by "data corruption" here? If a storage is in a
>>>>>>>>>> corrupted state, does it mean that it needs to be completely removed
>>>>>>>>>> and the cluster needs to be restarted without data? If so, I'm not sure
>>>>>>>>>> any mode that allows corruption makes much sense to me. How am I
>>>>>>>>>> supposed to use a database if virtually any failure can end with a
>>>>>>>>>> complete loss of data?
>>>>>>>>>>
>>>>>>>>>> In any case, this definitely should not be the default behavior. If a
>>>>>>>>>> user ever switches to a corruption-unsafe mode, there should be a clear
>>>>>>>>>> warning about this.
>>>>>>>>>>
>>>>>>>>>> -Val
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 16, 2018 at 1:06 AM, Ivan Rakov <ivan.glukos@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> Ticket to track changes:
>>>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-7754
>>>>>>>>>>>
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Ivan Rakov
>>>>>>>>>>>
>>>>>>>>>>> On 16.03.2018 10:58, Dmitriy Setrakyan wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov <ivan.glukos@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Vladimir,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Unlike BACKGROUND, LOG_ONLY provides strict write guarantees unless
>>>>>>>>>>>>> power loss has happened.
>>>>>>>>>>>>> It seems we need to measure the performance difference to decide
>>>>>>>>>>>>> whether we need a separate WAL mode. If it is invisible, we'll just
>>>>>>>>>>>>> fix these bugs without introducing a new mode; if it is perceptible,
>>>>>>>>>>>>> we'll continue the discussion about introducing LOG_ONLY_SAFE.
>>>>>>>>>>>>> Makes sense?
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, this sounds like the right approach.
>>>>>>>>>>>>

