ignite-dev mailing list archives

From Ivan Rakov <ivan.glu...@gmail.com>
Subject Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC
Date Fri, 23 Mar 2018 09:23:07 GMT
Igniters, there's another important question about this matter.
Do we want to add extra fsyncs for BACKGROUND WAL mode? I think that we
have to do it: it will cause a similar performance drop, but if we
consider LOG_ONLY broken without these fixes, BACKGROUND is broken as well.

Best Regards,
Ivan Rakov

On 23.03.2018 10:27, Ivan Rakov wrote:
> Fixes are quite simple.
> I expect them to be merged into master within a week in the worst case.
>
> Best Regards,
> Ivan Rakov
>
> On 22.03.2018 17:49, Denis Magda wrote:
>> Ivan,
>>
>> How quickly are you going to merge the fix into master? Many
>> persistence-related optimizations have already stacked up. Probably, we
>> can release them sooner if the community agrees.
>>
>> -- 
>> Denis
>>
>> On Thu, Mar 22, 2018 at 5:22 AM, Ivan Rakov <ivan.glukos@gmail.com> wrote:
>>
>>> Thanks all!
>>> We seem to have reached a consensus on this issue. I'll just add
>>> necessary fsyncs under IGNITE-7754.
>>>
>>> Best Regards,
>>> Ivan Rakov
>>>
>>>
>>> On 22.03.2018 15:13, Ilya Lantukh wrote:
>>>
>>>> +1 for fixing LOG_ONLY. If the current implementation doesn't protect
>>>> from data corruption, it doesn't make sense.
>>>>
>>>> On Wed, Mar 21, 2018 at 10:38 PM, Denis Magda <dmagda@apache.org> wrote:
>>>>
>>>>> +1 for the fix of LOG_ONLY
>>>>>
>>>>> On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:
>>>>>
>>>>>> +1 for fixing LOG_ONLY to enforce corruption safety given the provided
>>>>>> performance results.
>>>>>>
>>>>>> 2018-03-21 18:20 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:
>>>>>>
>>>>>>> +1 for accepting the drop in LOG_ONLY. 7% is not that much, and not a
>>>>>>> drop at all, provided that we are fixing a bug. I.e., had we implemented
>>>>>>> it correctly in the first place, we would never have noticed any "drop".
>>>>>>> I do not understand why someone would like to use the current broken
>>>>>>> mode.
>>>>>>>
>>>>>>> On Wed, Mar 21, 2018 at 6:11 PM, Dmitry Pavlov <dpavlov.spb@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi, I think option 1 is better. As Val said, any mode that allows
>>>>>>>> corruption does not make much sense.
>>>>>>>>
>>>>>>>> What Ivan mentioned here as a drop is, in relation to the old DEFAULT
>>>>>>>> mode (FSYNC now), still a significant performance boost.
>>>>>>>>
>>>>>>>> Sincerely,
>>>>>>>> Dmitriy Pavlov
>>>>>>>>
>>>>>>>> Wed, Mar 21, 2018 at 17:56, Ivan Rakov <ivan.glukos@gmail.com>:
>>>>>>>>
>>>>>>>>> I've attached benchmark results to the JIRA ticket.
>>>>>>>>> We observe a ~7% drop in "fair" LOG_ONLY_SAFE mode, independent of the
>>>>>>>>> WAL compaction enabled flag. It's a pretty significant drop: WAL
>>>>>>>>> compaction itself gives only a ~3% drop.
>>>>>>>>> I see two options here:
>>>>>>>>> 1) Change LOG_ONLY behavior. That implies that we'll be ready to
>>>>>>>>> release AI 2.5 with a 7% drop.
>>>>>>>>> 2) Introduce LOG_ONLY_SAFE, make it default, add a release note to AI
>>>>>>>>> 2.5 that we added power loss durability in default mode, but the user
>>>>>>>>> may fall back to the previous LOG_ONLY in order to retain performance.
>>>>>>>>>
>>>>>>>>> Thoughts?
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Ivan Rakov
>>>>>>>>>
>>>>>>>>> On 20.03.2018 16:00, Ivan Rakov wrote:
>>>>>>>>>
>>>>>>>>>> Val,
>>>>>>>>>>
>>>>>>>>>>> If a storage is in
>>>>>>>>>>> corrupted state, does it mean that it needs to be completely removed
>>>>>>>>>>> and cluster needs to be restarted without data?
>>>>>>>>>>
>>>>>>>>>> Yes, there's a chance that in LOG_ONLY all local data will be lost,
>>>>>>>>>> but only in *power loss / OS crash* case.
>>>>>>>>>> kill -9, JVM crash, death of critical system thread and all other
>>>>>>>>>> cases that usually take place are variations of *process crash*. All
>>>>>>>>>> WAL modes (except NONE, of course) ensure corruption-safety in case
>>>>>>>>>> of process crash.
>>>>>>>>>>
>>>>>>>>>>> If so, I'm not sure any mode
>>>>>>>>>>> that allows corruption makes much sense to me.
>>>>>>>>>>
>>>>>>>>>> It depends on performance impact of enforcing power-loss corruption
>>>>>>>>>> safety. Price of full protection from power loss is high - FSYNC is
>>>>>>>>>> way slower (2-10 times) than other WAL modes. The question is whether
>>>>>>>>>> ensuring weaker guarantees (corruption can't happen, but loss of last
>>>>>>>>>> updates can) will affect performance as badly as strong guarantees.
>>>>>>>>>> I'll share benchmark results soon.
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Ivan Rakov
>>>>>>>>>>
>>>>>>>>>> On 20.03.2018 5:09, Valentin Kulichenko wrote:
>>>>>>>>>>
>>>>>>>>>>> Guys,
>>>>>>>>>>>
>>>>>>>>>>> What do we understand under "data corruption" here? If a storage is
>>>>>>>>>>> in corrupted state, does it mean that it needs to be completely
>>>>>>>>>>> removed and cluster needs to be restarted without data? If so, I'm
>>>>>>>>>>> not sure any mode that allows corruption makes much sense to me. How
>>>>>>>>>>> am I supposed to use a database, if virtually any failure can end
>>>>>>>>>>> with complete loss of data?
>>>>>>>>>>> In any case, this definitely should not be a default behavior. If
>>>>>>>>>>> user ever switches to corruption-unsafe mode, there should be a clear
>>>>>>>>>>> warning about this.
>>>>>>>>>>>
>>>>>>>>>>> -Val
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 16, 2018 at 1:06 AM, Ivan Rakov <ivan.glukos@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Ticket to track changes:
>>>>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-7754
>>>>>>>>>>>>
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>> Ivan Rakov
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 16.03.2018 10:58, Dmitriy Setrakyan wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov <ivan.glukos@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Vladimir,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Unlike BACKGROUND, LOG_ONLY provides strict write guarantees unless
>>>>>>>>>>>>>> power loss has happened.
>>>>>>>>>>>>>> Seems like we need to measure the performance difference to decide
>>>>>>>>>>>>>> whether we need a separate WAL mode. If it is invisible, we'll
>>>>>>>>>>>>>> just fix these bugs without introducing a new mode; if it is
>>>>>>>>>>>>>> perceptible, we'll continue the discussion about introducing
>>>>>>>>>>>>>> LOG_ONLY_SAFE.
>>>>>>>>>>>>>> Makes sense?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, this sounds like the right approach.
>>>>
>
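
The distinction the thread keeps returning to (a write that survives a
process crash versus a write that survives power loss) can be sketched
with plain java.nio. This is only an illustration under stated
assumptions, not Ignite's WAL code: the class WalSyncSketch, the enum
SyncPolicy and the method append are hypothetical names invented for this
example, and the thread does not say where exactly the fsyncs of
IGNITE-7754 are placed. A write() without force() hands the bytes to the
OS page cache, so they typically survive a kill -9 or JVM crash but may be
lost on power loss or OS crash; force() asks the OS to flush them to the
storage device, which is the expensive step.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WalSyncSketch {
    /** Hypothetical policies for this sketch; these are not Ignite's WALMode values. */
    enum SyncPolicy { WRITE_ONLY, FSYNC_EACH_RECORD }

    /** Appends one newline-terminated record and returns the number of bytes written. */
    static int append(FileChannel ch, String record, SyncPolicy policy) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
        int written = 0;
        // write() may be partial, so loop until the buffer is drained.
        while (buf.hasRemaining())
            written += ch.write(buf);
        // Without force(), the bytes may still sit in the OS page cache only.
        if (policy == SyncPolicy.FSYNC_EACH_RECORD)
            ch.force(false); // flush file content (not necessarily metadata) to the device
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("wal-sketch", ".log");
        try (FileChannel ch = FileChannel.open(log, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            append(ch, "update-1", SyncPolicy.WRITE_ONLY);        // cheap, weaker guarantee
            append(ch, "update-2", SyncPolicy.FSYNC_EACH_RECORD); // slower, stronger guarantee
        }
        System.out.println(Files.readAllLines(log));
        Files.delete(log);
    }
}
```

Both records are readable afterwards in a normal run; the two policies
differ only in what is guaranteed to be on disk if the machine loses power
right after append returns.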

