ignite-dev mailing list archives

From Dmitry Pavlov <dpavlov....@gmail.com>
Subject Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC
Date Fri, 23 Mar 2018 22:26:05 GMT
Hi Val,

NONE means that the WAL is disabled and not written at all. Use this mode
at your own risk: restoring state after a crash in the middle of a
checkpoint may not succeed. I do not see much sense in it, especially in
production.

BACKGROUND is a fully functional WAL mode, but it allows some delay before
records are flushed to disk.

Sincerely,
Dmitriy Pavlov
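To make the NONE vs BACKGROUND distinction (and how both differ from LOG_ONLY and FSYNC) concrete, here is a minimal Java sketch of how far a logged record travels before the write call returns in each mode. It is a toy model written for this thread, not Ignite's WAL manager. The class `WalWriterSketch`, its `append`/`flush` methods, and modelling BACKGROUND's delayed flush as an explicit `flush()` call that does not fsync are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of WAL write paths per mode; not Ignite's implementation. */
public class WalWriterSketch implements AutoCloseable {
    public enum Mode { FSYNC, LOG_ONLY, BACKGROUND, NONE }

    private final Mode mode;
    private final FileChannel channel;
    // BACKGROUND: records wait in process memory until the next flush().
    private final List<byte[]> pending = new ArrayList<>();

    public WalWriterSketch(Path file, Mode mode) {
        this.mode = mode;
        try {
            this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Logs one record; how far it gets before append() returns depends on the mode. */
    public void append(String record) {
        byte[] bytes = (record + "\n").getBytes(StandardCharsets.UTF_8);
        switch (mode) {
            case NONE:
                return;               // WAL disabled: nothing is written at all
            case BACKGROUND:
                pending.add(bytes);   // kept in memory; a later flush() writes it out
                return;
            case LOG_ONLY:
                write(bytes);         // handed to the OS page cache, no fsync
                return;
            case FSYNC:
                write(bytes);
                force();              // forced to the storage device before returning
                return;
        }
    }

    /** Stand-in for BACKGROUND's delayed flush: writes buffered records to the OS (no fsync modelled). */
    public void flush() {
        for (byte[] bytes : pending) {
            write(bytes);
        }
        pending.clear();
    }

    private void write(byte[] bytes) {
        try {
            ByteBuffer buffer = ByteBuffer.wrap(bytes);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void force() {
        try {
            channel.force(false); // flush file content (not metadata) to the device
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        // Records still pending in BACKGROUND mode are dropped, like updates lost before a flush.
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this model, a process crash loses BACKGROUND's in-memory buffer but not LOG_ONLY's OS-cached bytes, while a power loss can also lose the OS cache; only FSYNC has forced every record to the device.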

On Sat, 24 Mar 2018 at 1:07, Valentin Kulichenko <
valentin.kulichenko@gmail.com> wrote:

> I agree. In my view, any possibility to get a corrupted storage is a bug
> which needs to be fixed.
>
> BTW, can someone explain the semantics of NONE mode? How does it differ
> from BACKGROUND from the user's perspective? Is there any particular use
> case where it can be used?
>
> -Val
>
> On Fri, Mar 23, 2018 at 2:49 AM, Dmitry Pavlov <dpavlov.spb@gmail.com>
> wrote:
>
> > Hi Ivan,
> >
> > IMO we have to add the extra fsyncs for BACKGROUND WAL mode as well. Agree?
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > On Fri, 23 Mar 2018 at 12:23, Ivan Rakov <ivan.glukos@gmail.com> wrote:
> >
> > > Igniters, there's another important question about this matter.
> > > Do we want to add extra fsyncs for BACKGROUND WAL mode? I think that we
> > > have to do it: it will cause a similar performance drop, but if we
> > > consider LOG_ONLY broken without these fixes, BACKGROUND is broken as
> > > well.
> > >
> > > Best Regards,
> > > Ivan Rakov
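The thread does not spell out where the extra fsyncs go. As an illustration only, the Java sketch below assumes they enforce the classic write-ahead rule at checkpoint time: WAL records must be durable before the data pages they describe are overwritten on disk. `CheckpointOrderingSketch`, `RecordingDisk` and the `fsyncWalBeforePages` flag are hypothetical names; the "disk" only records the order of operations so the ordering can be inspected, and this is not Ignite's checkpointer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical checkpoint sequence illustrating write-ahead ordering; not Ignite's checkpointer. */
public class CheckpointOrderingSketch {
    /** Stand-in storage that records the order of operations instead of doing real I/O. */
    public static final class RecordingDisk {
        public final List<String> ops = new ArrayList<>();
        public void writeWal(String record) { ops.add("wal-write:" + record); }
        public void fsyncWal() { ops.add("wal-fsync"); }
        public void writePage(int pageId) { ops.add("page-write:" + pageId); }
        public void fsyncPages() { ops.add("page-fsync"); }
    }

    private final RecordingDisk disk;
    private final boolean fsyncWalBeforePages; // the assumed "extra fsync"

    public CheckpointOrderingSketch(RecordingDisk disk, boolean fsyncWalBeforePages) {
        this.disk = disk;
        this.fsyncWalBeforePages = fsyncWalBeforePages;
    }

    /** An update is logged first; the modified page stays dirty in memory until a checkpoint. */
    public void logUpdate(String record) {
        disk.writeWal(record); // without an fsync this may still sit in the OS page cache
    }

    public void checkpoint(List<Integer> dirtyPageIds) {
        disk.writeWal("checkpoint-begin");
        if (fsyncWalBeforePages) {
            disk.fsyncWal(); // every logged update is durable before any page is overwritten
        }
        for (int pageId : dirtyPageIds) {
            disk.writePage(pageId);
        }
        disk.fsyncPages();
        disk.writeWal("checkpoint-end");
    }

    /** True if a WAL fsync occurs before the first page write (or there are no page writes). */
    public static boolean walFsyncPrecedesPageWrites(List<String> ops) {
        for (String op : ops) {
            if (op.equals("wal-fsync")) {
                return true;
            }
            if (op.startsWith("page-write:")) {
                return false;
            }
        }
        return true;
    }
}
```

With the flag off, nothing forces the logged update to the device before pages 1 and 2 are written, so after a power loss the pages on disk could reflect changes whose WAL records never made it out of the OS cache; a plain process crash would not expose this gap, because the OS still flushes its cache.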
> > >
> > > On 23.03.2018 10:27, Ivan Rakov wrote:
> > > > Fixes are quite simple.
> > > > I expect them to be merged into master within a week in the worst case.
> > > >
> > > > Best Regards,
> > > > Ivan Rakov
> > > >
> > > > On 22.03.2018 17:49, Denis Magda wrote:
> > > >> Ivan,
> > > >>
> > > >> How soon are you going to merge the fix into master? Many
> > > >> persistence-related optimizations have already stacked up. Probably
> > > >> we can release them sooner if the community agrees.
> > > >>
> > > >> --
> > > >> Denis
> > > >>
> > > >> On Thu, Mar 22, 2018 at 5:22 AM, Ivan Rakov <ivan.glukos@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> Thanks all!
> > > >>> We seem to have reached a consensus on this issue. I'll just add the
> > > >>> necessary fsyncs under IGNITE-7754.
> > > >>>
> > > >>> Best Regards,
> > > >>> Ivan Rakov
> > > >>>
> > > >>>
> > > >>> On 22.03.2018 15:13, Ilya Lantukh wrote:
> > > >>>
> > > >>>> +1 for fixing LOG_ONLY. If the current implementation doesn't protect
> > > >>>> from data corruption, it doesn't make sense.
> > > >>>>
> > > >>>> On Wed, Mar 21, 2018 at 10:38 PM, Denis Magda <dmagda@apache.org>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> +1 for the fix of LOG_ONLY
> > > >>>>>
> > > >>>>> On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk <
> > > >>>>> alexey.goncharuk@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> +1 for fixing LOG_ONLY to enforce corruption safety, given the
> > > >>>>>> provided performance results.
> > > >>>>>>
> > > >>>>>> 2018-03-21 18:20 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:
> > > >>>>>>
> > > >>>>>>> +1 for accepting the drop in LOG_ONLY. 7% is not that much, and not
> > > >>>>>>> a drop at all, given that we are fixing a bug. I.e. had we
> > > >>>>>>> implemented it correctly in the first place, we would never have
> > > >>>>>>> noticed any "drop".
> > > >>>>>>> I do not understand why someone would want to use the current
> > > >>>>>>> broken mode.
> > > >>>>>>>
> > > >>>>>>> On Wed, Mar 21, 2018 at 6:11 PM, Dmitry Pavlov
> > > >>>>>>> <dpavlov.spb@gmail.com>
> > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hi, I think option 1 is better. As Val said, any mode that allows
> > > >>>>>>>> corruption does not make much sense.
> > > >>>>>>>>
> > > >>>>>>>> Even with the drop Ivan mentioned, the new behavior is still a
> > > >>>>>>>> significant performance boost relative to the old DEFAULT mode
> > > >>>>>>>> (now FSYNC).
> > > >>>>>>>>
> > > >>>>>>>> Sincerely,
> > > >>>>>>>> Dmitriy Pavlov
> > > >>>>>>>>
> > > >>>>>>>> On Wed, 21 Mar 2018 at 17:56, Ivan Rakov <ivan.glukos@gmail.com>
> > > >>>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> I've attached benchmark results to the JIRA ticket.
> > > >>>>>>>>> We observe a ~7% drop in "fair" LOG_ONLY_SAFE mode, independent of
> > > >>>>>>>>> the WAL compaction enabled flag. It's a pretty significant drop:
> > > >>>>>>>>> WAL compaction itself causes only a ~3% drop.
> > > >>>>>>>>> I see two options here:
> > > >>>>>>>>> 1) Change LOG_ONLY behavior. That implies that we'll be ready to
> > > >>>>>>>>> release AI 2.5 with a 7% drop.
> > > >>>>>>>>> 2) Introduce LOG_ONLY_SAFE, make it the default, and add a release
> > > >>>>>>>>> note to AI 2.5 that we added power-loss durability in the default
> > > >>>>>>>>> mode, but users may fall back to the previous LOG_ONLY in order to
> > > >>>>>>>>> retain performance.
> > > >>>>>>>>>
> > > >>>>>>>>> Thoughts?
> > > >>>>>>>>>
> > > >>>>>>>>> Best Regards,
> > > >>>>>>>>> Ivan Rakov
> > > >>>>>>>>>
> > > >>>>>>>>> On 20.03.2018 16:00, Ivan Rakov wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Val,
> > > >>>>>>>>>>
> > > >>>>>>>>>>> If a storage is in a corrupted state, does it mean that it needs
> > > >>>>>>>>>>> to be completely removed and the cluster needs to be restarted
> > > >>>>>>>>>>> without data?
> > > >>>>>>>>>> Yes, there's a chance that in LOG_ONLY all local data will be
> > > >>>>>>>>>> lost, but only in the *power loss / OS crash* case.
> > > >>>>>>>>>> kill -9, JVM crash, death of a critical system thread and all
> > > >>>>>>>>>> other cases that usually take place are variations of *process
> > > >>>>>>>>>> crash*. All WAL modes (except NONE, of course) ensure corruption
> > > >>>>>>>>>> safety in case of a process crash.
> > > >>>>>>>>>>> If so, I'm not sure any mode that allows corruption makes much
> > > >>>>>>>>>>> sense to me.
> > > >>>>>>>>>> It depends on the performance impact of enforcing power-loss
> > > >>>>>>>>>> corruption safety. The price of full protection from power loss is
> > > >>>>>>>>>> high: FSYNC is way slower (2-10 times) than other WAL modes. The
> > > >>>>>>>>>> question is whether ensuring weaker guarantees (corruption can't
> > > >>>>>>>>>> happen, but loss of the last updates can) will affect performance
> > > >>>>>>>>>> as badly as strong guarantees.
> > > >>>>>>>>>> I'll share benchmark results soon.
> > > >>>>>>>>>> Best Regards,
> > > >>>>>>>>>> Ivan Rakov
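Ivan's distinction between failure classes can be summarized as a small decision table. The Java sketch below encodes the guarantees as stated in this thread: every mode except NONE survives a process crash without corruption; under power loss / OS crash only FSYNC was corruption-safe before the proposed fix; with the fix, LOG_ONLY (and, per the follow-up of 23.03, BACKGROUND) should become corruption-safe while still possibly losing the last updates. `WalGuaranteeSketch`, its methods and the `withProposedFsyncs` flag are hypothetical; this is a reading of the discussion, not a specification of Ignite's behavior.

```java
/** Hypothetical summary of WAL-mode guarantees as described in the thread; not an Ignite API. */
public final class WalGuaranteeSketch {
    public enum Mode { FSYNC, LOG_ONLY, BACKGROUND, NONE }

    /** PROCESS_CRASH: kill -9, JVM crash, death of a critical thread. POWER_LOSS also covers OS crash. */
    public enum Failure { PROCESS_CRASH, POWER_LOSS }

    private WalGuaranteeSketch() {}

    /**
     * Whether local storage is expected to recover without corruption.
     * withProposedFsyncs models the extra fsyncs discussed under IGNITE-7754.
     */
    public static boolean corruptionSafe(Mode mode, Failure failure, boolean withProposedFsyncs) {
        if (mode == Mode.NONE) {
            return false; // no WAL: recovery after a crash mid-checkpoint may fail
        }
        if (failure == Failure.PROCESS_CRASH) {
            return true; // guaranteed by every WAL mode except NONE
        }
        if (mode == Mode.FSYNC) {
            return true; // full protection from power loss
        }
        return withProposedFsyncs; // LOG_ONLY and BACKGROUND under power loss
    }

    /** Whether the most recent acknowledged updates are expected to survive the failure. */
    public static boolean lastUpdatesSurvive(Mode mode, Failure failure) {
        switch (mode) {
            case FSYNC:
                return true;
            case LOG_ONLY:
                // data already handed to the OS survives a process crash, not a power loss
                return failure == Failure.PROCESS_CRASH;
            default:
                return false; // BACKGROUND buffers in memory first; NONE writes no WAL
        }
    }
}
```

The `withProposedFsyncs` flag is the whole trade-off debated here: it moves the power-loss row for LOG_ONLY and BACKGROUND from "may be corrupted" to "may lose recent updates", without giving the full FSYNC guarantee.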
> > > >>>>>>>>>>
> > > >>>>>>>>>> On 20.03.2018 5:09, Valentin Kulichenko
wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Guys,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> What do we mean by "data corruption" here? If a storage is in a
> > > >>>>>>>>>>> corrupted state, does it mean that it needs to be completely
> > > >>>>>>>>>>> removed and the cluster needs to be restarted without data? If so,
> > > >>>>>>>>>>> I'm not sure any mode that allows corruption makes much sense to
> > > >>>>>>>>>>> me. How am I supposed to use a database if virtually any failure
> > > >>>>>>>>>>> can end with complete loss of data?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> In any case, this definitely should not be the default behavior.
> > > >>>>>>>>>>> If a user ever switches to a corruption-unsafe mode, there should
> > > >>>>>>>>>>> be a clear warning about this.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> -Val
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Fri, Mar 16, 2018 at 1:06 AM, Ivan Rakov <ivan.glukos@gmail.com>
> > > >>>>>>>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> Ticket to track changes:
> > > >>>>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-7754
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Best Regards,
> > > >>>>>>>>>>>> Ivan Rakov
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On 16.03.2018 10:58, Dmitriy Setrakyan wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov <
> > > >>>>>>>>>>>>> ivan.glukos@gmail.com> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Vladimir,
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Unlike BACKGROUND, LOG_ONLY provides strict write guarantees
> > > >>>>>>>>>>>>>> unless power loss has happened.
> > > >>>>>>>>>>>>>> Seems like we need to measure the performance difference to
> > > >>>>>>>>>>>>>> decide whether we need a separate WAL mode. If the difference
> > > >>>>>>>>>>>>>> turns out to be invisible, we'll just fix these bugs without
> > > >>>>>>>>>>>>>> introducing a new mode; if it is perceptible, we'll continue the
> > > >>>>>>>>>>>>>> discussion about introducing LOG_ONLY_SAFE.
> > > >>>>>>>>>>>>>> Makes sense?
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Yes, this sounds like the right approach.