ignite-dev mailing list archives

From Andrey Gura <ag...@apache.org>
Subject Re: System Worker Failure Handler on local laptop
Date Mon, 14 Jan 2019 12:08:04 GMT
Guys,

there is no problem with blocked-thread monitoring. Please look at the
error message: "failureCtx=FailureContext
[type=SYSTEM_WORKER_TERMINATION, err=class
o.a.i.IgniteCheckedException: Node is stopping: grid-2]]". Some
critical worker was terminated unexpectedly. So the problem isn't
related to any timeouts. It's a bug that should be investigated.
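
To check this quickly in a long log, here is a minimal sketch that pulls the
failure type out of the "failureCtx=FailureContext [type=...]" part of the
message and tells whether it is one of the two timeout-style types listed under
ignoredFailureTypes in the same log line. It is not Ignite code: the class and
method names are made up for illustration, and the regular expression assumes
the message layout shown in the quoted error.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Ignite: reads the failure type from a log message.
final class FailureTypeExtractor {
    // Matches the "type=..." field inside "failureCtx=FailureContext [...]".
    private static final Pattern FAILURE_TYPE =
        Pattern.compile("failureCtx=FailureContext\\s*\\[type=([A-Z_]+)");

    /** Returns the failure type named in the log text, or null if none is found. */
    public static String failureType(String logText) {
        Matcher m = FAILURE_TYPE.matcher(logText);
        return m.find() ? m.group(1) : null;
    }

    /** True only for the two timeout-style types named in the quoted log line. */
    public static boolean isTimeoutRelated(String type) {
        return "SYSTEM_WORKER_BLOCKED".equals(type)
            || "SYSTEM_CRITICAL_OPERATION_TIMEOUT".equals(type);
    }

    public static void main(String[] args) {
        String log = "failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, "
            + "err=class o.a.i.IgniteCheckedException: Node is stopping: grid-2]]";
        String type = failureType(log);
        // Prints: SYSTEM_WORKER_TERMINATION timeoutRelated=false
        System.out.println(type + " timeoutRelated=" + isTimeoutRelated(type));
    }
}
```

For the message in this thread the type is SYSTEM_WORKER_TERMINATION, so the
check reports that the failure is not timeout related.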



On Thu, Dec 27, 2018 at 9:27 PM Denis Magda <dmagda@apache.org> wrote:
>
> Folks,
>
> What are the current timeouts? We need to know the probability of failures
> in a dev environment. This affects usability.
>
> --
> Denis
>
> On Thu, Dec 27, 2018 at 4:59 AM Alexey Goncharuk <alexey.goncharuk@gmail.com>
> wrote:
>
> > Nikolay,
> >
> > Yes, the fix is already in master. It looks like I was wrong: in your case
> > the failure handler is triggered by 'Node is stopping: grid-2'. Can you
> > please share the full trace?
> >
> >
> >
> > On Thu, Dec 27, 2018 at 12:41, Nikolay Izhikov <nizhikov@apache.org> wrote:
> >
> > > Alexey,
> > >
> > > Is the fix for this issue already in master?
> > > I ran the tests on current master.
> > >
> > > > Should we somehow announce it on the user-list or highlight on
> > > > readme.io?
> > >
> > > I don't think our users will be happy to be stuck with this behavior
> > > in production.
> > >
> > > Do I understand you correctly:
> > > if someone uses the 2.7 release and the Ignite process slows down for a
> > > few seconds for any reason (low-end hardware, a VM pause, other processes
> > > grabbing the resources), then the Ignite node will be stopped?
> > >
> > > > This is the issue I mentioned in "Critical worker threads liveness
> > > > checking drawbacks" topic
> > >
> > > Thanks for the link, I will check it out.
> > >
> > > On Thu, Dec 27, 2018 at 12:24, Alexey Goncharuk
> > > <alexey.goncharuk@gmail.com> wrote:
> > >
> > > > Hi Nikolay,
> > > >
> > > > This is the issue I mentioned in the "Critical worker threads liveness
> > > > checking drawbacks" topic, which I was expecting to be included in
> > > > Ignite 2.7, but it was not. To work around the issue, you should set
> > > > DataStorageConfiguration#setCheckpointReadLockTimeout to 0.
> > > >
> > > > Should we somehow announce it on the user-list or highlight it on
> > > > readme.io?
> > > >
> > > > On Thu, Dec 27, 2018 at 11:57, Nikolay Izhikov <nizhikov@apache.org> wrote:
> > > >
> > > > > Hello, Igniters.
> > > > >
> > > > > I ran into an issue with the critical system worker failure handler.
> > > > > I just ran `IgniteDataFrameSuite` and it terminates on a random test.
> > > > > My laptop doesn't have bleeding-edge hardware, so tests can take a
> > > > > significant amount of time.
> > > > > It looks like our watchdog is too aggressive in a development
> > > > > environment.
> > > > >
> > > > > Can you please help me? What should I do to configure or turn off the
> > > > > watchdog?
> > > > > Should we relax it a little bit? At least for a test environment.
> > > > >
> > > > > The log contains the following error message:
> > > > >
> > > > > ```
> > > > > [2018-12-27 11:40:23,597][ERROR][exchange-worker-#5547%grid-2%][root]
> > > > > Critical system error detected. Will be handled accordingly to configured
> > > > > handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> > > > > super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> > > > > [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> > > > > failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class
> > > > > o.a.i.IgniteCheckedException: Node is stopping: grid-2]]
> > > > > class org.apache.ignite.IgniteCheckedException: Node is stopping: grid-2
> > > > > ```
> > > > >
> > > >
> > >
> >
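
For reference, the workaround quoted in the thread (setting
DataStorageConfiguration#setCheckpointReadLockTimeout to 0) might look roughly
like this in Java configuration. This is only a sketch: it assumes Ignite 2.7
or later on the classpath, the class name is made up for illustration, and the
rest of the node configuration is left at its defaults. As the newest message
in the thread points out, the failure reported here was a
SYSTEM_WORKER_TERMINATION, so this setting is not expected to fix that
particular case.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Hypothetical class name; only the setter comes from the thread.
public class CheckpointReadLockTimeoutWorkaround {
    public static void main(String[] args) {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        // Value suggested in the thread. Assumption: 0 turns the timeout off.
        storageCfg.setCheckpointReadLockTimeout(0);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storageCfg);

        // Start a node with this configuration and stop it when the block exits.
        try (Ignite ignite = Ignition.start(cfg)) {
            System.out.println("Started node: " + ignite.name());
        }
    }
}
```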
