spark-dev mailing list archives

From Aaron Davidson <ilike...@gmail.com>
Subject Re: ExecutorState.LOADING?
Date Wed, 09 Jul 2014 22:50:19 GMT
Agreed that the behavior of the Master killing off an Application when
Executors from the same set of nodes repeatedly die is silly. This can also
strike if a single node enters a state where any Executor created on it
quickly dies (e.g., a block device becomes faulty). This prevents the
Application from launching despite only one node being bad.
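
To make the failure mode discussed in this thread concrete, here is a minimal toy model in Python. It is not Spark's actual Master code: the `Application` class, the `on_executor_exited` function, and the exact comparison against the limit are assumptions made for illustration. Only the limit of 10 and the idea that every abnormal executor exit counts against the whole application come from the quoted description below.

```python
# Toy model of the retry logic discussed in this thread; not Spark's actual code.
MAX_NUM_RETRY = 10  # the thread describes this limit as a non-configurable 10


class Application:
    """Hypothetical stand-in for the Master's per-application bookkeeping."""

    def __init__(self, name):
        self.name = name
        self.retry_count = 0
        self.removed = False


def on_executor_exited(app, normal_exit):
    """Model the Master reacting to an executor that has exited.

    Returns True if this exit caused the application to be removed.
    """
    if normal_exit or app.removed:
        return False
    # Abnormal exit: count it against the application as a whole,
    # regardless of which node the executor was running on.
    app.retry_count += 1
    if app.retry_count >= MAX_NUM_RETRY:  # assumed comparison
        app.removed = True  # stands in for removeApplication
        return True
    return False


app = Application("long-running-app")
# Executors on one bad node keep dying right after launch, while executors
# on the healthy nodes keep running and never report an exit.
failures = 0
while not app.removed:
    on_executor_exited(app, normal_exit=False)
    failures += 1
print(failures, app.removed)
```

Because the counter in this model is kept per application rather than per node, a single bad node is enough to exhaust it, which is the complaint raised in both messages.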


On Wed, Jul 9, 2014 at 3:08 PM, Mark Hamstra <mark@clearstorydata.com>
wrote:

> Actually, I'm thinking about re-purposing it.  There's a nasty behavior
> that I'll open a JIRA for soon, and that I'm thinking about addressing by
> introducing/using another ExecutorState transition.  The basic problem is
> that Master can be overly aggressive in calling removeApplication on
> ExecutorStateChanged.  For example, say you have a working, long-running
> Spark stand-alone-mode application and then try to add some more worker
> nodes, but manage to misconfigure the new nodes so that on the new nodes
> Executors never successfully start.  In that scenario, you will repeatedly
> end up in the !normalExit branch of Master's receive ExecutorStateChanged,
> quickly exceed ApplicationState.MAX_NUM_RETRY (a non-configurable 10, which
> is another irritation), and end up having your application killed off even
> though it is still running successfully on the old worker nodes.
>
>
>
> On Wed, Jul 9, 2014 at 2:49 PM, Kay Ousterhout <keo@eecs.berkeley.edu>
> wrote:
>
> > Git history to the rescue!  It seems to have been added by Matei way back
> > in July 2012:
> >
> >
> > https://github.com/apache/spark/commit/5d1a887bed8423bd6c25660910d18d91880e01fe
> >
> > and then was removed a few months later (replaced by RUNNING) by the same
> > Mr. Zaharia:
> >
> >
> > https://github.com/apache/spark/commit/bb1bce79240da22c2677d9f8159683cdf73158c2#diff-776a630ac2b2ec5fe85c07ca20a58fc0
> >
> > So I'd say it's safe to delete it.
> >
> >
> > On Wed, Jul 9, 2014 at 2:36 PM, Mark Hamstra <mark@clearstorydata.com>
> > wrote:
> >
> > > Doesn't look to me like this is used.  Does anybody recall what it was
> > > intended for?
> > >
> >
>
