spark-dev mailing list archives

From Steve Loughran <>
Subject Re: [DISCUSS] Enable blacklisting feature by default in 3.0
Date Wed, 03 Apr 2019 11:10:16 GMT
On Tue, Apr 2, 2019 at 9:39 PM Ankur Gupta <> wrote:

> Hi Steve,
> Thanks for your feedback. From your email, I could gather the following
> two important points:
>    1. Report failures to something (cluster manager) which can opt to
>    destroy the node and request a new one
>    2. Pluggable failure detection algorithms
> Regarding #1, the current blacklisting implementation does report blacklist
> status to YARN here
> <>,
> which can choose to take appropriate action based on failures across
> different applications (though it seems it doesn't currently). However, this
> doesn't work with static allocation or with other cluster managers. Those
> issues are still open:
>    -
>    -
>    -
> Regarding #2, that is a good point, but I think it is optional and need
> not be tied to enabling the blacklisting feature in its current form.
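
To make points #1 and #2 concrete, here is a minimal Python sketch of a pluggable failure-detection hook that also reports newly excluded nodes to the cluster manager. Everything in it (`FailureDetector`, `ThresholdDetector`, `on_task_failure`, the `report` callback) is hypothetical and is not Spark's scheduler API; it only shows the shape such a plug-in point could take, assuming a simple count-based policy as the default.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Callable, Dict


class FailureDetector(ABC):
    """Hypothetical plug-in point: decides when a node should be excluded."""

    @abstractmethod
    def record_failure(self, node: str) -> None: ...

    @abstractmethod
    def is_excluded(self, node: str) -> bool: ...


class ThresholdDetector(FailureDetector):
    """Count-based policy: exclude a node once it reaches max_failures failures."""

    def __init__(self, max_failures: int) -> None:
        self.max_failures = max_failures
        self.failures: Dict[str, int] = defaultdict(int)

    def record_failure(self, node: str) -> None:
        self.failures[node] += 1

    def is_excluded(self, node: str) -> bool:
        return self.failures.get(node, 0) >= self.max_failures


def on_task_failure(detector: FailureDetector, node: str,
                    report: Callable[[str], None]) -> None:
    """Record a failure and notify the cluster manager the first time a node is excluded."""
    was_excluded = detector.is_excluded(node)
    detector.record_failure(node)
    if detector.is_excluded(node) and not was_excluded:
        report(node)  # the cluster manager may then choose to replace the node
```

Swapping in a different policy (time-decayed counts, per-stage limits) would only need another `FailureDetector` subclass; the reporting path stays the same.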

I'd expect the algorithms to be done in the controllers, as failures were

One other thing to consider is how to react when you are down to ~0 nodes.
At that point you may as well give up on the blacklisting, because you've
just implicitly shut down the cluster. I seem to remember something (HDFS?)
trying to deal with that.
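
One way to express the "give up when down to ~0 nodes" idea is a guard that stops enforcing exclusions once too few nodes would remain schedulable. This is a minimal Python sketch, assuming a hypothetical `min_usable_fraction` knob; `effective_exclusions` is an illustrative name and is not taken from Spark or HDFS.

```python
from typing import Set


def effective_exclusions(all_nodes: Set[str], excluded: Set[str],
                         min_usable_fraction: float = 0.5) -> Set[str]:
    """Return the exclusions to enforce, or none if enforcing them would starve the app.

    min_usable_fraction (hypothetical) is the minimum share of known nodes
    that must stay schedulable for the blacklist to remain in force.
    """
    if not all_nodes:
        return set()
    excluded = excluded & all_nodes  # ignore nodes that have already left
    usable = len(all_nodes) - len(excluded)
    if usable / len(all_nodes) < min_usable_fraction:
        # Enforcing the list would implicitly shut down the cluster, so drop it.
        return set()
    return excluded
```

Dropping every exclusion at once is the simplest choice; a gentler variant could lift the oldest exclusions first until enough nodes are usable again.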

> Coming back to the concerns raised by Reynold, Chris and Steve, it seems
> that there are at least two tasks that we need to complete before we decide
> to enable blacklisting by default in its current form:
>    1. Avoid resource starvation because of blacklisting
>    2. Use exponential backoff for blacklisting instead of a configurable
>    threshold
>    3. Report blacklisting status to all cluster managers (though I am not
>    sure this is necessary to move forward)
> Thanks for all the feedback. Please let me know if there are other
> concerns that we should resolve before enabling blacklisting.
> Thanks,
> Ankur
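
Task 2 in the list above (exponential backoff instead of a fixed threshold) could look roughly like the following minimal Python sketch: each repeat exclusion of the same node doubles its timeout, up to a cap. `BackoffBlacklist`, `base_seconds`, and `max_seconds` are hypothetical names, not existing Spark classes or configuration.

```python
from typing import Dict


class BackoffBlacklist:
    """Hypothetical node exclusion list whose timeout doubles on each repeat offence."""

    def __init__(self, base_seconds: float = 60.0, max_seconds: float = 3600.0) -> None:
        self.base_seconds = base_seconds
        self.max_seconds = max_seconds
        self.last_timeout: Dict[str, float] = {}    # most recent timeout per node
        self.excluded_until: Dict[str, float] = {}  # exclusion expiry time per node

    def exclude(self, node: str, now: float) -> float:
        """Exclude `node` starting at time `now`; return the timeout applied."""
        prev = self.last_timeout.get(node)
        if prev is None:
            timeout = min(self.base_seconds, self.max_seconds)
        else:
            timeout = min(prev * 2, self.max_seconds)  # double, but never past the cap
        self.last_timeout[node] = timeout
        self.excluded_until[node] = now + timeout
        return timeout

    def is_excluded(self, node: str, now: float) -> bool:
        """True while `now` is before the node's exclusion expiry."""
        return now < self.excluded_until.get(node, float("-inf"))
```

A real policy would probably also forget a node's history after a long healthy period, so that one bad episode does not keep it at the maximum timeout; that is left out here.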
