spark-issues mailing list archives

From "Sean Owen (JIRA)" <>
Subject [jira] [Updated] (SPARK-8119) HeartbeatReceiver should not adjust application executor resources
Date Sat, 02 Jan 2016 14:23:39 GMT


Sean Owen updated SPARK-8119:
    Labels:   (was: backport-needed)

> HeartbeatReceiver should not adjust application executor resources
> ------------------------------------------------------------------
>                 Key: SPARK-8119
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.4.0
>            Reporter: SaintBacchus
>            Assignee: Andrew Or
>            Priority: Critical
>             Fix For: 1.5.0
> When dynamic allocation wants to kill some executors, it lowers the total number of executors requested.
> However, Spark also lowers this total when dynamic allocation is not enabled.
> This causes the following problem: when an executor goes down, Spark sometimes does not bring up a replacement for it.
> === EDIT by andrewor14 ===
> The issue is that the AM forgets about the original number of executors it wants after
calling sc.killExecutor. Even if dynamic allocation is not enabled, this is still possible
because of heartbeat timeouts.
> I think the problem is that sc.killExecutor is used incorrectly in HeartbeatReceiver.
The method is intended to permanently adjust the number of executors the application
will get. In HeartbeatReceiver, however, it is used as a best-effort mechanism to ensure
that the timed-out executor is dead.
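
The difference between the two uses of a kill request can be sketched with a toy model. This is not Spark code: `ToyAllocator`, `reconcile`, `kill`, and the `adjust_target` flag are hypothetical names invented for illustration, and the model assumes the cluster manager simply launches executors until the running count matches the target. It only shows why lowering the target on every kill leaves the application one executor short after a heartbeat timeout.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAllocator:
    """Hypothetical stand-in for executor bookkeeping (not Spark's implementation)."""
    target: int                                  # executors the application wants
    running: set = field(default_factory=set)    # ids of live executors
    _next_id: int = 0

    def reconcile(self) -> None:
        # Launch executors until the running count matches the target.
        while len(self.running) < self.target:
            self._next_id += 1
            self.running.add(self._next_id)

    def kill(self, executor_id: int, adjust_target: bool) -> None:
        # Remove the executor. With adjust_target=True the target is lowered
        # too, so no replacement is ever requested.
        self.running.discard(executor_id)
        if adjust_target:
            self.target -= 1
        self.reconcile()


def executors_after_timeout(adjust_target: bool) -> int:
    """Start 3 executors, kill one as if its heartbeat timed out, count survivors."""
    alloc = ToyAllocator(target=3)
    alloc.reconcile()
    victim = min(alloc.running)
    alloc.kill(victim, adjust_target=adjust_target)
    return len(alloc.running)


print(executors_after_timeout(adjust_target=True))   # 2: target was lowered permanently
print(executors_after_timeout(adjust_target=False))  # 3: the lost executor is replaced
```

In this sketch, the heartbeat-timeout case corresponds to `adjust_target=False` (make sure the executor is dead, keep the original target), while an intentional scale-down corresponds to `adjust_target=True`.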
