spark-issues mailing list archives

From "Nan Zhu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-1771) CoarseGrainedSchedulerBackend is not resilient to Akka restarts
Date Tue, 13 May 2014 02:31:15 GMT

    [ https://issues.apache.org/jira/browse/SPARK-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995962#comment-13995962 ]

Nan Zhu commented on SPARK-1771:
--------------------------------

[#Aaron Davidson], I think there are basically two ways to fix this bug, depending on
whether we want to allow the driver to restart:

1. If we allow restarts, we may need something similar to the PersistenceEngine in
the deploy package, so that the restarted DriverActor can recover its state.

2. If not, we can introduce a supervisor actor that stops the DriverActor and kills the
executors, similar to what we just did in the DAGScheduler.
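
To make option 2 concrete, here is a minimal sketch of the "stop instead of restart" policy. It is a toy model in plain Python, not Akka or Spark code: the Executor, DriverActor and Supervisor classes and all of their methods are hypothetical names made up for illustration, and the real fix would presumably be expressed through Akka's supervision mechanism instead.

```python
# Toy model in plain Python; it does not use Akka or Spark APIs.
# Every class and method name below is hypothetical.


class Executor:
    def __init__(self, executor_id):
        self.executor_id = executor_id
        self.alive = True

    def kill(self):
        self.alive = False


class DriverActor:
    def __init__(self):
        self.executors = {}  # executor_id -> Executor
        self.stopped = False

    def register(self, executor):
        self.executors[executor.executor_id] = executor

    def handle(self, message):
        # Stand-in for an exception thrown while processing a message.
        if message == "boom":
            raise RuntimeError("simulated failure while handling a message")

    def stop(self):
        self.stopped = True


class Supervisor:
    """On child failure, kill the executors and stop the child; never restart it."""

    def __init__(self):
        self.child = DriverActor()

    def deliver(self, message):
        if self.child.stopped:
            return False
        try:
            self.child.handle(message)
            return True
        except Exception:
            # Disconnect executors explicitly so none are left attached
            # to a driver that no longer tracks them.
            for executor in self.child.executors.values():
                executor.kill()
            self.child.executors.clear()
            self.child.stop()
            return False


supervisor = Supervisor()
executors = [Executor("exec-1"), Executor("exec-2")]
for executor in executors:
    supervisor.child.register(executor)

supervisor.deliver("boom")
print(supervisor.child.stopped, [e.alive for e in executors])
```

The point of the sketch is only the ordering: the executors are torn down before the failed actor goes away, so no executor stays attached to a driver that has forgotten it.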

> CoarseGrainedSchedulerBackend is not resilient to Akka restarts
> ---------------------------------------------------------------
>
>                 Key: SPARK-1771
>                 URL: https://issues.apache.org/jira/browse/SPARK-1771
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Aaron Davidson
>
> The exception reported in SPARK-1769 was propagated through the CoarseGrainedSchedulerBackend,
> and caused an Actor restart of the DriverActor. Unfortunately, this actor does not seem to
> have been written with Akka restartability in mind. For instance, the new DriverActor has
> lost all state about the prior Executors without cleanly disconnecting them. This means that
> the driver actually has executors attached to it, but doesn't think it does, which leads to
> mayhem of various sorts.
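
The failure mode described in the issue can be illustrated with a small toy model. This is plain Python, not Akka or Spark code, and the DriverActor class, its known_executors field and the executor IDs are hypothetical. It only assumes what the description states: a restart yields a new actor instance with fresh state, while the executors stay connected.

```python
# Toy model in plain Python; not Akka or Spark code. Names are hypothetical.


class DriverActor:
    def __init__(self):
        # State lives in the instance, so a fresh instance starts empty.
        self.known_executors = set()

    def register(self, executor_id):
        self.known_executors.add(executor_id)


# Executors that are still attached, from the cluster's point of view.
attached_executors = {"exec-1", "exec-2"}

actor = DriverActor()
for executor_id in attached_executors:
    actor.register(executor_id)

# A restart replaces the failed instance with a new one, without
# disconnecting the executors first.
actor = DriverActor()

# Executors that are attached but unknown to the restarted actor.
orphaned = attached_executors - actor.known_executors
print(sorted(orphaned))  # ['exec-1', 'exec-2']
```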



--
This message was sent by Atlassian JIRA
(v6.2#6252)
