spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-19538) DAGScheduler and TaskSetManager can have an inconsistent view of whether a stage is complete.
Date Thu, 09 Feb 2017 23:56:41 GMT

    [ https://issues.apache.org/jira/browse/SPARK-19538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860427#comment-15860427 ]

Apache Spark commented on SPARK-19538:
--------------------------------------

User 'kayousterhout' has created a pull request for this issue:
https://github.com/apache/spark/pull/16877

> DAGScheduler and TaskSetManager can have an inconsistent view of whether a stage is complete.
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19538
>                 URL: https://issues.apache.org/jira/browse/SPARK-19538
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 2.1.0
>            Reporter: Kay Ousterhout
>            Assignee: Kay Ousterhout
>
> The pendingPartitions in Stage tracks partitions that still need to be computed, and
is used by the DAGScheduler to determine when to mark the stage as complete.  In most cases,
this variable is exactly consistent with the tasks in the TaskSetManager (for the current
version of the stage) that are still pending.  However, as discussed in SPARK-19263, the two
can become inconsistent when a ShuffleMapTask from an earlier attempt of the stage completes.
In that case the DAGScheduler may think the stage has finished while the TaskSetManager
is still waiting for some tasks to complete (see the description in this pull request:
https://github.com/apache/spark/pull/16620).  This leads to bugs like SPARK-19263.  Another
problem with this behavior is that listeners can get two StageCompleted messages: one when
the DAGScheduler thinks the stage is complete, and a second when the TaskSetManager later
decides the stage is complete.  We should fix this.
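
The inconsistency described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is plain Python rather than Spark code, the function names (`simulate`, `task_succeeded`) and the event order are hypothetical, and the real DAGScheduler and TaskSetManager logic is considerably more involved. It keeps one scheduler-side set of pending partitions that accepts results from any attempt, and one per-attempt set of running tasks, then replays a late completion from an earlier attempt.

```python
# Toy model only: names and logic are illustrative assumptions, not Spark's code.

def simulate():
    """Replay one hypothetical event order and record 'stage complete' signals."""
    pending_partitions = {0, 1}   # scheduler-side view of the stage
    running = {0: {0, 1}}         # attempt number -> partitions still running
    current_attempt = 0
    signals = []                  # every 'stage complete' signal, in order
    leftover = []                 # tasks still running when the scheduler view finishes

    def task_succeeded(attempt, partition):
        running[attempt].discard(partition)
        # Scheduler-side view: output from ANY attempt clears the partition.
        if partition in pending_partitions:
            pending_partitions.discard(partition)
            if not pending_partitions:
                signals.append("scheduler view: stage complete")
                leftover.append(set(running[current_attempt]))
        # Task-set view: only the current attempt's own tasks count.
        if attempt == current_attempt and not running[attempt]:
            signals.append("task set view: stage complete")

    # Attempt 0: the task for partition 0 fails, so the stage is resubmitted
    # as attempt 1 with tasks for every partition that is still pending.
    running[0].discard(0)
    current_attempt = 1
    running[1] = set(pending_partitions)

    task_succeeded(0, 1)  # late success from the earlier attempt
    task_succeeded(1, 0)  # scheduler view is now empty; attempt 1 still runs partition 1
    task_succeeded(1, 1)  # task-set view finishes later

    return signals, leftover[0]


if __name__ == "__main__":
    signals, still_running = simulate()
    for s in signals:
        print(s)
    print("running when scheduler view finished:", still_running)
```

In this toy run the stage is reported complete twice, and the first report arrives while the current attempt still has a task outstanding, which mirrors the duplicate StageCompleted messages and the mismatched views described in the issue.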



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
