spark-issues mailing list archives

From "Jisoo Kim (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-19698) Race condition in stale attempt task completion vs current attempt task completion when task is doing persistent state changes
Date Fri, 24 Feb 2017 00:41:44 GMT

    [ https://issues.apache.org/jira/browse/SPARK-19698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881644#comment-15881644 ]

Jisoo Kim commented on SPARK-19698:
-----------------------------------

[~kayousterhout] If the failed task gets retried, then as long as the driver doesn't shut down
before the next attempt finishes, it should be OK, because the next attempt will upload a file as
intended. That's actually similar to what happened in my workload: an executor was lost due to
an OOME and the stage was eventually resubmitted. If the driver hadn't thought that the job was done,
things would've been fine. The driver didn't mark the partition that the failed task was responsible
for as "finished", so in the next attempt that task finished successfully (and there was
no race condition for this specific task, because the executor that had been running it was
lost), but one of the other tasks did hit this problem. One thing I am not sure about with my
solution is a possible performance regression, but I think that might be better than ending up
with some kind of "incorrect" external state, unless having a task modify external state is
itself not recommended and not good practice.
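
To make the "incorrect" external state concrete, here is a minimal Python sketch of the upload, delete, move sequence from the issue description being cancelled part-way through. It is only an illustration: the dict stands in for the object store, and `commit`, `Killed`, `kill_after`, and the key names are all hypothetical, not Spark or s3n APIs.

```python
class Killed(Exception):
    """Stands in for the driver cancelling a running task attempt."""


def commit(store, final_key, tmp_key, data, kill_after=None):
    """Upload to a temporary key, delete the old output, then move into place.

    `kill_after` names the step after which the attempt is cancelled
    (a hypothetical knob, used only to reproduce the interruption).
    """
    store[tmp_key] = data                    # step 1: upload to temporary location
    if kill_after == "upload":
        raise Killed(kill_after)
    store.pop(final_key, None)               # step 2: delete the existing output
    if kill_after == "delete":
        raise Killed(kill_after)
    store[final_key] = store.pop(tmp_key)    # step 3: move temp into place


# The first attempt for this partition commits its output normally.
store = {}
commit(store, "out/part-0", "tmp/part-0-attempt-0", b"rows")

# A second attempt for the same partition is cancelled between delete and move.
try:
    commit(store, "out/part-0", "tmp/part-0-attempt-1", b"rows", kill_after="delete")
except Killed:
    pass

final_exists = "out/part-0" in store
leftover_tmp = sorted(k for k in store if k.startswith("tmp/"))
```

In this toy run the output committed by the first attempt is gone and only the second attempt's temporary object remains, even though every partition was reported as finished.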

> Race condition in stale attempt task completion vs current attempt task completion when
task is doing persistent state changes
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19698
>                 URL: https://issues.apache.org/jira/browse/SPARK-19698
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos, Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Charles Allen
>
> We have encountered a strange scenario in our production environment. Below is the best
guess we have right now as to what's going on.
> Potentially, the final stage of a job has a failure in one of the tasks (such as OOME
on the executor) which can cause tasks for that stage to be relaunched in a second attempt.
> https://github.com/apache/spark/blob/v2.1.0/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1155
> keeps track of which tasks have been completed, but does NOT keep track of which attempt
those tasks were completed in. As such, we have encountered a scenario where a particular
task gets executed twice in different stage attempts, and the DAGScheduler does not consider
whether the second attempt is still running. This means that if the first task attempt succeeds,
the second attempt can be cancelled part-way through its run cycle once all other tasks (including
the previously failed one) have completed successfully.
> What this means is that if a task is manipulating some state somewhere (for example:
an upload-to-temporary-file-location, then delete-then-move on an underlying s3n storage implementation),
the driver can improperly shut down the running (2nd attempt) task between state manipulations,
leaving the persistent state in a bad state, since the 2nd attempt never got to complete its
manipulations and was terminated prematurely at some arbitrary point in its state change
logic (e.g. it finished the delete but not the move).
> This is using the Mesos coarse-grained executor. It is unclear whether this behavior is limited
to the Mesos coarse-grained executor.
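
The bookkeeping problem described in the issue can be sketched with a small Python model. This is a simplified illustration, not the DAGScheduler code: the `Tracker` class, its method names, and the `wait_for_running` flag are all hypothetical. The flag shows one possible attempt-aware variant and is not necessarily the fix proposed in this thread.

```python
class Tracker:
    """Toy model of job-completion bookkeeping keyed by partition."""

    def __init__(self, num_partitions, wait_for_running=False):
        self.finished = [False] * num_partitions
        self.running = set()              # (stage_attempt, partition) still executing
        self.wait_for_running = wait_for_running

    def launch(self, stage_attempt, partition):
        self.running.add((stage_attempt, partition))

    def fail(self, stage_attempt, partition):
        self.running.discard((stage_attempt, partition))

    def succeed(self, stage_attempt, partition):
        self.running.discard((stage_attempt, partition))
        # Only the partition is recorded; the attempt that finished it is not.
        self.finished[partition] = True

    def job_done(self):
        done = all(self.finished)
        if self.wait_for_running:
            # Attempt-aware variant: also wait for outstanding attempts.
            done = done and not self.running
        return done


def replay(tracker):
    """Replay the sequence of events described in the issue."""
    tracker.launch(0, 0)
    tracker.launch(0, 1)
    tracker.fail(0, 1)       # e.g. executor lost to an OOME
    tracker.launch(1, 0)     # stage resubmitted: partition 0 runs a second time
    tracker.launch(1, 1)
    tracker.succeed(0, 0)    # first attempt of partition 0 finishes
    tracker.succeed(1, 1)    # retried partition 1 finishes
    return tracker.job_done(), sorted(tracker.running)


naive = replay(Tracker(2))
careful = replay(Tracker(2, wait_for_running=True))
```

With partition-only tracking the job is reported done while attempt `(1, 0)` is still running, so that attempt is the one that gets cancelled mid-flight. The attempt-aware variant holds the job open until it finishes, at the cost of waiting longer.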



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

