spark-issues mailing list archives

From "Wenchen Fan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-23253) Only write shuffle temporary index file when there is not an existing one
Date Mon, 03 Sep 2018 02:42:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601747#comment-16601747 ]

Wenchen Fan commented on SPARK-23253:
-------------------------------------

Hi [~irashid], thanks for providing these references, and sorry for the false alarm! I was too
hasty when searching the commit history and mistakenly landed on this ticket. You are right:
https://github.com/apache/spark/pull/9610 is the one that needs to be (partially) reverted to
make my test pass.

According to the discussion in https://github.com/apache/spark/pull/9214, it seems we already
knew about the problem of non-deterministic output but decided to leave it and stick with "first
write wins", as it's too hard to fix. I think https://github.com/apache/spark/pull/6648 is
the right fix.

Since it's not possible to finish https://github.com/apache/spark/pull/6648 before Spark 2.4,
I'll reference it in a code comment and simply fail the job if non-deterministic shuffle output
is detected. In the next release, I can help with https://github.com/apache/spark/pull/6648
to properly fix the repartition bug. Thanks!
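The "fail instead of silently keeping the first write" idea can be illustrated with a small Python sketch. It is a hypothetical illustration, not the check that went into Spark: it assumes each map-task attempt can report a (length, CRC32) pair per reduce partition, and `partition_signature`, `check_retry_output`, and `NondeterministicShuffleError` are invented names.

```python
import zlib


class NondeterministicShuffleError(RuntimeError):
    """Hypothetical error: two attempts of one map task produced different output."""


def partition_signature(partitions):
    """Return a (length, CRC32) pair for each reduce partition's serialized bytes."""
    return [(len(p), zlib.crc32(p)) for p in partitions]


def check_retry_output(task_id, committed, attempt):
    """Fail if a retried attempt disagrees with the output already committed.

    Under plain "first write wins", the retry's output would simply be discarded;
    here any mismatch is surfaced as an error instead.
    """
    if len(committed) != len(attempt):
        raise NondeterministicShuffleError(
            f"map task {task_id}: partition count changed between attempts")
    bad = [i for i, (c, a) in enumerate(zip(committed, attempt)) if c != a]
    if bad:
        raise NondeterministicShuffleError(
            f"map task {task_id}: partitions {bad} differ between attempts")
```

For example, a retry whose input arrived in a different order (`b"b,a"` instead of `b"a,b"`) keeps the same partition lengths, but the CRC32 differs, so the check raises. Matching signatures are strong evidence, not proof, that two attempts were identical.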

> Only write shuffle temporary index file when there is not an existing one
> -------------------------------------------------------------------------
>
>                 Key: SPARK-23253
>                 URL: https://issues.apache.org/jira/browse/SPARK-23253
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 2.2.1
>            Reporter: Kent Yao
>            Assignee: Kent Yao
>            Priority: Major
>             Fix For: 2.4.0
>
>
> The temporary shuffle index file is used to create the shuffle index file atomically. It is
> not needed when the index file already exists because another attempt of the same task has already written it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


