spark-issues mailing list archives

From "Wenchen Fan (JIRA)" <>
Subject [jira] [Commented] (SPARK-23253) Only write shuffle temporary index file when there is not an existing one
Date Mon, 03 Sep 2018 02:42:00 GMT


Wenchen Fan commented on SPARK-23253:

Hi [~irashid], thanks for providing these references, and sorry for the false alarm! I was too
anxious when searching the commit history and mistakenly landed on this ticket. You are right, is the one that needs to be (partially) reverted to make
my test pass.

According to the discussion in , it seems we already
knew about the problem of non-deterministic output but decided to leave it and stick with
"first write wins", as it is too hard to fix. I think is
the right fix.

Since it's not possible to finish before Spark 2.4,
I'll refer to it in a code comment and just fail the job if non-deterministic shuffle writing
is detected. In the next release, I can help with
to really fix the repartition bug. Thanks!
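
To illustrate the "fail the job if non-deterministic shuffle writing is detected" idea, here
is a minimal sketch in Python. It is not Spark code, and the comment above does not say how
detection would actually be done; comparing content digests between attempts is an assumption
made purely for illustration. The names check_attempt, committed_digests and
NonDeterministicShuffleError are hypothetical.

```python
import hashlib


class NonDeterministicShuffleError(RuntimeError):
    """Hypothetical error: two attempts of one task produced different output."""


def check_attempt(committed_digests, task_id, output):
    """Record the digest of the first attempt and fail if a later attempt differs.

    committed_digests is a plain dict mapping task_id -> hex digest (an assumption;
    a real system would keep this state somewhere durable).
    """
    digest = hashlib.sha256(output).hexdigest()
    # setdefault stores the digest only for the first attempt and otherwise
    # returns the digest that was already recorded.
    previous = committed_digests.setdefault(task_id, digest)
    if previous != digest:
        raise NonDeterministicShuffleError(
            f"task {task_id}: retry produced different shuffle output"
        )
    return digest
```

A retry that writes the same bytes passes silently; a retry that writes the same records in a
different order raises, which is the point at which a job would be failed rather than
silently keeping the first write.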

> Only write shuffle temporary index file when there is not an existing one
> -------------------------------------------------------------------------
>                 Key: SPARK-23253
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 2.2.1
>            Reporter: Kent Yao
>            Assignee: Kent Yao
>            Priority: Major
>             Fix For: 2.4.0
> The shuffle index temporary file is used to create the shuffle index file atomically; it is
> not needed when the index file already exists because another attempt of the same task has
> already written it.
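
The behaviour described in this issue (write to a temporary file and rename it into place,
but skip the temporary file when an index already exists, so the first write wins) can be
sketched as follows. This is a Python illustration under stated assumptions, not Spark's
implementation: the function name write_index_if_absent, the process-local lock, and the
encoding of offsets as big-endian 64-bit integers are all hypothetical choices.

```python
import os
import struct
import tempfile
import threading

# Hypothetical: serializes commits within a single process only.
_commit_lock = threading.Lock()


def write_index_if_absent(index_path, offsets):
    """Return True if this attempt wrote the index, False if one already existed."""
    with _commit_lock:
        # First write wins: an earlier attempt already committed an index,
        # so no temporary file is created at all.
        if os.path.exists(index_path):
            return False

        directory = os.path.dirname(index_path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "wb") as tmp:
                for offset in offsets:
                    tmp.write(struct.pack(">q", offset))
            # The rename makes the complete index visible in a single step,
            # so readers never observe a partially written file.
            os.replace(tmp_path, index_path)
        except BaseException:
            # Clean up the temporary file if writing or renaming failed.
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise
        return True
```

Calling the function twice for the same path leaves the first attempt's offsets in place and
creates no temporary file on the second call.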
