spark-issues mailing list archives

From "L. C. Hsieh (Jira)" <j...@apache.org>
Subject [jira] [Created] (SPARK-29649) Stop task set if FileAlreadyExistsException was thrown when writing to output file
Date Wed, 30 Oct 2019 06:51:00 GMT
L. C. Hsieh created SPARK-29649:
-----------------------------------

             Summary: Stop task set if FileAlreadyExistsException was thrown when writing to output file
                 Key: SPARK-29649
                 URL: https://issues.apache.org/jira/browse/SPARK-29649
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: L. C. Hsieh
            Assignee: L. C. Hsieh


We already know that task attempts that do not clean up their output files in the staging
directory can cause job failure (SPARK-27194). There were proposals to fix it by changing the
output filename, or by deleting existing output files. These proposals are not completely reliable.

The difficulty is that, because a previous failed task attempt wrote the output file, at the next
task attempt that file is still under the same staging directory, even if the new output file
name is different.

If the job is going to fail eventually, there is no point in re-running the task until max attempts
are reached. For long-running jobs, re-running the task can waste a lot of time.
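
The fail-fast policy described above can be sketched as follows. This is only an illustration of the idea, not Spark's scheduler code: `run_with_retries` and `TaskSetAborted` are hypothetical names, Python's built-in `FileExistsError` stands in for `FileAlreadyExistsException`, and the default of 4 attempts is an assumption chosen for the example.

```python
class TaskSetAborted(Exception):
    """Raised when the whole task set is stopped without further retries."""


def run_with_retries(task, max_attempts=4, fatal=(FileExistsError,)):
    """Run task(attempt) up to max_attempts times.

    Errors listed in `fatal` abort immediately; any other error is retried.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(attempt)
        except fatal as exc:
            # A leftover output file will still be there on the next attempt,
            # so retrying cannot succeed: stop the task set right away.
            raise TaskSetAborted(f"attempt {attempt}: {exc}") from exc
        except Exception as exc:
            # Other failures may be transient, so keep retrying.
            last_error = exc
    raise TaskSetAborted(f"gave up after {max_attempts} attempts") from last_error
```

With this policy, a task that hits an existing output file is attempted once rather than `max_attempts` times, which is where the time saving for long-running jobs would come from.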



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


