spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: DataFrameWriter.save fails job with one executor failure
Date Fri, 25 Mar 2016 20:01:15 GMT
I would not recommend using the direct output committer with HDFS.  It's
intended only as an optimization for S3.
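
For the HDFS case, the fix is simply not to override the committer class (or to reset it explicitly to the default). A minimal sketch, assuming the Spark 1.5/1.6-era `spark.sql.parquet.output.committer.class` config key (DirectParquetOutputCommitter was removed in later releases):

```scala
import org.apache.spark.SparkConf

// Sketch: make sure Parquet writes on HDFS go through the standard,
// fault-tolerant committer (write to a temp location, then rename on
// task commit) rather than DirectParquetOutputCommitter.
val conf = new SparkConf()
  .set("spark.sql.parquet.output.committer.class",
       "org.apache.parquet.hadoop.ParquetOutputCommitter")
```

With the default committer, a re-spawned task attempt writes to its own attempt directory, so a leftover file from a failed attempt no longer collides with the retry.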

On Fri, Mar 25, 2016 at 4:03 AM, Vinoth Chandar <vinoth@uber.com> wrote:

> Hi,
>
> We are saving a dataframe to Parquet (using
> DirectParquetOutputCommitter) as follows:
>
> dfWriter.format("parquet")
>   .mode(SaveMode.Overwrite)
>   .save(outputPath)
>
> The problem is that even if an executor fails once while writing a file
> (say, due to some transient HDFS issue), when it is re-spawned it fails
> again because the file already exists, eventually failing the entire job.
>
> Is this a known issue? Any workarounds?
>
> Thanks
> Vinoth
>
