spark-user mailing list archives

From Takeshi Yamamuro <linguin....@gmail.com>
Subject Re: S3 DirectParquetOutputCommitter + PartitionBy + SaveMode.Append
Date Fri, 30 Sep 2016 00:12:01 GMT
Hi,

FYI: It seems `sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.algorithm.version", "2")`
is only available in Hadoop 2.7.3+.

// maropu


On Thu, Sep 29, 2016 at 9:28 PM, joffe.tal <joffe.tal@gmail.com> wrote:

> You can write to a partition explicitly by appending "/<col_name>=<partition value>"
> to the end of the path you are writing to and then using overwrite mode.
>
> BTW in Spark 2.0 you just need to use:
>
> sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.algorithm.version", "2")
> and use s3a://
>
> and you can work with regular output committer (actually
> DirectParquetOutputCommitter is no longer available in Spark 2.0)
>
> so if you are planning on upgrading this could be another motivation
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/S3-DirectParquetOutputCommitter-PartitionBy-SaveMode-Append-tp26398p27810.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
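
For anyone trying the explicit-partition approach quoted above, here is a minimal sketch of how the target path might be built. The helper name `partitionPath`, the bucket, the column, and the value are all made up for illustration; none of them come from the thread.

```scala
object PartitionPath {
  // Appends "/<col_name>=<partition value>" to a base output path.
  // Hypothetical helper, not part of Spark or Hadoop.
  def partitionPath(base: String, colName: String, value: String): String = {
    // Drop one trailing slash so the result never contains "//".
    val root = base.stripSuffix("/")
    s"$root/$colName=$value"
  }

  def main(args: Array[String]): Unit = {
    // Bucket, column, and value are illustrative assumptions.
    println(partitionPath("s3a://example-bucket/events/", "date", "2016-09-29"))
    // prints "s3a://example-bucket/events/date=2016-09-29"
  }
}
```

The resulting string would then presumably be passed to something like `df.write.mode(SaveMode.Overwrite).parquet(path)`, so that only that one partition directory is overwritten. It is an assumption on my part that the DataFrame should be filtered to that partition value first and that the partition column should be dropped before writing, since the value is already encoded in the path.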


-- 
---
Takeshi Yamamuro
