spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Igor Berman <igor.ber...@gmail.com>
Subject Re: S3 DirectParquetOutputCommitter + PartitionBy + SaveMode.Append
Date Sat, 01 Oct 2016 10:14:13 GMT
Takeshi, why are you saying this, how have you checked it's only used from
2.7.3?
We use spark 2.0 which is shipped with hadoop dependency of 2.7.2 and we
use this setting.
We've sort of "verified" it's used by configuring log of file output
commiter

On 30 September 2016 at 03:12, Takeshi Yamamuro <linguin.m.s@gmail.com>
wrote:

> Hi,
>
> FYI: Seems `sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.algorithm.version","2”)`
> is only available at hadoop-2.7.3+.
>
> // maropu
>
>
> On Thu, Sep 29, 2016 at 9:28 PM, joffe.tal <joffe.tal@gmail.com> wrote:
>
>> You can use partition explicitly by adding "/<col_name>=<partition
>> value>" to
>> the end of the path you are writing to and then use overwrite.
>>
>> BTW in Spark 2.0 you just need to use:
>>
>> sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.al
>> gorithm.version","2”)
>> and use s3a://
>>
>> and you can work with regular output committer (actually
>> DirectParquetOutputCommitter is no longer available in Spark 2.0)
>>
>> so if you are planning on upgrading this could be another motivation
>>
>>
>>
>> --
>> View this message in context: http://apache-spark-user-list.
>> 1001560.n3.nabble.com/S3-DirectParquetOutputCommitter-Partit
>> ionBy-SaveMode-Append-tp26398p27810.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>

Mime
View raw message