spark-user mailing list archives

From Takeshi Yamamuro <linguin....@gmail.com>
Subject Re: Spark output data to S3 is very slow
Date Sat, 17 Sep 2016 02:43:57 GMT
Hi,

Have you seen the previous thread?
https://www.mail-archive.com/user@spark.apache.org/msg56791.html
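
As a hedged sketch for readers who cannot open that link: Spark 2.0 removed
DirectParquetOutputCommitter, so the two settings quoted below have no effect.
One commonly used mitigation, which is an assumption here and not something
this message states, is to switch Hadoop's FileOutputCommitter to its version 2
algorithm (available in Hadoop 2.7 and later). With it, each task moves its
output to the destination when the task commits, instead of everything being
renamed in one pass at job commit.

```
# spark-defaults.conf (or pass the same key/value with --conf on spark-submit)
# Assumption: Hadoop 2.7+ is on the classpath. The key is a Hadoop setting;
# Spark forwards it to the Hadoop configuration because of the spark.hadoop. prefix.
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version  2
```

This reduces the final rename step rather than removing it: task attempts
still write to a temporary directory, a rename on S3 is still a copy, and the
version 2 algorithm gives weaker guarantees if a job fails partway through.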

// maropu


On Sat, Sep 17, 2016 at 11:34 AM, Qiang Li <qli@appannie.com> wrote:

> Hi,
>
>
> I ran some jobs with Spark 2.0 on YARN and found that all tasks finished very
> quickly, but in the last step Spark spends a lot of time renaming or moving data
> from the S3 temporary directory to the final directory. I then tried setting
>
> spark.hadoop.spark.sql.parquet.output.committer.class=org.apache.spark.sql.execution.datasources.parquet.DirectParquetOutputCommitter
> or
> spark.sql.parquet.output.committer.class=org.apache.spark.sql.parquet.DirectParquetOutputCommitter
>
> But neither works. It looks like Spark 2.0 removed these configs. How can
> I make Spark write output directly, without a temporary directory?




-- 
---
Takeshi Yamamuro
