spark-user mailing list archives

From Aniket Bhatnagar <aniket.bhatna...@gmail.com>
Subject Re: How to output to S3 and keep the order
Date Tue, 20 Jan 2015 01:56:39 GMT
When you repartition, the ordering can get lost, because repartition shuffles
the data across partitions. You would need to sort after repartitioning.
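
A rough sketch of one way to do that in PySpark, in case it helps. It assumes
"result" is an RDD whose current order is the one you want to keep, and that
there is no natural sort key, so it attaches each element's position with
zipWithIndex() before any shuffle and then sorts on that position. Note that
it uses sortBy(..., numPartitions=1) in place of repartition(1), so the sort
itself produces the single partition. The helper name save_in_original_order
is made up for illustration.

```python
def save_in_original_order(rdd, path):
    """Write `rdd` to `path` as a single part file, keeping element order.

    Hypothetical helper: `rdd` is assumed to be a PySpark RDD that is
    currently in the desired order.
    """
    # Tag every element with its position while the order is still intact.
    indexed = rdd.zipWithIndex()  # (element, position) pairs

    # Sort on the position and collapse to one partition in the same step,
    # so no later shuffle can disturb the order.
    ordered = indexed.sortBy(lambda pair: pair[1], numPartitions=1)

    # Drop the position again and write the elements out.
    ordered.keys().saveAsTextFile(path)
```

It would be called as save_in_original_order(result, "s3://..."). If the
order originally comes from a sort on some field, sorting on that field with
numPartitions=1 should work the same way without the extra index.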

Aniket

On Tue, Jan 20, 2015, 7:08 AM anny9699 <anny9699@gmail.com> wrote:

> Hi,
>
> I am using Spark on AWS and want to write the output to S3. It is a
> relatively small file and I don't want it to be output as multiple parts, so
> I use
>
> result.repartition(1).saveAsTextFile("s3://...")
>
> However, whenever I use the saveAsTextFile method, the output doesn't
> keep the original order. But if I use a BufferedWriter in Java to write the
> output, I can only write to the master machine instead of to S3 directly. Is
> there a way to write to S3 and at the same time keep the order?
>
> Thanks a lot!
> Anny
>
>
>
> --
> View this message in context: http://apache-spark-user-list.
> 1001560.n3.nabble.com/How-to-output-to-S3-and-keep-the-order-tp21246.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
