spark-user mailing list archives

From Anny Chen <anny9...@gmail.com>
Subject Re: How to output to S3 and keep the order
Date Tue, 20 Jan 2015 17:43:56 GMT
Thanks Aniket! It is working now.

Anny

On Mon, Jan 19, 2015 at 5:56 PM, Aniket Bhatnagar <
aniket.bhatnagar@gmail.com> wrote:

> When you repartition, ordering can get lost. You would need to sort after
> repartitioning.
>
> Aniket
>
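One way to apply the advice above is sketched here. This is only an illustration, not code from the thread: it assumes `result` is a PySpark RDD whose current element order is the one to keep, and the helper name `save_in_original_order` is hypothetical. It tags each element with its position via `zipWithIndex`, then sorts on that position into a single partition with `sortBy(..., numPartitions=1)`, so the sort also does the job of `repartition(1)`.

```python
def save_in_original_order(rdd, path):
    """Write `rdd` to `path` as a single part file, keeping element order.

    Assumptions (not from the thread): `rdd` is a pyspark RDD, and `path`
    is any URI that Spark can write to.
    """
    # Pair every element with its position: (element, index).
    indexed = rdd.zipWithIndex()
    # Sort by the recorded index and collapse to one partition in one step.
    ordered = indexed.sortBy(lambda pair: pair[1], numPartitions=1)
    # Drop the index again and write the elements out.
    ordered.keys().saveAsTextFile(path)
```

With the RDD from the question this would be called as `save_in_original_order(result, "s3://...")`. The extra sort costs one shuffle, which should be acceptable for the relatively small output described.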
> On Tue, Jan 20, 2015, 7:08 AM anny9699 <anny9699@gmail.com> wrote:
>
>> Hi,
>>
>> I am using Spark on AWS and want to write the output to S3. It is a
>> relatively small file, and I don't want it to be written as multiple
>> parts, so I use
>>
>> result.repartition(1).saveAsTextFile("s3://...")
>>
>> However, whenever I use the saveAsTextFile method, the output doesn't
>> keep the original order. But if I use BufferedWriter in Java to write the
>> output, I can only write to the master machine instead of S3 directly.
>> Is there a way to write to S3 and at the same time keep the order?
>>
>> Thanks a lot!
>> Anny
>>
