spark-user mailing list archives

From Sean Owen <>
Subject Re: How to write a RDD into One Local Existing File?
Date Sat, 18 Oct 2014 00:37:16 GMT
You can save to a local file. What are you trying and what doesn't work?

You can output one file by repartitioning to 1 partition, but this is
probably not a good idea: it bottlenecks the output, and possibly some
upstream computation as well, by disabling parallelism.
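For reference, a minimal sketch of the single-partition approach described above. The SparkContext `sc` and both paths are assumed placeholders for illustration, not part of the original message:

```scala
// Assumes `sc` is an existing SparkContext; paths are placeholders.
val rdd = sc.textFile("hdfs:///data/input")

// coalesce(1) funnels all records through a single task, so the write
// loses parallelism; shuffle = true at least keeps upstream stages parallel.
rdd.coalesce(1, shuffle = true)
   .saveAsTextFile("file:///tmp/single-output")
// Note: the result is still a directory, containing one part-00000 file.
```

This trades throughput for a single output file, which is why the reply advises against it for large data.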

How about just combining the files on HDFS afterwards? Or just reading
all the files instead of one? You can hdfs dfs -cat a bunch of files at
once.
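To illustrate the merge-afterwards suggestion, here is a sketch using a local directory as a stand-in for the HDFS output directory that saveAsTextFile produces (the directory layout and file contents are made up for the example):

```shell
set -e
# Local stand-in for an output directory written by saveAsTextFile
outdir=$(mktemp -d)
printf 'line1\nline2\n' > "$outdir/part-00000"
printf 'line3\n' > "$outdir/part-00001"

# Append every part file, in order, to one local file
target=$(mktemp)
cat "$outdir"/part-* >> "$target"
cat "$target"

# Against HDFS itself the equivalent would be:
#   hdfs dfs -cat /path/to/output/part-* >> local.txt
# or, in one step:
#   hdfs dfs -getmerge /path/to/output local.txt
```

Because the shell expands `part-*` in sorted order, the merged file preserves the partition ordering of the original output.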

On Fri, Oct 17, 2014 at 6:46 PM, Parthus <> wrote:
> Hi,
> I have a Spark MapReduce task which requires me to write the final RDD to an
> existing local file (appending to this file). I tried two ways, but neither
> works well:
> 1. Use the saveAsTextFile() API. Spark 1.1.0 claims that this API can write
> to a local path, but I never got it to work. Moreover, the result is not one
> file but a series of part-xxxxx files, which is not what I hoped to get.
> 2. Collect the RDD into an array and write it on the driver node using Java's
> file IO. There are also two problems: 1) my RDD is huge (1 TB), so it cannot
> fit into the memory of one driver node; I have to split the task into small
> pieces, then collect and write them part by part; 2) during the write via
> Java IO, the Spark MapReduce task has to wait, which is not efficient.
> Could anybody suggest an efficient way to solve this problem? Ideally the
> solution would append a huge RDD to a local file without pausing the
> MapReduce job during the write.
> --
> Sent from the Apache Spark User List mailing list archive at
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

