Unfortunately, this is expensive to do on HDFS: you'd need a single writer to write the whole file. If your data is small enough for that, you can call coalesce(1) on the RDD to merge everything into one partition, so that a single task on one node writes it, and then save it. The output will still be a directory, but it will contain only one part file. However, for this reason most HDFS applications work with directories containing multiple files rather than single files.
On Jan 6, 2014, at 10:56 PM, Nan Zhu <email@example.com> wrote:
> Hi, all
> maybe a stupid question, but is there any way to make Spark write a single file instead of partitioned files?
> Nan Zhu