HDFS has had a concat() method since 0.21 that does exactly this, but I'm not sure of its performance implications. Of course, as Matei pointed out, it's unusual to actually need a single HDFS file.
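In case it's useful, here's a rough sketch of how that call might look from Scala through the Hadoop FileSystem API (the paths are made up, and I haven't benchmarked it):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(new Configuration())

    // Hypothetical paths: the part files Spark wrote for one job.
    val target  = new Path("/out/part-00000")
    val sources = Array(new Path("/out/part-00001"), new Path("/out/part-00002"))

    // concat() splices the sources' blocks onto the target as a pure
    // metadata operation, so no data is copied. Non-HDFS filesystems
    // throw UnsupportedOperationException, and HDFS itself restricts
    // the inputs (e.g. the target must already exist and be non-empty),
    // so check the docs for your Hadoop version before relying on this.
    fs.concat(target, sources)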


On Mon, Jan 6, 2014 at 9:08 PM, Matei Zaharia <matei.zaharia@gmail.com> wrote:
Unfortunately this is expensive to do on HDFS, since you'd need a single writer to write the whole file. If your file is small enough for that, you can use coalesce() on the RDD to bring all the data to one node, and then save it. However, most HDFS applications work with directories containing multiple files instead of single files for this reason.
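For example, something along these lines (a minimal sketch, assuming sc is a SparkContext as in the spark-shell, with made-up paths):

    // coalesce(1) funnels all the data through a single task, so the
    // whole dataset has to fit on one node. The result is still a
    // directory, but it contains exactly one part-00000 file.
    val data = sc.textFile("hdfs:///input")
    data.coalesce(1).saveAsTextFile("hdfs:///output")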

Matei

On Jan 6, 2014, at 10:56 PM, Nan Zhu <zhunanmcgill@gmail.com> wrote:

> Hi, all
>
> Maybe a stupid question, but is there any way to make Spark write a single file instead of partitioned files?
>
> Best,
>
> --
> Nan Zhu
>