spark-user mailing list archives

From Stephen Haberman <>
Subject Re: Save RDDs as CSV
Date Thu, 31 Oct 2013 04:51:35 GMT

> Doing a coalesce will be kind of a problem... I was hoping there would
> be a utility or command option that could concat all the files
> together for me...

If you call rdd.coalesce(1, shuffle = true), the parent rdd will still
be computed in parallel (each of its partitions writes its output to
disk as shuffle output). Only the final saveAsTextFile task is
non-parallel: it sequentially pulls in each upstream partition's
output and writes it to the single output file.
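As a minimal sketch of that pattern, assuming Spark is on the classpath and running in local mode: the object name SingleFileCsv, the helpers toCsv and saveAsSingleFile, the sample rows, and the temp output path are all hypothetical and only for illustration. The CSV formatting does no quoting or escaping, so it assumes fields never contain commas. Note that saveAsTextFile still writes a directory; with one partition it holds a single part file.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SingleFileCsv {
  // Hypothetical helper: formats each (id, name) pair as one CSV line.
  // No quoting/escaping; assumes fields contain no commas.
  def toCsv(rows: RDD[(Int, String)]): RDD[String] =
    rows.map { case (id, name) => s"$id,$name" }

  // Hypothetical helper: upstream partitions are still computed in parallel
  // and written as shuffle output; only the one task after
  // coalesce(1, shuffle = true) fetches them all and writes one part file.
  def saveAsSingleFile(lines: RDD[String], dir: String): Unit =
    lines.coalesce(1, shuffle = true).saveAsTextFile(dir)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("single-file-csv").setMaster("local[4]"))
    try {
      // Three input partitions, processed in parallel before the coalesce.
      val rows = sc.parallelize(Seq((1, "alice"), (2, "bob"), (3, "carol")), 3)
      // saveAsTextFile refuses to overwrite, so target a path that does not exist yet.
      val out = Files.createTempDirectory("csv").resolve("out").toString
      saveAsSingleFile(toCsv(rows), out)
      println(s"wrote $out")
    } finally sc.stop()
  }
}
```

The trade-off is the one described above: the expensive upstream work keeps its parallelism, and only the final write is funneled through one task.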

In other words, coalesce(1, shuffle = true) for all intents and
purposes is concat.

Or is there a reason you would not find this sufficient?

- Stephen
