spark-user mailing list archives

From Tathagata Das <>
Subject Re: spark streaming multiple file output paths
Date Thu, 07 Aug 2014 17:04:22 GMT
The problem boils down to how to write an RDD that way. You could use
the HDFS FileSystem API to write each partition directly to its
destination path.

pairRDD.groupByKey().foreachPartition { iterator =>
  iterator.foreach { case (key, values) =>
    // Open an output stream to the destination file for this key
    // Write the values to the file
    // Close the file
  }
}

You can go even fancier by writing to a temp file, and then moving the file
to the final location. This tolerates failures in the middle of writing
(saveAsTextFile does this underneath).


On Thu, Aug 7, 2014 at 8:39 AM, Chen Song <> wrote:

> In Spark Streaming, is there a way to write output to different paths
> based on the partition key? The saveAsTextFiles method will write output in
> the same directory.
> For example, if the partition key has a hour/day column and I want to
> separate DStream output into different directories by hour/day.
> --
> Chen Song
