spark-user mailing list archives

From Nicholas Pritchard <>
Subject Split RDD and save as separate files
Date Wed, 11 Sep 2013 05:16:41 GMT

I have an RDD of (Key, Value) pairs that I would like to save to HDFS.
However, rather than putting everything into one file, I would like to
split the RDD by key and save each part as a separate file. The key would
become the filename.
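A minimal sketch of the intended output layout, written in plain Scala against a local collection rather than an RDD, so it does not exercise Spark's API at all. Pairs are grouped by key, and each group is written to a file named after its key, one value per line. The object and method names are hypothetical, and keys are assumed to be safe file names.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object SplitByKeyLocal {
  // Hypothetical local stand-in for "split by key, key becomes the filename".
  // Groups the pairs by key and writes each key's values, one per line,
  // to outDir/<key>. Assumes every key is a valid file name.
  def writeByKey(pairs: Seq[(String, String)], outDir: Path): Unit = {
    Files.createDirectories(outDir)
    pairs.groupBy(_._1).foreach { case (key, kvs) =>
      // groupBy keeps the original order of values within each key
      val lines = kvs.map(_._2).mkString("", "\n", "\n")
      Files.write(outDir.resolve(key), lines.getBytes(StandardCharsets.UTF_8))
    }
  }
}
```

Note that in Spark, saveAsTextFile(path) writes a directory of part files rather than a single file, so on HDFS each key would naturally map to a directory.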

In short, I am trying to do something like this:
myRDD.groupByKey().foreach { case (key, values) => values.saveAsTextFile(key) }

This obviously doesn't work since values is of type Seq[V] instead of
RDD[V], but does anyone have any suggestions for doing this efficiently?
Currently, I am repeatedly filtering and saving the RDD, but this seems
inefficient.
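To make the cost of a filter-and-save loop concrete, here is a local Scala analogue using plain collections instead of an RDD (names are hypothetical). The distinct keys are collected in one pass, then the whole data set is filtered once per key. The `scans` counter counts full passes over the input; in Spark, each filter followed by saveAsTextFile would be a separate job over the RDD, so the total work grows with the number of distinct keys.

```scala
object RepeatedFilterCost {
  // Hypothetical local analogue of "filter the RDD per key, then save".
  // Returns the per-key groups plus the number of full passes over `pairs`.
  def splitByFiltering[K, V](pairs: Seq[(K, V)]): (Map[K, Seq[V]], Int) = {
    var scans = 0
    // One pass to discover which keys exist
    val keys = { scans += 1; pairs.map(_._1).distinct }
    // One additional full pass per distinct key
    val parts = keys.map { k =>
      scans += 1
      k -> pairs.filter(_._1 == k).map(_._2)
    }.toMap
    (parts, scans)
  }
}
```

With two distinct keys, the input is scanned three times; a single groupBy-style pass would read it once.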

