spark-user mailing list archives

From Adrian Mocanu <amoc...@verticalscope.com>
Subject how to print RDD by key into file with groupByKey
Date Fri, 13 Mar 2015 18:58:54 GMT
Hi
I have an RDD of type RDD[(String, scala.Iterable[(Long, Int)])] that I want to write out to files,
one file per key string.
I tried to trigger a repartition of the RDD by calling groupByKey on it. The grouping gives
RDD[(String, scala.Iterable[Iterable[(Long, Int)]])], so I flattened the values:
  Rdd.groupByKey().mapValues(x=>x.flatten)
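
To make the flatten step concrete, here is a minimal sketch on plain Scala collections rather than an actual RDD. The object name FlattenSketch, the keys, and the sample values are made up for illustration, and groupBy on a local Seq only stands in for what groupByKey does on a cluster:

```scala
object FlattenSketch {
  // Plain Scala collections standing in for the RDD; no Spark involved.
  // The keys and (Long, Int) values are made-up sample data.
  val records: Seq[(String, Iterable[(Long, Int)])] = Seq(
    "a" -> Seq((1L, 10), (2L, 20)),
    "b" -> Seq((3L, 30)),
    "a" -> Seq((4L, 40))
  )

  // Local analogue of groupByKey(): collect all values that share a key.
  val grouped: Map[String, Iterable[Iterable[(Long, Int)]]] =
    records.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

  // Local analogue of mapValues(x => x.flatten): merge the nested Iterables
  // so each key maps to a single Iterable[(Long, Int)] again.
  val flattened: Map[String, Iterable[(Long, Int)]] =
    grouped.map { case (k, vs) => k -> vs.flatten }
}
```

This only illustrates the shape of the data after grouping and flattening; it says nothing about how Spark assigns keys to partitions or how many files saveAsTextFile writes.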

However, when I save the result with saveAsTextFile I get only 2 files.

I was under the impression that groupByKey repartitions the data by key and that saveAsTextFile
makes a file per partition.
What am I doing wrong here?


Thanks
Adrian
