spark-user mailing list archives

From "Wang, Ningjun (LNG-NPV)" <ningjun.w...@lexisnexis.com>
Subject Cannot iterate items in rdd.mapPartition()
Date Fri, 26 Jun 2015 18:52:57 GMT
In rdd.mapPartitions(...), if I try to iterate through the items in the partition, the
result comes out wrong. For example:

val rdd = sc.parallelize(1 to 1000, 3)
val count = rdd.mapPartitions(iter => {
  println(iter.length)
  iter
}).count()


The count is 0, which is incorrect; it should be 1000. If I comment out the line
println(iter.length), the count correctly becomes 1000.

Does this mean I cannot iterate through iter in mapPartitions? I want to get all items in
a partition and compose one request to send to an external system. How can I achieve that if
I am not allowed to iterate through the items in the partition?

Ningjun
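
A note on the behavior described above: a Scala Iterator can be traversed only once, so
calling iter.length consumes it, and the same (now empty) iterator is then handed back to
Spark. One possible workaround is sketched below, assuming each partition fits in memory:
copy the iterator into a collection, build the request from that collection, and return a
fresh iterator over it. The sendRequest function and the object name are hypothetical
placeholders, and the local[3] master is only there to make the sketch self-contained.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MapPartitionsBatch {
  // Hypothetical stand-in for the call to the external system.
  def sendRequest(items: Seq[Int]): Unit =
    println(s"sending one request with ${items.length} items")

  def main(args: Array[String]): Unit = {
    // Local master is an assumption made only so the sketch can run on its own.
    val conf = new SparkConf().setAppName("mapPartitions-batch").setMaster("local[3]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 1000, 3)
      val count = rdd.mapPartitions { iter =>
        // An Iterator is single-pass, so buffer it before using it more than once.
        // This assumes the whole partition fits in memory.
        val items = iter.toVector
        sendRequest(items)
        // Hand Spark a fresh iterator over the buffered items.
        items.iterator
      }.count()
      println(count)
    } finally {
      sc.stop()
    }
  }
}
```

If nothing needs to flow downstream after the request is sent, rdd.foreachPartition may be
a simpler fit, since it does not require returning an iterator at all.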

