spark-user mailing list archives

From Mark Hamstra <m...@clearstorydata.com>
Subject Re: Cannot iterate items in rdd.mapPartition()
Date Fri, 26 Jun 2015 19:03:47 GMT
Do you want to transform the RDD, or just produce some side effect with its
contents?  If the latter, you want foreachPartition, not mapPartitions.
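
For what it's worth, the count of 0 in the quoted code is consistent with how
Scala iterators behave: an Iterator is single-pass, so calling iter.length
walks it to the end, and the same (now empty) iterator is then returned to
Spark. Below is a minimal sketch of both options, not a definitive
implementation. It assumes Spark is on the classpath and runs in local mode;
PartitionDemo, sendBatch and logAndKeep are hypothetical names made up for
illustration, and sendBatch only stands in for the real call to the external
system.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionDemo {
  // Hypothetical stand-in for "compose one request and send it to an external system".
  def sendBatch(items: Seq[Int]): Unit =
    println(s"sending ${items.size} items in one request")

  // Materialize the partition once, inspect it, then hand back a fresh iterator.
  // Note: this holds the whole partition in memory.
  def logAndKeep(iter: Iterator[Int]): Iterator[Int] = {
    val items = iter.toVector
    println(items.length)
    items.iterator
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 3 threads, only for this demo.
    val conf = new SparkConf().setAppName("partition-demo").setMaster("local[3]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 1000, 3)

      // Side effect only: foreachPartition returns Unit, so the iterator
      // can be consumed freely.
      rdd.foreachPartition { iter =>
        sendBatch(iter.toVector)
      }

      // Transformation: the function must return an iterator that still has
      // the elements, so return a new one over the materialized items.
      val count = rdd.mapPartitions(logAndKeep).count()
      println(s"count = $count")
    } finally {
      sc.stop()
    }
  }
}
```

Materializing with toVector is the simplest way to look at a partition twice,
at the cost of keeping that partition in memory; if the partitions are large,
iter.grouped(n) can be used to send the items in bounded batches instead.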

On Fri, Jun 26, 2015 at 11:52 AM, Wang, Ningjun (LNG-NPV) <
ningjun.wang@lexisnexis.com> wrote:

>  In rdd.mapPartitions(…), if I try to iterate through the items in the
> partition, everything breaks. For example:
>
>
>
> val rdd = sc.parallelize(1 to 1000, 3)
> val count = rdd.mapPartitions(iter => {
>   println(iter.length)
>   iter
> }).count()
>
>
>
>
>
> The count is 0. This is incorrect; the count should be 1000. If I just
> comment out the line println(iter.length), then the count correctly
> becomes 1000.
>
>
>
> Does this mean I cannot iterate through iter in mapPartitions? I want to
> get all items in a partition and compose one request to send to an external
> system. How can I achieve that if I am not allowed to iterate through the
> items in the partition?
>
>
>
> Ningjun
>
>
>
