spark-user mailing list archives

From Horia <ho...@alum.berkeley.edu>
Subject Re: Wrong result with mapPartitions example
Date Fri, 27 Sep 2013 04:23:52 GMT
Silly question: does sc.parallelize guarantee that the items are always
distributed equally across the partitions?

It seems to me that, in the example above, all four items were assigned to
the same partition. Have you tried the same with many more items?
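To see why an uneven split would produce exactly [0, 10], here is a plain-Python sketch of what mapPartitions does per partition (this simulates the partitioning locally; it is not Spark itself):

```python
def f(iterator):
    yield sum(iterator)

# Simulate two evenly split partitions of [1, 2, 3, 4].
partitions = [[1, 2], [3, 4]]

# mapPartitions applies f to each partition's iterator and
# concatenates the yielded values, so an even split gives [3, 7].
result = [x for part in partitions for x in f(iter(part))]
print(result)  # [3, 7]

# If all four items land in the second partition, the first is
# empty, and sum() of an empty iterator is 0 -- giving [0, 10].
skewed = [[], [1, 2, 3, 4]]
print([x for part in skewed for x in f(iter(part))])  # [0, 10]
```

In PySpark you can check the actual per-partition contents with rdd.glom().collect(), which returns each partition as a list.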
On Sep 26, 2013 9:01 PM, "Shangyu Luo" <lsyurd@gmail.com> wrote:

> Hi,
> I am trying to test the mapPartitions function in the Spark Python version,
> but I got the wrong result.
> More specifically, in pyspark shell:
> >>> rdd = sc.parallelize([1, 2, 3, 4], 2)
> >>> def f(iterator): yield sum(iterator)
> ...
> >>> rdd.mapPartitions(f).collect()
> The result is [0, 10], not [3, 7].
> Is there anything wrong with my code?
> Thanks!
>
>
> --
>
> Shangyu, Luo
> Department of Computer Science
> Rice University
>
>
