spark-user mailing list archives

From Mayur Rustagi <mayur.rust...@gmail.com>
Subject Re: multiple passes in mapPartitions
Date Fri, 13 Jun 2014 15:39:20 GMT
Sorry if this is a dumb question, but why not make several sequential
calls to mapPartitions? Are you trying to avoid function serialization,
or is your function damaging the partitions?
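
A minimal sketch of what chaining could look like, using plain Scala
iterators as a stand-in for one partition so that it runs without a cluster.
The names scale and keepLarge are hypothetical, made up for illustration. On
an RDD the same functions would be passed as
rdd.mapPartitions(scale).mapPartitions(keepLarge).

```scala
object ChainedPasses {
  // Hypothetical per-partition steps. Each one consumes its input
  // iterator exactly once and returns a new lazy iterator.
  def scale(iter: Iterator[Int]): Iterator[Int] = iter.map(_ * 2)
  def keepLarge(iter: Iterator[Int]): Iterator[Int] = iter.filter(_ > 4)

  // Stand-in for the contents of a single partition (assumption: in Spark
  // this iterator would be handed to the function by mapPartitions).
  def run(): List[Int] = {
    val partition = Iterator(1, 2, 3, 4)
    keepLarge(scale(partition)).toList
  }
}
```

This only helps when each pass can be written as a function of the previous
pass's output. If a later pass needs a value computed from the whole
partition first, chaining alone does not cover it.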

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Fri, Jun 13, 2014 at 1:30 AM, zhen <z.he@latrobe.edu.au> wrote:

> I want to take multiple passes through my data in mapPartitions. However,
> the iterator only allows one pass through the data. If I convert the
> iterator into an array using iter.toArray, it is too slow, since it copies
> all the data into a new Scala array. It also takes twice the memory, which
> means more GC.
>
> Is there a faster/better way of taking multiple passes without copying all
> the data?
>
> Thank you,
>
> Zhen
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/multiple-passes-in-mapPartitions-tp7555.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
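
For reference, the behaviour described in the question can be reproduced
with plain Scala iterators. This is only a sketch: the two-pass computation
(mean, then sum of squared deviations) is a made-up example, not zhen's
actual workload, and SinglePassDemo and its methods are hypothetical names.

```scala
object SinglePassDemo {
  // An iterator is exhausted after one traversal: hasNext is false
  // once the first pass (here, sum) has consumed every element.
  def firstPassExhausts(): (Double, Boolean) = {
    val iter = Iterator(1.0, 2.0, 3.0)
    val total = iter.sum
    (total, iter.hasNext)
  }

  // The toArray approach from the question: one copy of the partition,
  // after which any number of passes is possible.
  // Assumes a non-empty partition (the mean is NaN otherwise).
  def sumSqDev(iter: Iterator[Double]): Double = {
    val data = iter.toArray                       // copies the data once
    val mean = data.sum / data.length             // pass 1
    data.foldLeft(0.0) { (acc, x) =>              // pass 2, no extra array
      acc + (x - mean) * (x - mean)
    }
  }
}
```

The copy in toArray is the cost the question is about; the sketch shows why
it is needed for a second pass, not a way around it.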
