spark-user mailing list archives

From zhen <>
Subject multiple passes in mapPartitions
Date Fri, 13 Jun 2014 05:30:14 GMT
I want to take multiple passes through my data in mapPartitions. However, the
iterator only allows one pass through the data. If I transform the iterator
into an array using iter.toArray, it is too slow, since it copies all the data
into a new Scala array. It also takes twice the memory, which means more GC
pressure.
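For illustration, here is a minimal sketch of the iter.toArray pattern described above, written as a plain Scala function so it runs without a Spark cluster. The function name centerPartition and the mean-centering task are hypothetical, chosen only because they need two passes: one pass computes the mean and a second pass subtracts it. The single-pass Iterator is first copied into an Array, which is exactly the extra copy and memory cost the question is about.

```scala
object TwoPassPartition {
  // Hypothetical per-partition function (not from the original post).
  // Mean-centering needs two passes over the data, but an Iterator can
  // only be consumed once, so it is materialized into an Array first.
  def centerPartition(iter: Iterator[Double]): Iterator[Double] = {
    val buf = iter.toArray              // copies every element of the partition
    if (buf.isEmpty) Iterator.empty
    else {
      val mean = buf.sum / buf.length   // pass 1: compute the mean
      buf.iterator.map(_ - mean)        // pass 2: subtract it from each element
    }
  }

  def main(args: Array[String]): Unit = {
    // In Spark this would be applied as rdd.mapPartitions(centerPartition)
    val out = centerPartition(Iterator(1.0, 2.0, 3.0)).toList
    println(out) // List(-1.0, 0.0, 1.0)
  }
}
```

The Array holds the whole partition in memory alongside whatever the input iterator was reading from, which is where the doubled memory footprint and the additional GC work come from.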

Is there a faster/better way of taking multiple passes without copying all
the data?

Thank you,


Sent from the Apache Spark User List mailing list archive at
