spark-user mailing list archives

From Daniel Darabos <daniel.dara...@lynxanalytics.com>
Subject Is shuffle "stable"?
Date Sat, 14 Jun 2014 19:14:43 GMT
What I mean is, let's say I run this:

sc.parallelize(Seq(0->3, 0->2, 0->1), 3).partitionBy(new HashPartitioner(3)).collect


Will the result always be Array((0,3), (0,2), (0,1))? Or could I
possibly get a different order?
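
One way to probe this empirically is to repeat the job many times and record every distinct ordering that comes back. The following is only a sketch under stated assumptions: the object name ShuffleOrderCheck, the helper observedOrders, the local[3] master and the run count of 100 are all made up for illustration. It can show that the order varies, but it can never prove that the order is stable.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed on older Spark versions

object ShuffleOrderCheck {
  // Hypothetical helper: runs the same shuffle `runs` times and returns
  // the set of distinct result orderings that were observed.
  def observedOrders(sc: SparkContext, runs: Int): Set[List[(Int, Int)]] =
    (1 to runs).map { _ =>
      sc.parallelize(Seq(0 -> 3, 0 -> 2, 0 -> 1), 3)
        .partitionBy(new HashPartitioner(3))
        .collect()
        .toList
    }.toSet

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 3 worker threads; the app name is arbitrary.
    val conf = new SparkConf().setAppName("shuffle-order-check").setMaster("local[3]")
    val sc = new SparkContext(conf)
    try {
      // More than one line printed means the order is not stable.
      observedOrders(sc, 100).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

A single printed ordering is consistent with a stable shuffle but does not establish it, since a local run may simply never hit the timing that would reorder the fetched blocks.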


I'm pretty sure the shuffle files are taken in the order of the source
partitions... But after much searching, and the discussion on
http://stackoverflow.com/questions/24206660/does-groupbykey-in-spark-preserve-the-original-order
I still can't find the code that does this.


Thanks!
