spark-user mailing list archives

From Mingyu Kim <m...@palantir.com>
Subject Row order of RDDs
Date Wed, 29 Jan 2014 09:18:20 GMT
Here's my understanding of the row order guarantees of RDDs in the context of
limit() and collect(). Can someone confirm this?
* sparkContext.parallelize(myList) returns an RDD that may have a different
row order than myList.
* Every RDD loaded from the same file in HDFS (e.g.
sparkContext.textFile("hdfs://path_to_file")) will collect rows in the same
order.
* Row order of an RDD is preserved through non-shuffling operations (e.g.
map, filter).
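
To make the third point concrete, here is a toy model in plain Python. It is not Spark and proves nothing about Spark's actual guarantees; it only sketches the reasoning, under two assumptions of mine: that an RDD behaves like an ordered list of partitions, and that collect() concatenates those partitions in index order. The helper names (split_into_partitions, map_partitions, filter_partitions, collect) are hypothetical and are not Spark APIs.

```python
# Toy model of an RDD as an ordered list of partitions.
# NOT Spark: every helper below is a hypothetical stand-in.

def split_into_partitions(rows, num_partitions):
    """Cut rows into contiguous slices (an assumed partitioning scheme)."""
    n = len(rows)
    bounds = [(i * n) // num_partitions for i in range(num_partitions + 1)]
    return [rows[bounds[i]:bounds[i + 1]] for i in range(num_partitions)]

def map_partitions(partitions, fn):
    """Apply fn to each row, partition by partition, without moving rows."""
    return [[fn(row) for row in part] for part in partitions]

def filter_partitions(partitions, pred):
    """Drop rows failing pred, partition by partition, without moving rows."""
    return [[row for row in part if pred(row)] for part in partitions]

def collect(partitions):
    """Concatenate partitions in index order (assumed collect behavior)."""
    return [row for part in partitions for row in part]

my_list = list(range(10))
parts = split_into_partitions(my_list, 3)

# Neither operation moves a row across partitions (no shuffle).
doubled = map_partitions(parts, lambda x: x * 2)
kept = filter_partitions(doubled, lambda x: x % 3 != 0)

result = collect(kept)
# The same pipeline applied to the plain list, for comparison.
expected = [x * 2 for x in my_list if (x * 2) % 3 != 0]
```

In this model, result equals expected because rows never change partition and partitions are read back in order. Whether real RDDs give the same guarantee is exactly what I'd like confirmed.
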
Mingyu


