spark-user mailing list archives

From Mingyu Kim <m...@palantir.com>
Subject Sort order of RDD rows
Date Thu, 03 Oct 2013 22:33:55 GMT
Hi all,

Is the sort order guaranteed to be preserved if you apply operations like
map(), filter(), or distinct() after a sort in a distributed setting (running
on a cluster of machines backed by HDFS)? In other words, does
rdd.sortByKey().map() have the same sort order as rdd.sortByKey()? If so, is
it documented anywhere which operations preserve sort order and which don't?

Thanks,
Mingyu


