spark-user mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: Sort order of RDD rows
Date Fri, 04 Oct 2013 01:00:25 GMT
Yes, the order is preserved for these map-like operations. The only time it isn't is when you
change the RDD's partitioner, e.g. by doing sortByKey or groupByKey. It would definitely be good
to document this more formally.

Matei

On Oct 3, 2013, at 3:33 PM, Mingyu Kim <mkim@palantir.com> wrote:

> Hi all,
> 
> Is the sort order guaranteed if you apply operations like map(), filter() or distinct()
> after sorting in a distributed setting (run on a cluster of machines backed by HDFS)? In other
> words, does rdd.sortByKey().map() have the same sort order as rdd.sortByKey()? If so, is it
> documented somewhere which operations preserve sort order and which don't?
> 
> Thanks,
> Mingyu
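To illustrate the distinction in the reply, here is a small pure-Python model. It is a sketch, not Spark's API: the helpers `collect`, `map_rows`, `filter_rows`, `hash_shuffle` and `distinct_rows` are hypothetical, an "RDD" is just a list of partitions, and the rows are plain integers standing in for the keys that sortByKey left sorted and range-partitioned. Map-like (narrow) steps run inside each partition, so reading partitions in index order still yields sorted rows. A step that shuffles routes rows to new partitions by hash, and the sorted layout is lost. One caveat about the question itself: in Spark's implementation, distinct() is built on reduceByKey and therefore shuffles, so it is not one of the order-preserving, map-like operations. The model mimics that with a hash shuffle.

```python
from typing import Callable, List

# Toy model (hypothetical, not Spark's API): an "RDD" is a list of
# partitions, and each partition is a list of rows.
Partitions = List[List[int]]


def collect(parts: Partitions) -> List[int]:
    """Concatenate partitions in index order, as collect() does."""
    return [row for part in parts for row in part]


def map_rows(parts: Partitions, f: Callable[[int], int]) -> Partitions:
    """Narrow, map-like step: f runs inside each partition, keeping row order."""
    return [[f(row) for row in part] for part in parts]


def filter_rows(parts: Partitions, pred: Callable[[int], bool]) -> Partitions:
    """Narrow step: drops rows but never reorders the survivors."""
    return [[row for row in part if pred(row)] for part in parts]


def hash_shuffle(parts: Partitions, num_partitions: int) -> Partitions:
    """Wide step: each row moves to partition hash(row) % num_partitions."""
    out: Partitions = [[] for _ in range(num_partitions)]
    for part in parts:
        for row in part:
            out[hash(row) % num_partitions].append(row)
    return out


def distinct_rows(parts: Partitions, num_partitions: int) -> Partitions:
    """Shuffle, then de-duplicate each output partition (first occurrence wins)."""
    return [list(dict.fromkeys(part)) for part in hash_shuffle(parts, num_partitions)]


def is_sorted(rows: List[int]) -> bool:
    return all(a <= b for a, b in zip(rows, rows[1:]))


# Stand-in for the output of sortByKey(): range-partitioned and sorted.
sorted_parts: Partitions = [[1, 2, 3], [4, 5, 6]]

mapped = collect(map_rows(sorted_parts, lambda x: x * 10))
filtered = collect(filter_rows(sorted_parts, lambda x: x % 2 == 1))
deduped = collect(distinct_rows(sorted_parts, num_partitions=2))

print(mapped)    # [10, 20, 30, 40, 50, 60]
print(filtered)  # [1, 3, 5]
print(deduped)   # [2, 4, 6, 1, 3, 5]
```

The map and filter results keep the sorted layout because no row ever leaves its partition. The shuffled result holds the same values but in hash order (small non-negative Python ints hash to themselves, which makes the output deterministic here). What the narrow steps preserve is row position. A map that rewrites the key itself, e.g. negating it, keeps the rows in place but can leave the new keys out of order.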

