spark-user mailing list archives

From Darin McBeath
Subject repartitionAndSortWithinPartitions and mapPartitions and sort order
Date Fri, 13 Mar 2015 01:11:34 GMT
I am using repartitionAndSortWithinPartitions with a custom partitioner to partition my
content and then sort within each partition.  I need the custom partitioner because my keys
look something like 'groupid|timestamp': I want to partition on the group id only, but sort
the records within each partition by the entire key (group id and timestamp).
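
To make the setup concrete, here is a minimal plain-Python model of what is described
above: records are bucketed by the group id part of the key, then each bucket is sorted by
the full key.  It does not call Spark.  The names group_partition and repartition_and_sort,
the toy hash, and the sample records are all made up for illustration, and the timestamps
are assumed to be fixed-width strings so that string order matches time order.

```python
from collections import defaultdict


def group_partition(key, num_partitions):
    """Pick a partition from the group id only (the part before '|')."""
    group_id = key.split("|", 1)[0]
    # Deterministic toy hash: a hypothetical stand-in for a real partitioner's hash.
    return sum(group_id.encode("utf-8")) % num_partitions


def repartition_and_sort(records, num_partitions):
    """Bucket (key, value) pairs by group id, then sort each bucket by the full key."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[group_partition(key, num_partitions)].append((key, value))
    return {p: sorted(rows, key=lambda kv: kv[0]) for p, rows in partitions.items()}


# Hypothetical sample data: keys are 'groupid|timestamp' with fixed-width timestamps.
records = [
    ("g2|0005", "e"),
    ("g1|0003", "c"),
    ("g1|0001", "a"),
    ("g2|0002", "b"),
    ("g1|0002", "d"),
]
result = repartition_and_sort(records, num_partitions=4)
```

In this model every key with the same group id lands in the same bucket, and within a bucket
the records are ordered by group id and then timestamp.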

My question: when I then use mapPartitions to process the records in each partition, is the
sorted order guaranteed as I iterate through the records?  While processing the current
record I need to look at the previous record and the next record in the partition, so the
records must be seen in sorted order.
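
The kind of per-partition processing meant here can be sketched as a plain-Python generator
that walks an iterator once and yields each record together with its neighbors.  The name
with_neighbors is hypothetical.  The sketch only assumes it is handed an iterator over one
partition's records (the shape of function that mapPartitions takes); it does not itself
establish anything about the order in which Spark delivers them.

```python
def with_neighbors(records):
    """Yield (previous, current, next) for each record, in iteration order.

    previous is None for the first record and next is None for the last one.
    """
    it = iter(records)
    previous = None
    try:
        current = next(it)
    except StopIteration:
        return  # empty partition: nothing to yield
    for following in it:
        yield previous, current, following
        previous, current = current, following
    # The last record has no successor.
    yield previous, current, None


# Hypothetical usage on an already sorted list of keys.
triples = list(with_neighbors(["g1|0001", "g1|0002", "g1|0003"]))
```

Because it holds only three records at a time, a generator of this shape avoids having to
materialize the whole partition in memory.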

I tend to think so, but wanted to confirm.


