spark-user mailing list archives

From Kristoffer Sjögren <sto...@gmail.com>
Subject Sorted partition ranges without overlap
Date Mon, 13 Mar 2017 13:34:10 GMT
Hi

I have an RDD<byte[]> that needs to be sorted lexicographically and
then processed partition by partition. The partitions should be split
into ranged blocks so that sorted order is maintained and each
partition contains sequential, non-overlapping keys.

Given keys (1,2,3,4,5,6)

1. Correct
  - 2 partitions = (1,2,3),(4,5,6)
  - 3 partitions = (1,2),(3,4),(5,6)

2. Incorrect: each partition is sorted internally, but the ranges overlap.
  - 2 partitions = (1,3,5),(2,4,6)
  - 3 partitions = (1,3),(2,5),(4,6)
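
To make the requirement concrete, here is a minimal sketch in plain Python (not Spark code) of the property I am after. The helper names `split_into_ranges` and `is_range_partitioned` are made up for this illustration, and the even split is only an assumption; any split is fine as long as the ranges do not overlap.

```python
from typing import List, Sequence


def split_into_ranges(keys: Sequence[bytes], num_partitions: int) -> List[List[bytes]]:
    """Sort keys lexicographically and cut them into contiguous blocks."""
    # Python compares bytes lexicographically by unsigned byte value.
    ordered = sorted(keys)
    size, extra = divmod(len(ordered), num_partitions)
    partitions: List[List[bytes]] = []
    start = 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional key each.
        end = start + size + (1 if i < extra else 0)
        partitions.append(ordered[start:end])
        start = end
    return partitions


def is_range_partitioned(partitions: Sequence[Sequence[bytes]]) -> bool:
    """True if reading the partitions in order yields a fully sorted sequence."""
    flat = [key for partition in partitions for key in partition]
    return flat == sorted(flat)
```

With the six keys above encoded as single bytes, `split_into_ranges` gives the layouts from case 1, while `is_range_partitioned` rejects the layouts from case 2.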


Is this possible with Spark?

Cheers,
-Kristoffer
