Hi Fei,

How you triedĀ coalesce(numPartitions: Int, shuffle: Boolean = false) ?

https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L395

coalesce is mostly used for reducing the number of partitions before writing to HDFS, but it might still be a narrow dependency (satisfying your requirements) if you increase the # of partitions.

Best,
Anastasios

On Sun, Jan 15, 2017 at 12:58 AM, Fei Hu <hufei68@gmail.com> wrote:
Dear all,

I want to equally divideĀ a RDD partition into two partitions. That means, the first half of elements in the partition will create a new partition, and the second half of elements in the partition will generate another new partition. But the two new partitions are required to be at the same node with their parent partition, which can help get high data locality.

Is there anyone who knows how to implement it or any hints for it?

Thanks in advance,
Fei




--
-- Anastasios Zouzias