spark-user mailing list archives

From Anastasios Zouzias <>
Subject Re: Equally split an RDD partition into two partitions on the same node
Date Sun, 15 Jan 2017 08:58:36 GMT
Hi Fei,

Have you tried coalesce(numPartitions: Int, shuffle: Boolean = false)?

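coalesce is mostly used to reduce the number of partitions before writing
to HDFS. With shuffle = false it creates a narrow dependency, which would
satisfy your locality requirement, but in that mode it cannot increase the
number of partitions: if more are requested, the RDD keeps its current
count. Increasing the count requires shuffle = true, which introduces a
shuffle.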
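A minimal sketch of how the `shuffle` flag affects the partition count, assuming a local Spark 2.x or later setup (`CoalesceDemo` and `partitionCounts` are illustrative names). The `RDD.coalesce` scaladoc states that requesting more partitions than the RDD has leaves it at its current number unless `shuffle = true` is passed.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CoalesceDemo {
  // Returns (original, coalesce without shuffle, coalesce with shuffle) partition counts.
  def partitionCounts(sc: SparkContext): (Int, Int, Int) = {
    val rdd = sc.parallelize(1 to 100, numSlices = 4)
    // shuffle = false (the default): narrow dependency, but the count cannot grow,
    // so asking for 8 partitions leaves the RDD at 4.
    val noShuffle = rdd.coalesce(8)
    // shuffle = true: the count can grow to 8, but the data goes through a shuffle.
    val withShuffle = rdd.coalesce(8, shuffle = true)
    (rdd.getNumPartitions, noShuffle.getNumPartitions, withShuffle.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    // Local context with two worker threads, only for illustration.
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("coalesce-demo"))
    try {
      println(partitionCounts(sc)) // (4,4,8)
    } finally {
      sc.stop()
    }
  }
}
```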


On Sun, Jan 15, 2017 at 12:58 AM, Fei Hu <> wrote:

> Dear all,
> I want to split an RDD partition equally into two partitions: the first
> half of the elements in the partition would form one new partition, and
> the second half would form another. Both new partitions must stay on the
> same node as their parent partition, to preserve data locality.
> Does anyone know how to implement this, or have any hints?
> Thanks in advance,
> Fei
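The original question can also be tackled directly with a custom RDD. The following is a minimal sketch, not a Spark built-in: `HalfSplitRDD`, `HalfPartition` and `HalfSplitExample` are hypothetical names, and it assumes the developer-level RDD API (`getPartitions`, `getDependencies`, `compute`, `NarrowDependency`) as found in Spark 2.x. Each parent partition i becomes child partitions 2i and 2i + 1, linked by a narrow dependency, so no shuffle is involved and the scheduler can place each child where its parent lives.

```scala
import scala.reflect.ClassTag
import org.apache.spark.{Dependency, NarrowDependency, Partition, SparkConf, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition type: child `index` holds one half of the parent partition `parent`.
class HalfPartition(val index: Int, val parent: Partition, val firstHalf: Boolean) extends Partition

// Hypothetical RDD: parent partition i becomes children 2i (first half) and 2i + 1 (second half).
class HalfSplitRDD[T: ClassTag](prev: RDD[T]) extends RDD[T](prev.context, Nil) {

  override protected def getPartitions: Array[Partition] =
    prev.partitions.flatMap { p =>
      Seq[Partition](
        new HalfPartition(2 * p.index, p, firstHalf = true),
        new HalfPartition(2 * p.index + 1, p, firstHalf = false))
    }

  // Narrow dependency: child i reads only parent partition i / 2, so no shuffle is needed.
  override protected def getDependencies: Seq[Dependency[_]] = Seq(
    new NarrowDependency[T](prev) {
      override def getParents(partitionId: Int): Seq[Int] = Seq(partitionId / 2)
    })

  // getPreferredLocations is deliberately not overridden: with no preference of its own,
  // the scheduler falls back through the narrow dependency to the parent's locations.

  override def compute(split: Partition, context: TaskContext): Iterator[T] = {
    val half = split.asInstanceOf[HalfPartition]
    // Read the whole parent partition in this task (assumes it fits in memory and
    // that the parent yields its elements in a deterministic order).
    val elems = prev.iterator(half.parent, context).toVector
    val mid = (elems.size + 1) / 2 // the first half takes the extra element if the size is odd
    if (half.firstHalf) elems.iterator.take(mid) else elems.iterator.drop(mid)
  }
}

object HalfSplitExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("half-split"))
    try {
      // Persist the parent so the two sibling tasks do not each recompute it.
      val parent = sc.parallelize(1 to 10, numSlices = 2).persist()
      val halves = new HalfSplitRDD(parent)
      // Print each child partition on its own line.
      halves.glom().collect().foreach(a => println(a.mkString(",")))
    } finally {
      sc.stop()
    }
  }
}
```

Two design choices are worth flagging. Each child task reads the full parent partition and keeps one half, so both siblings compute the parent unless it is persisted; and `compute` buffers the partition in a `Vector`, which assumes it fits in executor memory. Locality also remains a scheduling preference rather than a guarantee: if the preferred node stays busy longer than `spark.locality.wait`, a task may run elsewhere.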

-- Anastasios Zouzias
