spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anastasios Zouzias <zouz...@gmail.com>
Subject Re: Equally split a RDD partition into two partition at the same node
Date Sun, 15 Jan 2017 08:58:36 GMT
Hi Fei,

How you tried coalesce(numPartitions: Int, shuffle: Boolean = false) ?

https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L395

coalesce is mostly used for reducing the number of partitions before
writing to HDFS, but it might still be a narrow dependency (satisfying your
requirements) if you increase the # of partitions.

Best,
Anastasios

On Sun, Jan 15, 2017 at 12:58 AM, Fei Hu <hufei68@gmail.com> wrote:

> Dear all,
>
> I want to equally divide a RDD partition into two partitions. That means,
> the first half of elements in the partition will create a new partition,
> and the second half of elements in the partition will generate another new
> partition. But the two new partitions are required to be at the same node
> with their parent partition, which can help get high data locality.
>
> Is there anyone who knows how to implement it or any hints for it?
>
> Thanks in advance,
> Fei
>
>


-- 
-- Anastasios Zouzias
<azo@zurich.ibm.com>

Mime
View raw message