spark-dev mailing list archives

From Sandy Ryza
Subject Re: Increase partition count (repartition) without shuffle
Date Thu, 18 Jun 2015 21:36:10 GMT
Hi Alexander,

There is currently no way to create an RDD with more partitions than its
parent RDD without causing a shuffle.

However, if the files are splittable, you can set the Hadoop configurations
that control split size to something smaller so that the HadoopRDD ends up
with more partitions.
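
To illustrate the split-size suggestion, here is a rough sketch in plain Python (not Spark code) of how a smaller maximum split size turns into more partitions. It assumes the files are read through a FileInputFormat from the newer `mapreduce` API, where the split size is max(minSize, min(maxSize, blockSize)) and the maximum is controlled by `mapreduce.input.fileinputformat.split.maxsize`. The older `mapred` API computes its split size differently, so treat this as an approximation. The file size, block size, and helper names are hypothetical.

```python
# Model of FileInputFormat split planning (new "mapreduce" API), for illustration only.

MB = 1024 * 1024
SPLIT_SLOP = 1.1  # a remainder up to 10% larger than the split size stays in one split


def compute_split_size(block_size, min_size, max_size):
    """Split size as max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, split_size):
    """Number of splits produced for one splittable, non-empty file."""
    splits = 0
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # the final, possibly shorter, split
    return splits


# Hypothetical input: 300 files of 100 MiB each, stored with 128 MiB blocks.
num_files = 300
file_size = 100 * MB
block_size = 128 * MB

# No maximum configured (effectively unlimited): one split per file.
default_size = compute_split_size(block_size, min_size=1, max_size=2**63 - 1)
print(num_files * count_splits(file_size, default_size))  # 300

# Maximum split size lowered to 16 MiB: seven splits per file.
small_size = compute_split_size(block_size, min_size=1, max_size=16 * MB)
print(num_files * count_splits(file_size, small_size))  # 2100
```

In Spark, the property would be set on the SparkContext's Hadoop configuration before the RDD is created. Whether it takes effect depends on the input format actually used and on the files being splittable.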


On Thu, Jun 18, 2015 at 2:26 PM, Ulanov, Alexander wrote:

> Hi,
> Is there a way to increase the number of partitions of an RDD without
> causing a shuffle? I’ve found a JIRA issue for this, however there is no
> implementation yet.
> Just in case, I am reading data from ~300 big binary files, which results
> in 300 partitions. I then need to sort my RDD, but the sort crashes with an
> out-of-memory exception. If I change the number of partitions to 2000, the
> sort works OK, but the repartition itself takes a lot of time due to the
> shuffle.
> Best regards, Alexander
