spark-dev mailing list archives

From "Ulanov, Alexander" <>
Subject Increase partition count (repartition) without shuffle
Date Thu, 18 Jun 2015 21:26:00 GMT

Is there a way to increase the number of partitions of an RDD without causing a shuffle? I've found a JIRA issue about this, but there is no implementation.

Just in case it helps: I am reading data from ~300 big binary files, which results in 300 partitions. I then need to sort my RDD, but it crashes with an OutOfMemory exception. If I change the number of partitions to 2000, the sort works fine, but the repartition itself takes a lot of time due to the shuffle.
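For what it's worth, the idea behind a shuffle-free increase is that each existing partition can be split into smaller slices locally, without moving any record between the original partitions (`repartition(2000)` instead redistributes every record, which is the expensive shuffle; `coalesce` avoids the shuffle but only decreases the count). Below is a minimal, Spark-free Python sketch of that splitting step; the function name `split_partitions` and the list-of-lists representation of partitions are illustrative assumptions, not Spark API:

```python
import math

def split_partitions(partitions, target):
    """Split each partition into consecutive slices locally, increasing
    the partition count to roughly `target` without moving any record
    between the original partitions (the shuffle-free idea).
    Illustrative sketch only; not a Spark API."""
    # slices to carve out of each existing partition
    k = math.ceil(target / len(partitions))
    result = []
    for part in partitions:
        if not part:
            result.append(part)
            continue
        size = math.ceil(len(part) / k)
        # consecutive slices preserve record order within the partition
        result.extend(part[i:i + size] for i in range(0, len(part), size))
    return result

# 300 "partitions" of 100 records each -> roughly 2000 smaller slices
parts = [list(range(p * 100, (p + 1) * 100)) for p in range(300)]
out = split_partitions(parts, 2000)
```

Since no record crosses a partition boundary, this costs no network I/O, at the price of slightly uneven slice counts (the result may be a bit above the requested target).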

Best regards, Alexander
