spark-user mailing list archives

From Night Wolf <nightwolf...@gmail.com>
Subject Cached RDD is not evenly split between executors
Date Thu, 07 May 2015 12:16:18 GMT
Hi guys,

I'm trying to cache an RDD in memory in serialised form, but the cached
blocks don't seem to be spread evenly across the executor nodes.

How can I force Spark to rebalance the RDD? The RDD has been hash
partitioned, and I've tried giving it a large number of partitions
(thousands). The partitioning key is a Long, with about 6 million distinct
values. It's a large dataset.
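Before blaming block placement, it may be worth ruling out skew in the partitioner itself. The sketch below is a plain-Python simulation, not Spark code. It assumes that Spark's HashPartitioner sends a key to `nonNegativeMod(key.hashCode, numPartitions)` and that a Long key is hashed with `java.lang.Long.hashCode`. The helper names (`java_long_hash`, `partition_for`, `partition_sizes`, `imbalance`) are hypothetical, and the key counts are scaled down from the ~6 million keys mentioned above so it runs quickly. It only counts distinct keys per partition. It says nothing about how many records each key carries or which executor ends up holding each cached block.

```python
from collections import Counter
from typing import Iterable, List


def java_long_hash(value: int) -> int:
    """Reproduce java.lang.Long.hashCode: (int) (value ^ (value >>> 32))."""
    v = value & 0xFFFFFFFFFFFFFFFF                  # view as unsigned 64-bit so >> acts like >>>
    h = (v ^ (v >> 32)) & 0xFFFFFFFF                # keep the low 32 bits (the int cast)
    return h - (1 << 32) if h >= (1 << 31) else h   # reinterpret as a signed 32-bit int


def partition_for(key: int, num_partitions: int) -> int:
    """Partition index under an (assumed) HashPartitioner-style rule for a Long key.

    Spark's nonNegativeMod adds num_partitions to a negative Java remainder;
    for a positive modulus that gives the same result as Python's % operator.
    """
    return java_long_hash(key) % num_partitions


def partition_sizes(keys: Iterable[int], num_partitions: int) -> List[int]:
    """Count how many distinct keys land in each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def imbalance(sizes: List[int]) -> float:
    """Largest partition divided by the mean partition size (1.0 = perfectly even)."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean


if __name__ == "__main__":
    n = 1000
    even = partition_sizes(range(100_000), n)                  # consecutive ids
    skewed = partition_sizes(range(0, 100_000_000, 1000), n)   # ids that are all multiples of 1000
    print(imbalance(even))    # 1.0
    print(imbalance(skewed))  # 1000.0
```

With consecutive ids, every partition gets the same number of keys. Ids that are all multiples of the partition count all land in partition 0. If the real keys behave like the first case, the uneven cached blocks probably come from how blocks are placed on executors, not from the partitioning.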

Cheers!

[image: Inline image 1]
