spark-user mailing list archives

From "deenar.toraskar" <>
Subject Equally weighted partitions in Spark
Date Thu, 01 May 2014 15:30:37 GMT

I am using Spark to distribute computationally intensive tasks across the
cluster. Currently I partition my RDD of tasks randomly. There is a large
variation in how long each job takes to complete, so most partitions
finish quickly while a couple take forever. I can mitigate this problem
to some extent by increasing the number of partitions.

Ideally I would like to partition tasks by complexity (let's assume I can
get such a value from the task object) so that the sum of the complexities
of the elements in each partition is roughly equal. Has anyone created such
a partitioner before?
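One common way to approximate this is the greedy longest-processing-time (LPT) heuristic: sort tasks by complexity in descending order, then assign each task to the currently lightest partition. Below is a minimal, Spark-agnostic sketch in plain Python; the function name `greedy_partition` and the example complexity values are illustrative, and in practice the resulting index-to-partition mapping could drive a custom `Partitioner` (or a key-by-partition-id step) on the RDD.

```python
import heapq

def greedy_partition(complexities, num_partitions):
    """Assign each task index to a partition so that total complexity per
    partition is roughly equal (longest-processing-time heuristic)."""
    # Min-heap of (current_load, partition_id): the root is always the
    # lightest partition so far.
    heap = [(0.0, p) for p in range(num_partitions)]
    heapq.heapify(heap)
    assignment = [None] * len(complexities)
    # Place the heaviest tasks first; small tasks then fill the gaps.
    order = sorted(range(len(complexities)),
                   key=lambda i: complexities[i], reverse=True)
    for i in order:
        load, p = heapq.heappop(heap)
        assignment[i] = p
        heapq.heappush(heap, (load + complexities[i], p))
    return assignment

# Example: skewed task complexities spread across 3 partitions.
tasks = [10, 9, 8, 1, 1, 1, 1, 1, 1, 7]
parts = greedy_partition(tasks, 3)
loads = [sum(c for c, p in zip(tasks, parts) if p == q) for q in range(3)]
# loads -> [13, 12, 15]: far more even than a random split can guarantee.
```

LPT gives a worst-case bound of at most 4/3 of the optimal maximum load, which is usually more than good enough for smoothing out straggler partitions.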

