spark-user mailing list archives

From Andrew Ash <>
Subject Re: Equally weighted partitions in Spark
Date Thu, 01 May 2014 21:21:06 GMT
So the problem is that equally-sized partitions take variable time to
complete, depending on their contents?

Sent from my mobile phone
On May 1, 2014 8:31 AM, "deenar.toraskar" <> wrote:

> Hi
> I am using Spark to distribute computationally intensive tasks across the
> cluster. Currently I partition my RDD of tasks randomly. There is a large
> variation in how long each job takes to complete, so most partitions finish
> quickly while a couple of partitions take forever. I can mitigate this
> problem to some extent by increasing the number of partitions.
> Ideally I would like to partition tasks by complexity (let's assume I can
> get such a value from the task object) such that the sum of the complexities
> of the elements in each partition is roughly equal. Has anyone created such
> a partitioner before?
> Regards
> Deenar
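For anyone looking for a starting point, here is a minimal sketch of such a
partitioner, not a definitive implementation. It assumes each task can be
keyed by a Long id and that its complexity score can be collected up front
(both assumptions, not stated in the original question). It precomputes a
greedy "longest processing time first" assignment, placing each task, from
most to least complex, into the currently lightest partition, and wraps the
result in a custom Partitioner:

    import org.apache.spark.Partitioner

    // Looks up a precomputed task-id -> partition assignment.
    class ComplexityPartitioner(assignment: Map[Long, Int], numParts: Int)
        extends Partitioner {
      override def numPartitions: Int = numParts
      override def getPartition(key: Any): Int =
        assignment(key.asInstanceOf[Long])
    }

    object ComplexityPartitioner {
      // Greedy LPT: sort tasks by complexity descending, then assign each
      // one to the partition with the smallest total complexity so far.
      def build(complexities: Seq[(Long, Double)],
                numParts: Int): ComplexityPartitioner = {
        val loads = Array.fill(numParts)(0.0)
        val assignment = scala.collection.mutable.Map.empty[Long, Int]
        for ((id, c) <- complexities.sortBy(-_._2)) {
          val p = loads.indexOf(loads.min) // lightest partition so far
          assignment(id) = p
          loads(p) += c
        }
        new ComplexityPartitioner(assignment.toMap, numParts)
      }
    }

Usage would look something like the following, where the task id field and
the idToComplexity sequence are hypothetical:

    // val part = ComplexityPartitioner.build(idToComplexity, numParts = 32)
    // val balanced = taskRdd.keyBy(_.id).partitionBy(part)

The trade-off is a driver-side pass to gather complexities before
repartitioning, but greedy LPT keeps the per-partition load within a small
constant factor of optimal, which is usually enough to avoid straggler
partitions.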
