spark-user mailing list archives

From dachuan
Subject Re: Is there any way to set the output location for each partition for the RDD?
Date Fri, 01 Nov 2013 02:53:37 GMT
I guess it could be solved by extending an existing RDD and overriding its
getPreferredLocations() method.

But I am not sure, so I will wait for the answer.
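
For what it's worth, here is a rough, untested sketch of that idea. It assumes
a more recent Spark API (org.apache.spark package, ClassTag) than the version
in use may provide. The class name PinnedRDD and the hosts parameter are made
up for illustration. Note that preferred locations are only hints to the
scheduler, so this may not strictly pin partitions.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper RDD: same data and partitions as `prev`, but each
// partition reports a preferred host chosen from its partition index.
class PinnedRDD[T: ClassTag](prev: RDD[T], hosts: Seq[String])
    extends RDD[T](prev) {

  require(hosts.nonEmpty, "at least one host is required")

  // Reuse the parent's partitions unchanged.
  override protected def getPartitions: Array[Partition] = prev.partitions

  // Delegate the actual computation to the parent RDD.
  override def compute(split: Partition, context: TaskContext): Iterator[T] =
    prev.iterator(split, context)

  // Assign contiguous blocks of partitions to hosts, e.g. with 32 partitions
  // and 4 hosts, partitions 0-7 prefer hosts(0), 8-15 prefer hosts(1), etc.
  override protected def getPreferredLocations(split: Partition): Seq[String] = {
    val perHost =
      math.max(1, math.ceil(prev.partitions.length.toDouble / hosts.size).toInt)
    val hostIndex = math.min(split.index / perHost, hosts.size - 1)
    Seq(hosts(hostIndex))
  }
}
```

It would be used as something like new PinnedRDD(rdd, Seq("host1", "host2",
"host3", "host4")), where the host names are placeholders and would have to
match the names the workers are registered under.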

On Thu, Oct 31, 2013 at 10:44 PM, Wenlei Xie wrote:

> Hi,
> My iterative Spark program shows widely varying running times across
> iterations, although the computation load should be roughly the same
> each time. In each iteration, the program adds a batch of tuples and
> deletes roughly the same number of tuples.
> I suspect part of the reason is that the partitions are not allocated
> evenly across the machines. Is there an easy way to fix the output
> location of each partition? (Say, each time I create a new RDD with 32
> partitions while running on 4 machines, I would like to pin the first 8
> partitions to the first machine, the second 8 partitions to the second
> machine, etc.) I just want to verify whether my assumption is correct. :)
> Thank you!
> Best Regards,
> Wenlei

Dachuan Huang
Cellphone: 614-390-7234
2015 Neil Avenue
Ohio State University
Columbus, Ohio
