spark-dev mailing list archives

From Mridul Muralidharan <>
Subject Re: Eliminate copy while sending data : any Akka experts here ?
Date Wed, 02 Jul 2014 09:12:31 GMT
Hi Patrick,

  Please see inline.


On Wed, Jul 2, 2014 at 10:52 AM, Patrick Wendell <> wrote:
>> b) Instead of pulling this information, push it to executors as part
>> of task submission. (What Patrick mentioned ?)
>> (1) a.1 from above is still an issue for this.
> I don't understand what problem a.1 is. In this case, we don't need to do
> caching, right?

To rephrase in this context: attempting to cache won't help, since the
data is reducer-specific and the benefits are minimal (other than for
re-execution after failures and for speculative tasks).

>> (2) Serialized task size is also a concern : we have already seen
>> users hitting akka limits for task size - this will be an additional
>> vector which might exacerbate it.
> This would add only a small, constant amount of data to the task. It's
> strictly better than before. Before, if the map output status array was
> size M x R, we sent a single akka message of size M x R to every
> node... this basically scales quadratically with the size of the RDD. The
> new approach is constant... it's much better. And the total amount of
> data sent over the wire is likely much less.

It would not be constant: it would be a function of the number of mappers,
and it would be an overhead incurred for each task.
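
To make the two positions concrete, here is a rough back-of-the-envelope
sketch in Python. It is only an illustration under stated assumptions, not
how Spark computes anything: the function names are hypothetical, each
(mapper, reducer) status entry is assumed to cost a fixed number of bytes
(bytes_per_entry, an assumed parameter), and serialization framing and
compression are ignored. It compares sending the full M x R array to every
node against attaching only the M entries a given reducer needs to its task.

```python
def pull_model_total_bytes(num_mappers, num_reducers, num_nodes, bytes_per_entry=1):
    """Existing approach as described in the thread: every node receives
    one message holding the full M x R map output status array."""
    return num_mappers * num_reducers * bytes_per_entry * num_nodes


def push_model_bytes(num_mappers, num_reducers, bytes_per_entry=1):
    """Proposed approach, under the assumption that each reduce task
    carries only its own M entries (one per mapper).

    Returns (bytes added to each task, total bytes across all R tasks).
    """
    per_task = num_mappers * bytes_per_entry
    total = per_task * num_reducers
    return per_task, total


if __name__ == "__main__":
    # Illustrative sizes only; these numbers are not from the thread.
    m, r, nodes = 1000, 500, 20
    print("pull, total bytes:", pull_model_total_bytes(m, r, nodes))
    per_task, total = push_model_bytes(m, r)
    print("push, bytes per task:", per_task)
    print("push, total bytes:", total)
```

Under these assumptions the total sent over the wire drops by a factor equal
to the number of nodes, while the per-task addition still grows linearly
with the number of mappers, which is the part that matters for the akka
task size limit.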


> - Patrick
