spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: Eliminate copy while sending data : any Akka experts here ?
Date Thu, 03 Jul 2014 22:13:28 GMT
Note that in my original proposal, I was suggesting we could track whether
block size = 0 using a compressed bitmap. That way we can still avoid
requests for zero-sized blocks.
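
To illustrate the idea, here is a minimal sketch of how a per-map bitmap of empty blocks could let a reducer skip fetches that would return nothing. It is an assumption-laden illustration, not Spark's code: the function names (build_empty_bitmap, is_empty, maps_to_fetch) are hypothetical, and a plain Python integer stands in for a real compressed bitmap.

```python
from typing import Dict, List


def build_empty_bitmap(block_sizes: List[int]) -> int:
    """Return a bitmap with bit i set when the block for reducer i is empty.

    Hypothetical helper: an uncompressed Python int is used here in place
    of the compressed bitmap mentioned in the proposal.
    """
    bitmap = 0
    for reducer_id, size in enumerate(block_sizes):
        if size == 0:
            bitmap |= 1 << reducer_id
    return bitmap


def is_empty(bitmap: int, reducer_id: int) -> bool:
    """Check whether the block for reducer_id is marked as zero-sized."""
    return (bitmap >> reducer_id) & 1 == 1


def maps_to_fetch(empty_bitmaps: Dict[int, int], reducer_id: int) -> List[int]:
    """List the map ids whose block for reducer_id is non-empty."""
    return [
        map_id
        for map_id, bitmap in sorted(empty_bitmaps.items())
        if not is_empty(bitmap, reducer_id)
    ]


# Made-up block sizes: one row per map task, one column per reducer.
sizes_per_map = {
    0: [10, 0, 5],
    1: [0, 0, 7],
    2: [3, 4, 0],
}
empty_bitmaps = {m: build_empty_bitmap(s) for m, s in sizes_per_map.items()}

for reducer_id in range(3):
    print(reducer_id, maps_to_fetch(empty_bitmaps, reducer_id))
```

With one bit per (map, reducer) pair, the reducer only contacts maps that actually produced data for it, so the exact block sizes do not need to be shipped to avoid empty fetches.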



On Thu, Jul 3, 2014 at 3:12 PM, Reynold Xin <rxin@databricks.com> wrote:

> Yes, that number is likely == 0 in any real workload ...
>
>
> On Thu, Jul 3, 2014 at 8:01 AM, Mridul Muralidharan <mridul@gmail.com>
> wrote:
>
>> On Thu, Jul 3, 2014 at 11:32 AM, Reynold Xin <rxin@databricks.com> wrote:
>> > On Wed, Jul 2, 2014 at 3:44 AM, Mridul Muralidharan <mridul@gmail.com>
>> > wrote:
>> >
>> >>
>> >> >
>> >> > The other thing we do need is the location of blocks. This is actually
>> >> > just O(n) because we just need to know where the map was run.
>> >>
>> >> For well-partitioned data, won't this involve a lot of unwanted
>> >> requests to nodes that are not hosting data for a reducer (and a lack
>> >> of ability to throttle)?
>> >>
>> >
>> > Was that a question? (I'm guessing it is). What do you mean exactly?
>>
>>
>> I was not sure if I understood the proposal correctly - hence the
>> query: if I understood it right, the number of wasted requests goes
>> up by num_reducers * avg_nodes_not_hosting_data.
>>
>> Of course, if avg_nodes_not_hosting_data == 0, then we are fine!
>>
>> Regards,
>> Mridul
>>
>
>
