spark-dev mailing list archives

From Zoltán Zvara <zoltan.zv...@gmail.com>
Subject Re: Spark remote communication pattern
Date Thu, 09 Apr 2015 14:27:56 GMT
Thanks! I've found the fetcher. Are there any other places or cases where
blocks travel over the network?

Zvara Zoltán

2015-04-09 10:24 GMT+02:00 Reynold Xin <rxin@databricks.com>:

> Take a look at the following two files:
>
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/hash/BlockStoreShuffleFetcher.scala
>
> and
>
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
>
> On Thu, Apr 9, 2015 at 1:15 AM, Zoltán Zvara <zoltan.zvara@gmail.com>
> wrote:
>
>> Dear Developers,
>>
>> I'm trying to investigate the communication pattern of the data flow
>> during execution of a Spark program defined by an RDD chain. I'm
>> investigating from the Task point of view, and found that the task type
>> ResultTask, when retrieving the iterator of its RDD for a given partition,
>> effectively asks the BlockManager to get the block from a local or remote
>> location. What I do there is include the actual location data in
>> BlockResult, so the task can tell where it retrieved the data from. As far
>> as I can tell, ResultTask can trigger a data flow only in this case.
>>
>> What about ShuffleMapTask? What happens there? I'm trying to
>> log the locations that are involved in the shuffle process. I would be
>> happy to receive a few hints about where remote communication is managed
>> in the case of ShuffleMapTask.
>>
>> Thanks!
>>
>> Zoltán
>>
>
>
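
The location-tagging idea from the original question (carrying the source location inside the block result) can be illustrated with a toy model. This is only a sketch in Python, not Spark code: `ToyBlockManager`, its `peers` map, and the `location` field are hypothetical names invented for illustration, and Spark's real BlockManager and BlockResult have different signatures.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BlockResult:
    """Toy stand-in for a block read result, extended with its source."""
    data: bytes
    location: str  # hypothetical field: "local" or the id of a remote executor


class ToyBlockManager:
    """Hypothetical, simplified block manager: a local store plus known peers."""

    def __init__(self, executor_id: str,
                 peers: Optional[Dict[str, "ToyBlockManager"]] = None):
        self.executor_id = executor_id
        self.store: Dict[str, bytes] = {}
        self.peers = peers if peers is not None else {}

    def put(self, block_id: str, data: bytes) -> None:
        self.store[block_id] = data

    def get(self, block_id: str) -> Optional[BlockResult]:
        # Try the local store first and tag the result as local.
        if block_id in self.store:
            return BlockResult(self.store[block_id], "local")
        # Otherwise ask each peer; tag the result with the peer that served it.
        for peer_id, peer in self.peers.items():
            if block_id in peer.store:
                return BlockResult(peer.store[block_id], peer_id)
        return None


remote = ToyBlockManager("executor-2")
remote.put("rdd_0_1", b"remote partition")
local = ToyBlockManager("executor-1", peers={"executor-2": remote})
local.put("rdd_0_0", b"local partition")

for block_id in ("rdd_0_0", "rdd_0_1"):
    result = local.get(block_id)
    print(block_id, "read from", result.location)
```

A task that reads through such a manager can log `result.location` for every block it consumes. The same tagging would have to be applied separately on the shuffle fetch path, since that path is handled by the fetcher classes linked above.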
