spark-issues mailing list archives

From "Reynold Xin (JIRA)" <>
Subject [jira] [Commented] (SPARK-2288) Hide ShuffleBlockManager behind ShuffleManager
Date Sat, 30 Aug 2014 05:59:54 GMT


Reynold Xin commented on SPARK-2288:

Thanks for the design doc, Raymond. Next time it would be better to also comment on the new
block type you are adding. Cheers.

> Hide ShuffleBlockManager behind ShuffleManager
> ----------------------------------------------
>                 Key: SPARK-2288
>                 URL:
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Block Manager, Shuffle
>            Reporter: Raymond Liu
>            Assignee: Raymond Liu
>         Attachments: shuffleblockmanager.pdf
> This is a sub-task of SPARK-2275.
> At present, in the shuffle write path, the shuffle block manager manages the mapping from
some blockIds to a FileSegment to support consolidated shuffle, which bypasses the
block store's blockId-based access mode. Then, in the read path, when reading shuffle block
data, the disk store queries shuffleBlockManager to override the normal blockId-to-file mapping
so that it reads the correct data from the file. This leads to many bi-directional dependencies
between modules, and the code logic is somewhat tangled. Neither the shuffle block manager
nor blockManager/DiskStore fully controls the read path; they are tightly coupled in low-level
code. This also makes it hard to implement other shuffle managers, e.g. a sort-based
shuffle that might merge all output from one map partition into a single file. That would need
to hack further into diskStore/diskBlockManager etc. to find the right data to read.
> Possible approach:
> I think it would be better to expose a FileSegment-based read interface for
DiskStore in addition to the current blockId-based interface.
> Then the logic that maps a blockId to a FileSegment can reside entirely in the specific shuffle
manager. If a shuffle manager needs to merge data into one single object, it takes care of the
mapping logic in both the read and write paths and takes responsibility for reading and writing
shuffle data.
> The BlockStore itself should just handle reads and writes as requested; it should not
be involved in the data mapping logic at all. This might make the interfaces between modules
clearer and decouple the modules more cleanly.
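
The proposed split can be illustrated with a minimal sketch. It is written in Python for brevity and is not Spark code: the class and method names (DiskStore.read_segment, ConsolidatingShuffleManager, write_block, read_block) are hypothetical, and FileSegment here is a simplified stand-in. The point is only that the store reads byte ranges, while the shuffle manager alone owns the blockId-to-FileSegment mapping.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSegment:
    """A byte range inside a file (simplified, hypothetical stand-in)."""
    path: str
    offset: int
    length: int


class DiskStore:
    """Reads bytes as requested; knows nothing about shuffle block mapping."""

    def read_segment(self, segment: FileSegment) -> bytes:
        with open(segment.path, "rb") as f:
            f.seek(segment.offset)
            return f.read(segment.length)


class ConsolidatingShuffleManager:
    """Hypothetical shuffle manager that owns the blockId -> FileSegment map."""

    def __init__(self, store: DiskStore, path: str):
        self._store = store
        self._path = path
        self._segments = {}  # block id -> FileSegment

    def write_block(self, block_id: str, data: bytes) -> None:
        # Append to one consolidated file and record where the block landed.
        with open(self._path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(data)
        self._segments[block_id] = FileSegment(self._path, offset, len(data))

    def read_block(self, block_id: str) -> bytes:
        # The manager resolves the mapping; the store only reads the range.
        return self._store.read_segment(self._segments[block_id])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        manager = ConsolidatingShuffleManager(
            DiskStore(), os.path.join(tmp, "merged.data")
        )
        manager.write_block("shuffle_0_0_0", b"alpha")
        manager.write_block("shuffle_0_0_1", b"beta")
        print(manager.read_block("shuffle_0_0_1"))
```

In this sketch a different shuffle manager (for example, a sort-based one) could use its own mapping scheme without any change to DiskStore, which is the decoupling the description argues for.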
