spark-dev mailing list archives

From Tathagata Das
Subject Re: Spark Streaming Data flow graph
Date Mon, 05 Jan 2015 21:57:14 GMT
Hey François,

Well, at a high level, here is what I thought about the diagram.

- Each ReceiverSupervisor handles only one Receiver.
- BlockGenerator is part of ReceiverSupervisor, not ReceivedBlockHandler.
- The blocks are inserted into BlockManager and, if the write-ahead log
is activated, into WriteAheadLogManager in parallel, not through
BlockManager as the diagram seems to imply.
- It would be good to have a clean visual separation of what runs in
the Executor (a better term than Worker) and what runs in the Driver:
Driver stuff on the left and Executor stuff on the right, or vice versa.
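
To make the first three points concrete, here is a rough toy model in
Python. It is not Spark code: every class name below is made up for the
sketch, the "receiver" is just a callable that returns records, and
cutting blocks by size is a simplification chosen for brevity. It only
illustrates the ownership and the parallel write described above.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyBlockStore:
    """Hypothetical stand-in for BlockManager: keeps blocks in memory."""

    def __init__(self):
        self.blocks = {}

    def put(self, block_id, data):
        self.blocks[block_id] = list(data)


class ToyWriteAheadLog:
    """Hypothetical stand-in for WriteAheadLogManager: an append-only list."""

    def __init__(self):
        self.records = []

    def append(self, block_id, data):
        self.records.append((block_id, list(data)))


class ToyBlockHandler:
    """Writes each block to the store and, if a log is given, to the log
    in parallel. The log write does not go through the store."""

    def __init__(self, store, wal=None):
        self.store = store
        self.wal = wal
        self.pool = ThreadPoolExecutor(max_workers=2)

    def store_block(self, block_id, data):
        futures = [self.pool.submit(self.store.put, block_id, data)]
        if self.wal is not None:
            futures.append(self.pool.submit(self.wal.append, block_id, data))
        for future in futures:
            future.result()  # wait for both writes before returning

    def close(self):
        self.pool.shutdown()


class ToyBlockGenerator:
    """Buffers records and hands a full block to a callback."""

    def __init__(self, block_size, on_block):
        self.block_size = block_size
        self.on_block = on_block
        self.buffer = []

    def add(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.block_size:
            self.flush()

    def flush(self):
        if self.buffer:
            block, self.buffer = self.buffer, []
            self.on_block(block)


class ToySupervisor:
    """Supervises exactly one receiver and owns its block generator."""

    def __init__(self, receiver, handler, block_size=2):
        self.receiver = receiver
        self.handler = handler
        self.next_id = 0
        self.generator = ToyBlockGenerator(block_size, self._push_block)

    def _push_block(self, block):
        block_id = f"block-{self.next_id}"
        self.next_id += 1
        self.handler.store_block(block_id, block)

    def run(self):
        for record in self.receiver():
            self.generator.add(record)
        self.generator.flush()  # push whatever is left in the buffer


if __name__ == "__main__":
    store, wal = ToyBlockStore(), ToyWriteAheadLog()
    handler = ToyBlockHandler(store, wal)
    ToySupervisor(lambda: iter(["a", "b", "c"]), handler).run()
    handler.close()
    print(store.blocks)
    print(wal.records)
```

Passing wal=None to ToyBlockHandler models the case where the
write-ahead log is not activated: blocks then go to the store only.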

More importantly, a word of caution: all the internal stuff like
ReceivedBlockHandler, ReceiverSupervisor, etc. is subject to change at
any time as we keep refactoring. So highlighting these internal
details too much, too publicly, may lead to future confusion.


On Thu, Dec 18, 2014 at 11:04 AM,  <> wrote:
> I’ve been trying to produce an updated box diagram to refresh :
> … after the SPARK-3129, and other switches (a surprising number of comments still mention
> Here’s what I have so far:
> This is not supposed to respect any particular convention (ER, ORM, …). Data flow up
> to right before RDD creation is in bold arrows, metadata flow is in normal width arrows.
> This diagram is still very much a WIP (see below : todo), but I wanted to share it to
> - what’s wrong ?
> - what are the glaring omissions ?
> - how can I make this better (i.e. what should I add first to the Todo-list below) ?
> I’ll be happy to share this (including sources) with whoever asks for it.
> Todo :
> - mark private/public classes
> - mark queues in Receiver, ReceivedBlockHandler, BlockManager
> - mark type of info on transport : e.g. Actor message, ReceivedBlockInfo
> —
> François Garillot
