flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Pointers about internal threads and communication in Flink (streaming)
Date Mon, 17 Aug 2015 14:26:32 GMT
Hi Vincenzo,
regarding TaskManagers and how they execute the operations:

The TaskManager gets a class that is derived from AbstractInvokable. The
TaskManager will create an object from that class and then call methods to
facilitate execution. The two main methods are registerInputOutput() and
invoke(). The first allows the invokable to setup the input/output channels
and do initialization work. Then, invoke is called which would contain the
actual loop that keeps reading from inputs and forwards data to the
operator implementation.

The base invokable for streaming is StreamTask. Then there are concrete
subclasses OneInputStreamTask and TwoInputStreamTask for these two basic
types of operator. The actual logic for an operator such as Map or Reduce
is implemented in a subclass of StreamOperator (with concrete
OneInputStreamOperator and TwoInputStreamOperator). OneInputStreamOperator,
for example, has a method processElement(StreamRecord) that must be called
for each element that is received.

The StreamOperator, in turn, would hold the user code function object and
forward received elements to it.

To conclude, the StreamTask does the raw reading from network inputs. The
StreamOperator receives elements and forwards them to user functions based
on the semantics of the operator.

I hope this helps, let us know if you have any more questions about this. :D


On Mon, 17 Aug 2015 at 16:08 Stephan Ewen <sewen@apache.org> wrote:

> Hi!
> We are working on more docs for that. Here is a start that has a section
> about the TaskManager task execution.
> Until then, here is a bit from our wiki:
> Data Exchange:
> https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
> Serialization for Data Exchange:
> https://cwiki.apache.org/confluence/display/FLINK/Type+System%2C+Type+Extraction%2C+Serialization
> Coordiation with Actors:
> https://cwiki.apache.org/confluence/display/FLINK/Akka+and+Actors
> Some WIP documentation on the Task execution:
>   -
> https://github.com/StephanEwen/incubator-flink/blob/docs/docs/internals/fig/taskmanager_task.svg
>   -
> https://github.com/StephanEwen/incubator-flink/blob/docs/docs/internals/through_stack.md
> Greetings,
> Stephan
> On Mon, Aug 17, 2015 at 3:38 PM, Vincenzo Gulisano <
> vincenzo.gulisano@gmail.com> wrote:
>> Hi, is there any document describing how streaming operators are run by
>> the TaskManagers and how communication (intra-node and inter-node) is
>> managed. The closest documention I found is
>> https://ci.apache.org/projects/flink/flink-docs-release-0.9/internals/general_arch.html
>> but it is still pretty high-level.
>> Thank you for your help

View raw message