flink-issues mailing list archives

From "Gyula Fora (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3659) Add ConnectWithBroadcast Operation
Date Tue, 01 Nov 2016 10:39:58 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15625086#comment-15625086 ]

Gyula Fora commented on FLINK-3659:

Would this work the way connected streams work now, but without the partitioning limitation,
with access to the value state "blocked" by some flag while we are in the method that processes
the broadcast input? Or do we assume the broadcast state to be the same on all nodes for checkpointing?

> Add ConnectWithBroadcast Operation
> ----------------------------------
>                 Key: FLINK-3659
>                 URL: https://issues.apache.org/jira/browse/FLINK-3659
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.0.0
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
> We should add a new operation that has a main input that can be keyed (but doesn't have
> to be) and a second input that is always broadcast. This is similar to a {{CoFlatMap}} or
> {{CoMap}}, but with those either both inputs have to be keyed or both have to be non-keyed.
> This builds on FLINK-4940, which aims at adding broadcast/global state. When processing
> an element from the broadcast input, only access to broadcast state is allowed. When processing
> an element from the main input, both the regular keyed state and the broadcast state
> can be accessed.
> I'm proposing this as an intermediate/low-level operation because it will probably take
> a while until we add support for side-inputs in the API. This new operation would allow expressing
> new patterns that cannot be expressed with the currently available operations.
> This is the new proposed API (names are non-final): 
> 1) Add {{DataStream.connectWithBroadcast(DataStream)}} and {{KeyedStream.connectWithBroadcast(DataStream)}}
> 2) Add {{ConnectedWithBroadcastStream}}, akin to {{ConnectedStreams}}.
> 3) Add {{BroadcastFlatMap}} and {{TimelyBroadcastFlatMap}} as the user functions.
> The API names and function names are a bit verbose, and we have to add two new, separate
> ones, but I don't see a way around this with the current way the Flink API works.
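
To make the state-access rule in the description concrete, here is a minimal sketch in plain Java with no Flink dependency. Only the name BroadcastFlatMap comes from the issue, and the issue itself says the names are non-final. The method names, the StateView handle, and the flag-style guard (keyed state blocked while a broadcast element is processed) are assumptions made for illustration, not the actual or planned Flink API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical user function. Only the name BroadcastFlatMap comes from the issue;
// the method names and signatures are assumptions for illustration.
interface BroadcastFlatMap<IN, BIN, OUT> {
    void flatMap(IN value, StateView state, List<OUT> out);           // main input
    void flatMapBroadcast(BIN value, StateView state, List<OUT> out); // broadcast input
}

// Hypothetical state handle: keyed state is scoped to the current key and is
// blocked while a broadcast element is being processed.
final class StateView {
    private final Map<String, Integer> keyedCounts = new HashMap<>();
    private final Map<String, Integer> broadcast = new HashMap<>();
    private String currentKey; // null while processing the broadcast input

    void enterMain(String key) { currentKey = key; }
    void enterBroadcast() { currentKey = null; }

    int keyedCount() {
        requireKey();
        return keyedCounts.getOrDefault(currentKey, 0);
    }

    void setKeyedCount(int count) {
        requireKey();
        keyedCounts.put(currentKey, count);
    }

    Map<String, Integer> broadcastState() { return broadcast; }

    private void requireKey() {
        if (currentKey == null) {
            throw new IllegalStateException("keyed state is blocked for broadcast elements");
        }
    }
}

final class BroadcastSketch {
    // Emits "word:count" whenever a word's count is at or above the broadcast threshold.
    static BroadcastFlatMap<String, Integer, String> thresholdFunction() {
        return new BroadcastFlatMap<String, Integer, String>() {
            @Override
            public void flatMap(String word, StateView state, List<String> out) {
                int count = state.keyedCount() + 1;   // keyed state: allowed here
                state.setKeyedCount(count);
                int threshold = state.broadcastState().getOrDefault("threshold", Integer.MAX_VALUE);
                if (count >= threshold) {
                    out.add(word + ":" + count);
                }
            }

            @Override
            public void flatMapBroadcast(Integer threshold, StateView state, List<String> out) {
                state.broadcastState().put("threshold", threshold); // broadcast state only
            }
        };
    }

    // Drives the function the way a single parallel instance might.
    static List<String> run() {
        BroadcastFlatMap<String, Integer, String> fn = thresholdFunction();
        StateView state = new StateView();
        List<String> out = new ArrayList<>();
        state.enterBroadcast();
        fn.flatMapBroadcast(2, state, out);
        for (String word : new String[] {"a", "b", "a"}) {
            state.enterMain(word);
            fn.flatMap(word, state, out);
        }
        return out;
    }

    // Returns true if touching keyed state during broadcast processing is rejected.
    static boolean keyedBlockedDuringBroadcast() {
        StateView state = new StateView();
        state.enterBroadcast();
        try {
            state.keyedCount();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
        System.out.println(keyedBlockedDuringBroadcast());
    }
}
```

In this sketch the broadcast element sets a threshold of 2 and the main input sees "a", "b", "a", so only the second "a" is emitted. The sketch models a single instance only. It does not answer whether broadcast state is assumed to be identical on all nodes at checkpoint time.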

