flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1670) Collect method for streaming
Date Wed, 08 Apr 2015 17:56:12 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485648#comment-14485648

ASF GitHub Bot commented on FLINK-1670:

Github user StephanEwen commented on the pull request:

    It is an interesting idea to collect back a data stream. This solution here has, however,
quite a few limitations and implications (I assume it was only locally tested?):
      - It supports only `java.io.Serializable` types. This is a bit inconsistent with the
current type handling and serialization in Flink. Some types that work in all other parts
do not work here.
      - It does not work in a cluster. It sends "localhost" as the name to the worker that
should send the data back. In any non-local setup, this cannot work.
      - It requires the worker to be able to connect to the client. This may be tricky when
the client and the workers do not both run in the cluster.
      - Selecting the proper interface that opens the port for data communication is actually
quite tricky. The TaskManagers spend quite a bit of work to select that interface - otherwise
many installations do not work, since in most cases certain interfaces or hostnames are only
accessible from certain networks (cloud internal and external network interfaces).
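
To illustrate the "localhost" and interface-selection points, here is a minimal sketch in plain Java. It is not Flink's selection logic: the class name `AdvertisedAddress` and the method `pick` are hypothetical, and the heuristic (first non-loopback IPv4 address on an interface that is up) is an assumption chosen for brevity. As the comment notes, a real deployment needs considerably more care, because the first such address is not necessarily reachable from the workers.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;

public class AdvertisedAddress {

    /**
     * Hypothetical helper: returns the first non-loopback IPv4 address found on
     * an interface that is up, or the loopback address if there is none.
     * This is a naive heuristic and does not check reachability from any peer.
     */
    public static InetAddress pick() throws SocketException {
        Enumeration<NetworkInterface> interfaces = NetworkInterface.getNetworkInterfaces();
        if (interfaces != null) {
            for (NetworkInterface nif : Collections.list(interfaces)) {
                if (!nif.isUp() || nif.isLoopback()) {
                    continue; // skip interfaces that are down or loopback-only
                }
                for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                    if (addr instanceof Inet4Address && !addr.isLoopbackAddress()) {
                        return addr;
                    }
                }
            }
        }
        // Equivalent to sending "localhost": only usable when everything runs on one machine.
        return InetAddress.getLoopbackAddress();
    }

    public static void main(String[] args) throws SocketException {
        System.out.println("address to advertise: " + pick().getHostAddress());
    }
}
```

On a machine with several interfaces (for example a cloud-internal and an external one), this may return an address that the workers cannot reach, which is exactly the difficulty described above.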
    I think this is a very tricky thing to realize. It has implications for the distributed
process and communication model. It starts extending streaming to mixed local/remote runtimes.
It affects all assumptions we make for fault tolerance. What happens to the
stream in case of a failure? There is no notion of restarting the driver.
    That is something that needs a bit more consideration and design, for the sake of building
something consistent where the concepts and implications play together well. I hope you do
not take it the wrong way, but without clarifying these points, this addition is a bit premature.


> Collect method for streaming
> ----------------------------
>                 Key: FLINK-1670
>                 URL: https://issues.apache.org/jira/browse/FLINK-1670
>             Project: Flink
>          Issue Type: New Feature
>          Components: Streaming
>    Affects Versions: 0.9
>            Reporter: Márton Balassi
>            Assignee: Gabor Gevay
>            Priority: Minor
> A convenience method for streaming back the results of a job to the client.
> As the client itself is a bottleneck anyway, an easy solution would be to provide a socket sink with degree of parallelism 1, from which a client utility can read.
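
The idea in the issue description can be sketched without Flink at all. In the plain-Java example below, a single thread stands in for the parallelism-1 socket sink, and `collect` stands in for the client utility. All names (`CollectSketch`, `runSink`, `collect`) are hypothetical, records are sent as text lines rather than through any Flink serializer, and everything runs on the loopback interface, so the sketch shares the local-only limitation raised in the review comment above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollectSketch {

    /** Stand-in for a parallelism-1 socket sink: connects to the client, writes one record per line. */
    static void runSink(InetAddress host, int port, List<String> records) throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            for (String record : records) {
                out.println(record); // assumes records contain no line breaks
            }
        }
    }

    /** Stand-in for the client utility: opens a port, starts the sink, reads until the sink disconnects. */
    public static List<String> collect(List<String> records) throws Exception {
        List<String> received = new ArrayList<>();
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) { // port 0 = any free port
            server.setSoTimeout(5000); // do not wait forever if the sink never connects
            int port = server.getLocalPort();
            Thread sink = new Thread(() -> {
                try {
                    runSink(loopback, port, records);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sink.start();
            try (Socket conn = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    received.add(line);
                }
            }
            sink.join();
        }
        return received;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect(Arrays.asList("a", "b", "c"))); // prints [a, b, c]
    }
}
```

Note that the sink connects to the client here, so the client must be reachable from wherever the sink runs. The sketch also has no answer to what happens if either side fails partway through the stream.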

This message was sent by Atlassian JIRA
