flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1081) Add HDFS file-stream source for streaming
Date Thu, 04 Dec 2014 21:23:14 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234640#comment-14234640 ]

ASF GitHub Bot commented on FLINK-1081:
---------------------------------------

Github user rmetzger commented on the pull request:

    https://github.com/apache/incubator-flink/pull/226#issuecomment-65705999
  
    I'm really not an expert in streaming systems, but coming from the batch world, the
behavior feels unexpected.
    For Flink batch, we have an InputSplitAssigner running in the JobManager that assigns
parts of the input to the workers in the cluster. Maybe we need to run the "HDFS monitoring"
centrally in the JobManager and assign the reading of the newly added files to individual
workers (ideally respecting locality).
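
    A minimal sketch of the central-monitoring idea above, under stated assumptions: the class CentralFileAssigner, its method assignNewFiles, and the worker host names are hypothetical and are neither Flink's InputSplitAssigner API nor the code in this pull request. The listing map stands in for a directory scan that returns, per file, the hosts holding a replica.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical central assigner (illustration only, not a Flink class).
 * It remembers which files were already handed out and maps each newly
 * seen file to one worker, preferring a worker that holds a replica.
 */
class CentralFileAssigner {
    private final List<String> workerHosts;
    private final Set<String> seenFiles = new HashSet<>();
    private int nextWorker = 0; // round-robin cursor for non-local files

    CentralFileAssigner(List<String> workerHosts) {
        if (workerHosts.isEmpty()) {
            throw new IllegalArgumentException("need at least one worker");
        }
        this.workerHosts = new ArrayList<>(workerHosts);
    }

    /**
     * @param listing file path -> hosts storing a replica of that file
     *                (assumed to come from a periodic directory scan)
     * @return file path -> chosen worker, only for files not seen before
     */
    Map<String, String> assignNewFiles(Map<String, List<String>> listing) {
        Map<String, String> assignments = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : listing.entrySet()) {
            String file = entry.getKey();
            if (!seenFiles.add(file)) {
                continue; // already assigned in an earlier scan
            }
            assignments.put(file, pickWorker(entry.getValue()));
        }
        return assignments;
    }

    private String pickWorker(List<String> replicaHosts) {
        // Locality first: use a worker that already stores the data.
        for (String host : replicaHosts) {
            if (workerHosts.contains(host)) {
                return host;
            }
        }
        // Otherwise spread non-local files across workers in turn.
        String host = workerHosts.get(nextWorker);
        nextWorker = (nextWorker + 1) % workerHosts.size();
        return host;
    }
}
```

    Because only one component tracks the set of seen files, each new file is read by exactly one worker, which is the property the batch-side split assignment provides.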
    
    Maybe a good approach for now is to merge this change, document the limitation and then
fix the issue.


> Add HDFS file-stream source for streaming
> -----------------------------------------
>
>                 Key: FLINK-1081
>                 URL: https://issues.apache.org/jira/browse/FLINK-1081
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 0.7.0-incubating
>            Reporter: Gyula Fora
>            Assignee: Chiwan Park
>              Labels: starter
>
> Add a data stream source that will monitor a selected directory on HDFS (or other filesystems
> as well) and will process all new files created.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
