flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1081) Add HDFS file-stream source for streaming
Date Thu, 04 Dec 2014 21:47:13 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234679#comment-14234679
] 

ASF GitHub Bot commented on FLINK-1081:
---------------------------------------

Github user gyfora commented on the pull request:

    https://github.com/apache/incubator-flink/pull/226#issuecomment-65709404
  
    You are right, Robert. This behavior is unexpected at best, and we will have to do something
about it. It actually applies to other sources as well. A central monitor would be ideal; until
then, we could figure out a workaround. The first thing that came to my mind is to somehow
partition the incoming files among the sources, for example by hashing the file names. We should,
of course, try to respect locality for performance.
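
A minimal sketch of the file-name hashing idea, assuming each parallel source instance knows its own subtask index and the total parallelism. The class `FileNamePartitioner` and its methods are hypothetical names used for illustration only and are not part of Flink:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Flink API): assigns each file to exactly one
// parallel source instance based on a hash of the file name.
final class FileNamePartitioner {

    // Maps a file name to a subtask index in [0, parallelism).
    static int ownerOf(String fileName, int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        // floorMod keeps the result non-negative even if hashCode() is negative.
        return Math.floorMod(fileName.hashCode(), parallelism);
    }

    // Returns only the files that the given subtask should process.
    static List<String> filterOwned(List<String> fileNames, int subtaskIndex, int parallelism) {
        List<String> owned = new ArrayList<>();
        for (String name : fileNames) {
            if (ownerOf(name, parallelism) == subtaskIndex) {
                owned.add(name);
            }
        }
        return owned;
    }
}
```

With this scheme every instance can list the same directory independently, and each new file is picked up by exactly one of them. Note that a plain hash ignores data locality, so it would only be a workaround until something like a central monitor exists.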


> Add HDFS file-stream source for streaming
> -----------------------------------------
>
>                 Key: FLINK-1081
>                 URL: https://issues.apache.org/jira/browse/FLINK-1081
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 0.7.0-incubating
>            Reporter: Gyula Fora
>            Assignee: Chiwan Park
>              Labels: starter
>
> Add a data stream source that will monitor a selected directory on HDFS (or other filesystems
> as well) and will process all new files created.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
