flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1081) Add HDFS file-stream source for streaming
Date Mon, 01 Dec 2014 13:02:12 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229754#comment-14229754 ]

ASF GitHub Bot commented on FLINK-1081:

Github user rmetzger commented on the pull request:

    I think it's okay to do it this way. I forgot that Avro's FSDataInputStreamWrapper
implements an Avro-specific interface.
    Sorry for asking so many questions. Your code is very well written; I'm asking them
to get a better understanding of your changes.
    Have you tested the code on a cluster with HDFS? I was wondering what happens when the
code runs in a distributed setup, with several parallel instances of the source in the cluster.
    If the source runs `n` times in the cluster, all `n` instances will "see" the
new or updated file and start to process it, so the file would presumably be processed `n` times.
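
    One possible way to keep `n` parallel source instances from all processing the same file is to give every file exactly one owner. The sketch below is only an illustration under assumptions: `FileOwnership`, `isOwner` and `ownedFiles` are hypothetical names, not part of Flink or of this pull request, and it assumes each instance knows its own 0-based index and the total parallelism.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Flink API): decides which parallel source instance owns a file. */
final class FileOwnership {

    private FileOwnership() {}

    /** Returns true if the instance with the given 0-based index should process the path. */
    static boolean isOwner(String path, int instanceIndex, int parallelism) {
        if (parallelism <= 0 || instanceIndex < 0 || instanceIndex >= parallelism) {
            throw new IllegalArgumentException("invalid instance index or parallelism");
        }
        // String.hashCode() is specified by the JDK, so every JVM computes the same value.
        // floorMod keeps the result in [0, parallelism) even for negative hash codes.
        return Math.floorMod(path.hashCode(), parallelism) == instanceIndex;
    }

    /** Filters the files that one instance discovered down to the ones it owns. */
    static List<String> ownedFiles(List<String> discovered, int instanceIndex, int parallelism) {
        List<String> owned = new ArrayList<>();
        for (String path : discovered) {
            if (isOwner(path, instanceIndex, parallelism)) {
                owned.add(path);
            }
        }
        return owned;
    }
}
```

    With this scheme every instance still lists the monitored directory, but only the owner opens a given file. It does not cover failures or a change in parallelism, which would need separate handling.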

> Add HDFS file-stream source for streaming
> -----------------------------------------
>                 Key: FLINK-1081
>                 URL: https://issues.apache.org/jira/browse/FLINK-1081
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 0.7.0-incubating
>            Reporter: Gyula Fora
>            Assignee: Chiwan Park
>              Labels: starter
> Add a data stream source that monitors a selected directory on HDFS (or on other filesystems)
> and processes all newly created files.

This message was sent by Atlassian JIRA
