flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3677) FileInputFormat: Allow to specify include/exclude file name patterns
Date Fri, 24 Jun 2016 12:24:16 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348193#comment-15348193 ]

ASF GitHub Bot commented on FLINK-3677:
---------------------------------------

Github user kl0u commented on the issue:

    https://github.com/apache/flink/pull/2109
  
    Hello @mushketyk, and sorry for the late response.
    
    Great that you are working on this for the Batch API as well!
    We recently introduced continuous file monitoring in the Streaming API (not batch), and
as part of that work we added the ```FilePathFilter``` class. For an example, see
```readFile(FileInputFormat<OUT> inputFormat, String filePath, FileProcessingMode watchType,
long interval, FilePathFilter filter)``` in the ```StreamExecutionEnvironment```.
    
    What I would suggest, in order to have this functionality for both batch and streaming,
is to stop specifying the filter as a parameter in the configuration file and instead pass the
```FilePathFilter``` as an argument to the constructor of the ```FileInputFormat```, and then
do the filtering the same way you do it now. The reasons are:
    
    1) Cleaner code, as we will not have two different ways to do the same thing.
    2) Better usability. Imagine a scenario where an administrator sets a global path filter
and then the user sets another one. In this case, which one should be respected?
    3) Overloading the configuration file with job-specific settings is probably not the best
way to go.
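    
    To illustrate the constructor-based approach, here is a minimal, self-contained sketch.
It is not the actual Flink code: ```PathFilter```, ```SketchInputFormat``` and
```includeOnly``` are hypothetical stand-ins for ```FilePathFilter``` and
```FileInputFormat```, and I am assuming the convention that the filter returns true for
paths that should be skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class FilterSketch {

    /** Hypothetical stand-in for a path filter; returns true if the path should be skipped. */
    interface PathFilter {
        boolean filterPath(String path);
    }

    /** Hypothetical stand-in for an input format that receives its filter in the constructor. */
    static class SketchInputFormat {
        private final PathFilter filter;

        SketchInputFormat(PathFilter filter) {
            this.filter = filter;
        }

        /** Returns the candidate paths that the filter does not reject. */
        List<String> selectFiles(List<String> candidates) {
            List<String> accepted = new ArrayList<>();
            for (String path : candidates) {
                if (!filter.filterPath(path)) {
                    accepted.add(path);
                }
            }
            return accepted;
        }
    }

    /** Builds a filter that skips every path whose file name does not match the regex. */
    static PathFilter includeOnly(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return path -> {
            // Match against the file name only, not the full path.
            String name = path.substring(path.lastIndexOf('/') + 1);
            return !pattern.matcher(name).matches();
        };
    }

    public static void main(String[] args) {
        // The filter is a per-job object, so no global configuration entry is involved.
        SketchInputFormat format = new SketchInputFormat(includeOnly(".*\\.csv"));
        List<String> files = Arrays.asList("/data/a.csv", "/data/b.tmp", "/data/_SUCCESS");
        System.out.println(format.selectFiles(files));
    }
}
```

    Because the filter travels with the input format instance, the same object could in
principle be handed to both the batch and the streaming code paths, and there is no question
of a global setting competing with a per-job one.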
    
    This may also require some changes in the internal implementation of the readFile in the
Streaming API, although I am not 100% sure.
    
    Thanks for the PR!


> FileInputFormat: Allow to specify include/exclude file name patterns
> --------------------------------------------------------------------
>
>                 Key: FLINK-3677
>                 URL: https://issues.apache.org/jira/browse/FLINK-3677
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Maximilian Michels
>            Assignee: Ivan Mushketyk
>            Priority: Minor
>              Labels: starter
>
> It would be nice to be able to specify a regular expression to filter files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
