beam-commits mailing list archives

From "Paul Findlay (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-1190) FileBasedSource should ignore files that matched the glob but don't exist
Date Wed, 21 Dec 2016 00:52:58 GMT

    [ https://issues.apache.org/jira/browse/BEAM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765711#comment-15765711 ]

Paul Findlay commented on BEAM-1190:
------------------------------------

[~dhalperi@google.com] Correct me if I'm wrong, but isn't FileBasedSource.createReader basically
already doing a stat for each file in the expanded list, swallowing the error if there
is one, and leaving it for startImpl to blow up? We are just asking for the method not to
be final, so that we can treat the different subclasses of IOException appropriately for our pipeline.

But we would love to know if there is scary behaviour we haven't considered.
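
To illustrate the kind of per-exception handling meant above, here is a minimal sketch. It uses only java.nio.file and does not touch Beam's FileBasedSource API; the class name TolerantRead and the method readIfPresent are hypothetical. The assumption is that a missing file should be skipped, while every other IOException should still fail the read.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Beam: treats "file is gone" differently
// from every other I/O failure.
public class TolerantRead {

    /**
     * Reads all lines of the file, or returns Optional.empty() if the file
     * does not exist. Any other IOException is propagated to the caller.
     */
    public static Optional<List<String>> readIfPresent(Path file) throws IOException {
        try {
            return Optional.of(Files.readAllLines(file));
        } catch (NoSuchFileException e) {
            // The listing named a file that is not there: skip it.
            return Optional.empty();
        }
    }
}
```

Whether skipping is the right reaction depends on the pipeline; the point is only that the decision is made per exception type instead of once for all IOExceptions.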

> FileBasedSource should ignore files that matched the glob but don't exist
> -------------------------------------------------------------------------
>
>                 Key: BEAM-1190
>                 URL: https://issues.apache.org/jira/browse/BEAM-1190
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>
> See user issue:
> http://stackoverflow.com/questions/41251741/coping-with-eventual-consistency-of-gcs-bucket-listing
> We should, after globbing the files in FileBasedSource, individually stat every file
> and remove those that don't exist, to account for the possibility that glob yielded non-existing
> files due to eventual consistency.
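
The stat-and-filter step proposed in the description could look roughly like the following sketch. It works on local paths with java.nio.file only, as a stand-in for whatever file system the source actually reads from; the class ExistingFileFilter and the method keepExisting are hypothetical names, not Beam APIs.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Beam: drops glob matches that no longer
// (or do not yet) exist when they are checked individually.
public class ExistingFileFilter {

    /** Returns the subset of globMatches that exist at the time of the check. */
    public static List<Path> keepExisting(List<Path> globMatches) {
        List<Path> existing = new ArrayList<>();
        for (Path p : globMatches) {
            // One existence check ("stat") per matched file.
            if (Files.exists(p)) {
                existing.add(p);
            }
        }
        return existing;
    }
}
```

A file can still disappear between this check and the moment it is opened, so the filter narrows the window for failures but does not remove the need for the reader to handle a missing file.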



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
