beam-commits mailing list archives

From "Devon Meunier (JIRA)" <>
Subject [jira] [Commented] (BEAM-2150) Support for recursive wildcards in GcsPath
Date Sun, 07 May 2017 01:24:04 GMT


Devon Meunier commented on BEAM-2150:

[] noticed that gsutil's globbing semantics don't quite match my PR.

He noted:

[11:12:18 dhalperi@dhalperi:beam a3cbf5905* ] gsutil ls 'gs://clouddfe-dhalperi/gcs-recursive/**/*.txt'

However, that same glob passed to TextIO matches only the second file.

Testing against a shell shows yet another set of semantics:

[I] » tree glob/
├── dir
│   └── file2.txt
└── file1.txt

1 directory, 2 files
[I] » ls glob/**/*.txt
[I] » ls glob/**.txt
glob/dir/file2.txt glob/file1.txt
[I] »

My PR matches the behaviour of a shell, so gsutil seems like the odd one out. I think we can
commit to the shell-like semantics, with more tests to make this behaviour explicit. What do you think?
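
For comparison, here is a minimal sketch of how the two patterns above behave under the JDK's own java.nio glob matcher. This is not Beam's GcsPath code, and it makes no claim about how the PR translates globs; the class name GlobSemantics and the helper matches are made up for illustration. It assumes a Unix-like default file system where "/" is the path separator.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobSemantics {
    // Hypothetical helper: returns true if the JDK glob matcher accepts the path.
    // In JDK glob syntax, "*" stays within one path component, "**" may cross "/".
    static boolean matches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // Same layout as the tree listing in this comment.
        String[] files = {"glob/file1.txt", "glob/dir/file2.txt"};
        String[] globs = {"glob/**/*.txt", "glob/**.txt"};
        for (String glob : globs) {
            for (String file : files) {
                System.out.println(glob + " vs " + file + ": " + matches(glob, file));
            }
        }
    }
}
```

Under these assumptions, glob/**/*.txt accepts only glob/dir/file2.txt, because the literal "/" after "**" requires at least one intermediate directory, while glob/**.txt accepts both files.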

> Support for recursive wildcards in GcsPath
> ------------------------------------------
>                 Key: BEAM-2150
>                 URL:
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core, sdk-java-gcp
>            Reporter: Devon Meunier
>            Assignee: Devon Meunier
>            Priority: Minor
> When working with heavily nested folder structures in Google Cloud Storage, it's great
> to make use of recursive wildcards, which the current API explicitly does not support.
> This code hasn't been touched in 2 years so it's likely that simply no one's gotten around
> to it yet.

This message was sent by Atlassian JIRA
