beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (BEAM-2828) Create FileIO
Date Fri, 01 Sep 2017 00:41:01 GMT


ASF GitHub Bot commented on BEAM-2828:

GitHub user jkff opened a pull request:

    [BEAM-2828, BEAM-2750] Introduces and uses it in Text, Avro and Xml

    This is on top of
    * Creates a `ReadableFile` type that's just a utility wrapper over a `Metadata` and a `Compression`.
    * Creates `` that returns `ReadableFile`'s - this subsumes BEAM-2750.
    * Creates `readFiles()` variants in TextIO and XmlIO that give all users of these IOs
      access to all the features of `FileIO`. For example, XmlIO does not explicitly support
      watching for new files, or value providers - but you can get them by combining
      `FileIO.match`, ``, and `XmlIO.readFiles`.
         * Like {@link #read}, but reads each file in a {@link PCollection} of {@link ReadableFile},
         * which allows more flexible usage via different configuration options of {@link FileIO#match}
         * and {@link FileIO#readMatches} that are not explicitly provided for {@link #read}.
         * <p>For example:
         * <pre>{@code
         * PCollection<ReadableFile> files = p
         *     .apply(FileIO.match().filepattern(options.getInputFilepatternProvider()).continuously(
         *       Duration.standardSeconds(30), afterTimeSinceNewOutput(Duration.standardMinutes(5))))
         *     .apply(FileIO.readMatches().withCompression(GZIP));
         * PCollection<Record> output = files.apply(XmlIO.<Record>readFiles()
         *     .withRootElement("root")
         *     .withRecordElement("record")
         *     .withRecordClass(Record.class));
         * }</pre>
    R: @reuvenlax
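
The `ReadableFile` idea described above (a wrapper pairing a file's metadata with its compression) can be sketched with only the Java standard library. This is a rough illustration, not Beam's implementation: the class `ReadableFileSketch`, its nested `Metadata`, `Compression` and `ReadableFile` types, the in-memory `rawBytes` field, and the `readFullyAsUtf8()` and `gzip()` helpers are all hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stdlib-only analog of the ReadableFile idea; NOT Beam's actual classes.
class ReadableFileSketch {

  /** How the file's bytes are encoded. */
  enum Compression { UNCOMPRESSED, GZIP }

  /** Stand-in for file metadata. For this sketch it also carries the raw bytes in memory. */
  static final class Metadata {
    final String resourceName;
    final byte[] rawBytes;

    Metadata(String resourceName, byte[] rawBytes) {
      this.resourceName = resourceName;
      this.rawBytes = rawBytes;
    }
  }

  /** Pairs a Metadata with a Compression so callers never decompress by hand. */
  static final class ReadableFile {
    private final Metadata metadata;
    private final Compression compression;

    ReadableFile(Metadata metadata, Compression compression) {
      this.metadata = metadata;
      this.compression = compression;
    }

    Metadata getMetadata() { return metadata; }

    Compression getCompression() { return compression; }

    /** Opens the content, transparently decompressing it when the file is gzipped. */
    InputStream open() {
      InputStream raw = new ByteArrayInputStream(metadata.rawBytes);
      try {
        return compression == Compression.GZIP ? new GZIPInputStream(raw) : raw;
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }

    /** Reads the whole (decompressed) content as a UTF-8 string. */
    String readFullyAsUtf8() {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[4096];
      try (InputStream in = open()) {
        int n;
        while ((n = in.read(buffer)) != -1) {
          out.write(buffer, 0, n);
        }
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
  }

  /** Helper for the demo: gzips a string in memory. */
  static byte[] gzip(String text) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    String xml = "<root><record/></root>";
    ReadableFile plain = new ReadableFile(
        new Metadata("a.xml", xml.getBytes(StandardCharsets.UTF_8)), Compression.UNCOMPRESSED);
    ReadableFile zipped = new ReadableFile(
        new Metadata("a.xml.gz", gzip(xml)), Compression.GZIP);
    // Both reads yield the same text; the reader never inspects the compression itself.
    System.out.println(plain.readFullyAsUtf8());
    System.out.println(zipped.readFullyAsUtf8());
  }
}
```

The point of the wrapper is that a downstream reader, such as an XML or text parser, takes a `ReadableFile` and calls `open()` without caring how the file was matched or compressed.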

You can merge this pull request into a Git repository by running:

    $ git pull readable-file

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3799

commit a9e3e82cadb5e92b430c632ca1503a45eaa2da6d
Author: Eugene Kirpichov <>
Date:   2017-08-24T23:31:41Z

    Moves Match into FileIO.match()/matchAll()
    FileIO will later gain other methods, such as read()/write().
    Also introduces FileIO.MatchConfiguration - a common type to use
    by various file-based IOs to reduce boilerplate, and uses it in TextIO.

commit ead6af9bf3e2bbe2f53d4e75a4bf0e1a40a92b31
Author: Eugene Kirpichov <>
Date:   2017-08-31T23:11:25Z


commit b36b679682e8dcfa6106083069d83a874fac7228
Author: Eugene Kirpichov <>
Date:   2017-08-31T23:28:07Z

    Uses in TextIO and AvroIO

commit c38482c102871399eb50551a45b7b79ab8e8fc6e
Author: Eugene Kirpichov <>
Date:   2017-08-31T23:43:22Z

    Introduces TextIO.readFiles()

commit 3b17c41730a43557d000b6c4662e6572d97fdcd7
Author: Eugene Kirpichov <>
Date:   2017-09-01T00:21:20Z

    Introduces XmlIO.readFiles


> Create FileIO
> -------------
>                 Key: BEAM-2828
>                 URL:
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>             Fix For: 2.2.0
> Let's have FileIO as a namespace for transforms such as: current Match.filepatterns(); for reading whole files and FileIO.write() for writing whole files, etc.
> Target for 2.2.0 is just creating the namespace and moving Match.filepatterns() into it (
> Related JIRAs: and

This message was sent by Atlassian JIRA
