beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2828) Create FileIO
Date Fri, 01 Sep 2017 00:41:01 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149850#comment-16149850 ]

ASF GitHub Bot commented on BEAM-2828:
--------------------------------------

GitHub user jkff opened a pull request:

    https://github.com/apache/beam/pull/3799

    [BEAM-2828, BEAM-2750] Introduces FileIO.read() and uses it in Text, Avro and Xml

    This is on top of https://github.com/apache/beam/pull/3759.
    
    * Creates a `ReadableFile` type that's just a utility wrapper over a `Metadata` and a `Compression`.
    * Creates `FileIO.read()`, which returns `ReadableFile`s; this subsumes BEAM-2750.
    * Creates `readFiles()` versions of TextIO and XmlIO that give all users of these IOs access
      to all the features of `FileIO`. For example, XmlIO does not explicitly support watching
      for new files or value providers, but you can get both by combining `FileIO.match`,
      `FileIO.read`, and `XmlIO.readFiles`.
    
        /**
         * Like {@link #read}, but reads each file in a {@link PCollection} of {@link ReadableFile},
         * which allows more flexible usage via different configuration options of
         * {@link FileIO#match} and {@link FileIO#readMatches} that are not explicitly provided for
         * {@link #read}.
         *
         * <p>For example:
         *
         * <pre>{@code
         * // Re-run the match every 30 seconds; stop after 5 minutes without a new file.
         * PCollection<ReadableFile> files = p
         *     .apply(FileIO.match().filepattern(options.getInputFilepatternProvider()).continuously(
         *       Duration.standardSeconds(30), afterTimeSinceNewOutput(Duration.standardMinutes(5))))
         *     .apply(FileIO.readMatches().withCompression(GZIP));
         *
         * // Parse each matched file into Record elements.
         * PCollection<Record> output = files.apply(XmlIO.<Record>readFiles()
         *     .withRootElement("root")
         *     .withRecordElement("record")
         *     .withRecordClass(Record.class));
         * }</pre>
         */
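The `continuously(...)` call in the Javadoc example re-runs the match on a fixed interval and, as the termination condition's name suggests, stops once no new file has appeared for the given duration. The following is a stdlib-only simulation of that behaviour, not Beam's `Watch` implementation; the class `ContinuousMatchModel`, its `run` method, and the simulated clock (poll `i` happens at `i * pollIntervalSec`) are hypothetical, and Beam's exact boundary handling may differ.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of continuous matching; not Beam's Watch implementation.
final class ContinuousMatchModel {

  /**
   * Simulates polling a file listing.
   *
   * @param listings what the i-th poll returns; poll i happens at time i * pollIntervalSec
   * @param pollIntervalSec seconds between polls
   * @param stopAfterSec stop once more than this many seconds have passed since the last new file
   * @return file names in the order they were first seen
   */
  static List<String> run(List<List<String>> listings, long pollIntervalSec, long stopAfterSec) {
    Set<String> seen = new HashSet<>();
    List<String> emitted = new ArrayList<>();
    Long lastNewFileSec = null; // null until the first new file is seen
    for (int i = 0; i < listings.size(); i++) {
      long now = i * pollIntervalSec;
      for (String file : listings.get(i)) {
        if (seen.add(file)) { // add() returns true only for a previously unseen name
          emitted.add(file);
          lastNewFileSec = now;
        }
      }
      // Termination check: only after at least one file was seen, and only once the
      // gap since the most recent new file exceeds the threshold.
      if (lastNewFileSec != null && now - lastNewFileSec > stopAfterSec) {
        break;
      }
    }
    return emitted;
  }
}
```

For example, with polls at 0, 30, 60, ... seconds, a 60-second threshold, `b` first seen at 30 seconds, and a file `c` that first appears at 150 seconds, the loop stops at 120 seconds (90 seconds after `b`), so `c` is never emitted. That is the trade-off such a condition makes: bounded polling in exchange for possibly missing late files.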
    
    R: @reuvenlax
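The first bullet of the PR description calls `ReadableFile` just a utility wrapper over a `Metadata` and a `Compression`. A minimal stdlib-only sketch of that idea follows, assuming only two compression modes; `SimpleReadableFile`, `SimpleMetadata`, and `SimpleCompression` are hypothetical stand-ins, not Beam classes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical stand-in for a compression setting; only two modes are modeled.
enum SimpleCompression { UNCOMPRESSED, GZIP }

// Hypothetical stand-in for file metadata: a path and a size in bytes.
final class SimpleMetadata {
  final Path path;
  final long sizeBytes;

  SimpleMetadata(Path path, long sizeBytes) {
    this.path = path;
    this.sizeBytes = sizeBytes;
  }
}

// Thin wrapper pairing metadata with a compression setting, in the spirit of ReadableFile.
final class SimpleReadableFile {
  private final SimpleMetadata metadata;
  private final SimpleCompression compression;

  SimpleReadableFile(SimpleMetadata metadata, SimpleCompression compression) {
    this.metadata = metadata;
    this.compression = compression;
  }

  SimpleMetadata getMetadata() {
    return metadata;
  }

  SimpleCompression getCompression() {
    return compression;
  }

  // Opens the underlying file and, for GZIP, wraps it in a decompressing stream.
  InputStream open() throws IOException {
    InputStream raw = Files.newInputStream(metadata.path);
    if (compression == SimpleCompression.UNCOMPRESSED) {
      return raw;
    }
    try {
      return new GZIPInputStream(raw);
    } catch (IOException e) {
      raw.close(); // don't leak the raw stream if the GZIP header is invalid
      throw e;
    }
  }

  // Reads the whole (decompressed) file into memory and decodes it as UTF-8.
  String readFullyAsUtf8String() {
    try (InputStream in = open()) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because the handle carries its own compression setting, downstream code can call `open()` without knowing how the file was stored, which is the kind of decoupling the PR relies on when TextIO and XmlIO consume `ReadableFile`s.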

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkff/incubator-beam readable-file

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3799.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3799
    
----
commit a9e3e82cadb5e92b430c632ca1503a45eaa2da6d
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-08-24T23:31:41Z

    Moves Match into FileIO.match()/matchAll()
    
    FileIO will later gain other methods, such as read()/write().
    
    Also introduces FileIO.MatchConfiguration, a common type that various
    file-based IOs can use to reduce boilerplate, and uses it in TextIO.
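`FileIO.match()` turns a filepattern into metadata for the matching files. As a local-filesystem analogy only (Beam resolves patterns through its own `FileSystems` layer and supports remote schemes, which this does not), the hypothetical `LocalMatch.match` helper below expands a `java.nio` glob relative to a base directory and returns each matching file's path and size.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local analogy for "match a filepattern, get file metadata back".
final class LocalMatch {

  // Minimal metadata: resolved path plus size in bytes.
  static final class FileMetadata {
    final Path path;
    final long sizeBytes;

    FileMetadata(Path path, long sizeBytes) {
      this.path = path;
      this.sizeBytes = sizeBytes;
    }
  }

  // Returns every regular file under baseDir whose path relative to baseDir matches the glob,
  // sorted by path so the result is deterministic.
  static List<FileMetadata> match(Path baseDir, String glob) {
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    try (Stream<Path> paths = Files.walk(baseDir)) {
      List<Path> matched = paths
          .filter(Files::isRegularFile)
          .filter(p -> matcher.matches(baseDir.relativize(p)))
          .sorted()
          .collect(Collectors.toList());
      List<FileMetadata> result = new ArrayList<>();
      for (Path p : matched) {
        result.add(new FileMetadata(p, Files.size(p)));
      }
      return result;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With `java.nio` globs, `*.txt` does not cross directory boundaries, whereas `**/*.txt` matches files in subdirectories; Beam's own filepattern syntax is not guaranteed to behave identically.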

commit ead6af9bf3e2bbe2f53d4e75a4bf0e1a40a92b31
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-08-31T23:11:25Z

    Introduces FileIO.read()

commit b36b679682e8dcfa6106083069d83a874fac7228
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-08-31T23:28:07Z

    Uses FileIO.read() in TextIO and AvroIO

commit c38482c102871399eb50551a45b7b79ab8e8fc6e
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-08-31T23:43:22Z

    Introduces TextIO.readFiles()

commit 3b17c41730a43557d000b6c4662e6572d97fdcd7
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-09-01T00:21:20Z

    Introduces XmlIO.readFiles

----


> Create FileIO
> -------------
>
>                 Key: BEAM-2828
>                 URL: https://issues.apache.org/jira/browse/BEAM-2828
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>             Fix For: 2.2.0
>
>
> Let's have FileIO as a namespace for transforms such as the current Match.filepatterns(),
> FileIO.read() for reading whole files, FileIO.write() for writing whole files, etc.
> The target for 2.2.0 is just creating the namespace and moving Match.filepatterns() into
> it (https://github.com/apache/beam/pull/3759).
> Related JIRAs: https://issues.apache.org/jira/browse/BEAM-2750 and https://issues.apache.org/jira/browse/BEAM-2751



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
