flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1038) Adding a collection output format
Date Thu, 14 Aug 2014 14:26:12 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097001#comment-14097001 ]

ASF GitHub Bot commented on FLINK-1038:

GitHub user fatschi opened a pull request:


    Remote collector output format

    This is a proposal for https://issues.apache.org/jira/browse/FLINK-1038

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fatschi/incubator-flink RemoteCollectorOutputFormat

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #94

commit 026d0c6548db48c29ad92e7aa5c032e3b0339ccd
Author: Fabian Tschirschnitz <fatschi@googlemail.com>
Date:   2014-08-14T13:01:34Z

    initial commit of RemoteCollectorOutputFormat, see issue https://issues.apache.org/jira/browse/FLINK-1038

commit a7842fdf8a8481ccf5f265e94bcd26e4e40fb136
Author: Fabian Tschirschnitz <fatschi@googlemail.com>
Date:   2014-08-14T13:37:40Z

    refactored RemoteCollectorOutputFormat and added some JavaDoc

commit 7e58b8d0a24593df1ac60ef7e64ffa69ff3fcced
Author: Fabian Tschirschnitz <fatschi@googlemail.com>
Date:   2014-08-14T14:19:34Z

    minor refactorings in RemoteColllectorOutputFormat


> Adding a collection output format
> ---------------------------------
>                 Key: FLINK-1038
>                 URL: https://issues.apache.org/jira/browse/FLINK-1038
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Sebastian Kruse
>            Priority: Minor
> Similar to the existing LocalCollectionOutputFormat or Spark's collect() method, it would
> be nice to have a CollectionOutputFormat that also works when running jobs on a cluster. This
> output format gathers all results of a sink from all TaskManagers in the JVM that submitted
> the job plan and provides them as a collection, similar to accumulators. This can help to
> avoid the tedious task of going to HDFS to read and parse the individual result files.
> PS. We have already created such an output format and can contribute it.
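
The idea in the issue description (several workers write sink records that end up in one collection inside the JVM that submitted the job) can be illustrated with a minimal, purely local sketch. This is not the code from the pull request and does not use any Flink API: the class CollectSketch, the interface RecordSink, and the method runAndCollect are hypothetical names, and threads stand in for TaskManagers under the assumption that only the gathering pattern matters here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CollectSketch {

    /** Hypothetical stand-in for a sink that forwards each record to the submitting JVM. */
    interface RecordSink<T> {
        void writeRecord(T record);
    }

    /**
     * Simulates `workers` parallel sink instances (threads instead of TaskManagers)
     * that all write into one collection owned by the caller.
     */
    static List<Integer> runAndCollect(int workers, int recordsPerWorker) {
        // Thread-safe collection living in the "client" that submitted the job.
        ConcurrentLinkedQueue<Integer> collected = new ConcurrentLinkedQueue<>();
        RecordSink<Integer> sink = collected::add;

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            final int workerId = w;
            pool.submit(() -> {
                // Each worker emits its own disjoint range of records.
                for (int i = 0; i < recordsPerWorker; i++) {
                    sink.writeRecord(workerId * recordsPerWorker + i);
                }
            });
        }

        // Wait for all workers, comparable to waiting for the job to finish.
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }

        // Arrival order is nondeterministic, so sort before handing the results out.
        List<Integer> result = new ArrayList<>(collected);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> results = runAndCollect(3, 4);
        System.out.println(results.size() + " records collected");
    }
}
```

On a real cluster the sink instances run in other JVMs, so the call behind writeRecord would have to cross the network; how the proposed RemoteCollectorOutputFormat does that is defined by the pull request itself, not by this sketch.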

This message was sent by Atlassian JIRA