flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3637) Change RollingSink Writer interface to allow wider range of outputs
Date Tue, 05 Apr 2016 16:33:25 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226587#comment-15226587 ]

ASF GitHub Bot commented on FLINK-3637:

Github user aljoscha commented on the pull request:

    The changes look good. One thing I would like to have changed is to rename `SimpleWriterBase`
to `StreamWriterBase` or `StreamWriter`, to reflect the fact that it is used for stream-based
writers.

> Change RollingSink Writer interface to allow wider range of outputs
> -------------------------------------------------------------------
>                 Key: FLINK-3637
>                 URL: https://issues.apache.org/jira/browse/FLINK-3637
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming Connectors
>            Reporter: Lasse Dalegaard
>            Assignee: Lasse Dalegaard
>              Labels: features
> Currently the RollingSink Writer interface only works with FSDataOutputStreams, which
precludes it from being used with some existing libraries like Apache ORC and Parquet.
> To fix this, a new Writer interface can be created, which receives FileSystem and Path
objects instead of an FSDataOutputStream.
> To ensure exactly-once semantics, the Writer interface must also be extended so that
the current write offset can be retrieved at checkpointing time. For formats like ORC this
requires a footer to be written before the offset is returned. Checkpointing already calls
flush on the writer, so either flush must return the current length of the output file
or a new method must be added for this.
> The existing Writer interface can be recreated with a wrapper on top of the new Writer
interface. The existing code that manages the FSDataOutputStream can then be moved into this
new wrapper.
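
The proposal in the description above can be sketched roughly as follows. This is a
minimal illustration under stated assumptions, not the code from the pull request:
the method names (`open`, `flushAndGetOffset`, `writeBytes`) and the `LineWriter`
class are hypothetical, and `java.nio.file.Path` plus a plain `OutputStream` stand in
for the Hadoop `FileSystem`, `Path` and `FSDataOutputStream` types so that the sketch
compiles without Flink or Hadoop on the classpath. Only the `StreamWriterBase` name
comes from the review comment.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical shape of the new interface: the writer receives a target
// location rather than an already-opened stream, so formats that manage
// their own files (ORC, Parquet) could implement it directly.
// Assumption: in Flink, open() would take Hadoop FileSystem and Path objects.
interface Writer<T> {
    void open(Path path) throws IOException;

    void write(T element) throws IOException;

    // Flushes pending data (a format such as ORC could write its footer here)
    // and returns the file length that is valid after the flush. This is the
    // "new method" alternative to changing the return type of flush().
    long flushAndGetOffset() throws IOException;

    void close() throws IOException;
}

// Hypothetical wrapper that recreates the stream-based behaviour on top of
// the new interface: it owns the stream and tracks the write offset.
abstract class StreamWriterBase<T> implements Writer<T> {
    private OutputStream out;
    private long offset;

    @Override
    public void open(Path path) throws IOException {
        out = Files.newOutputStream(path); // creates or truncates the file
        offset = 0;
    }

    // Subclasses serialise elements and hand the bytes to this helper.
    protected void writeBytes(byte[] bytes) throws IOException {
        out.write(bytes);
        offset += bytes.length;
    }

    @Override
    public long flushAndGetOffset() throws IOException {
        out.flush();
        return offset;
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
            out = null;
        }
    }
}

// Example stream-based writer: one UTF-8 line per element.
class LineWriter extends StreamWriterBase<String> {
    @Override
    public void write(String element) throws IOException {
        writeBytes((element + "\n").getBytes(StandardCharsets.UTF_8));
    }
}
```

In this sketch a checkpoint would call `flushAndGetOffset()` and store the returned
length, which a restore could then use to truncate the file back to a consistent state.
A non-stream writer would implement `Writer` directly and report whatever length its
own format considers valid after the footer is written.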

