flink-issues mailing list archives

From "Robert Metzger (Jira)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-19481) Add support for a flink native GCS FileSystem
Date Mon, 03 May 2021 06:23:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-19481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338198#comment-17338198 ]

Robert Metzger commented on FLINK-19481:

Hi Ben, thanks a lot for getting back on this ticket. It's great to hear that you have a battle-tested
connector implementation available. It has been brought up a few times that Flink doesn't
ship with a GCS connector.
Before proceeding, we have to figure out one problem: related to FLINK-11838, there seems
to be a pull request under review (https://github.com/apache/flink/pull/15599) that also
intends to add a GCS file system implementation. That one is mostly about supporting
the StreamingFileSink.
The implementation from PR #15599 seems to go through the Hadoop stack, so I guess it is different
from your implementation, which calls the Google APIs directly.
Does your implementation support the recoverable writer interface?
I'm not very familiar with the file system implementations these days. Could you take a quick look
at the other PR for me and propose how we can proceed? (Join forces? Add two implementations?
Discard one? Something else?)
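
The recoverable writer interface asked about above is what the StreamingFileSink relies on to write files exactly once across failures: in-progress data is persisted at checkpoints, a restarted writer resumes from the last persisted point, and files become visible only on commit. The following is a hypothetical, self-contained Java sketch of that contract, assuming a simple in-memory store; the class and method names (`RecoverableWriterSketch`, `persist`, `recover`, `closeAndCommit`) and the `gs://example-bucket/part-0` path are invented for illustration and are not Flink's actual `RecoverableWriter` API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory model of the recoverable-writer idea. NOT Flink's
 * RecoverableWriter API: all names are invented for illustration.
 */
public class RecoverableWriterSketch {

    /** Resume point: how many bytes of a target were durably persisted. */
    public record ResumeToken(String target, int persistedLength) {}

    // Stand-in for the object store: hidden in-progress data vs. visible files.
    private final Map<String, byte[]> inProgress = new HashMap<>();
    private final Map<String, byte[]> visible = new HashMap<>();

    public final class Stream {
        private final String target;
        private final ByteArrayOutputStream buf = new ByteArrayOutputStream();

        private Stream(String target, byte[] initial) {
            this.target = target;
            buf.writeBytes(initial);
        }

        public void write(String s) {
            buf.writeBytes(s.getBytes(StandardCharsets.UTF_8));
        }

        /** Durably stores everything written so far (e.g. on a checkpoint). */
        public ResumeToken persist() {
            byte[] snapshot = buf.toByteArray();
            inProgress.put(target, snapshot);
            return new ResumeToken(target, snapshot.length);
        }

        /** Persists the remaining bytes, then publishes the file in one step. */
        public void closeAndCommit() {
            persist();
            visible.put(target, inProgress.remove(target));
        }
    }

    public Stream open(String target) {
        return new Stream(target, new byte[0]);
    }

    /** After a failure: continue from the token, dropping unpersisted bytes. */
    public Stream recover(ResumeToken token) {
        byte[] durable = inProgress.get(token.target());
        return new Stream(token.target(), Arrays.copyOf(durable, token.persistedLength()));
    }

    /** Returns the committed content, or null if nothing is visible yet. */
    public String read(String target) {
        byte[] data = visible.get(target);
        return data == null ? null : new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RecoverableWriterSketch fs = new RecoverableWriterSketch();
        Stream out = fs.open("gs://example-bucket/part-0"); // hypothetical path
        out.write("a,");
        ResumeToken checkpoint = out.persist();
        out.write("lost,"); // never persisted; simulate a crash here
        Stream resumed = fs.recover(checkpoint);
        resumed.write("b");
        resumed.closeAndCommit();
        System.out.println(fs.read("gs://example-bucket/part-0")); // prints a,b
    }
}
```

Running `main` prints `a,b`: bytes written after the last `persist()` are discarded on recovery, and nothing is readable until the commit. Roughly this property is what a GCS-backed writer would need to provide on top of the object store.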

> Add support for a flink native GCS FileSystem
> ---------------------------------------------
>                 Key: FLINK-19481
>                 URL: https://issues.apache.org/jira/browse/FLINK-19481
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem, FileSystems
>    Affects Versions: 1.12.0
>            Reporter: Ben Augarten
>            Priority: Minor
>              Labels: auto-deprioritized-major
> Currently, GCS is supported only by using the Hadoop connector [1].
> The objective of this improvement is to add support for checkpointing to Google Cloud
> Storage with the Flink FileSystem.
> This would allow the `gs://` scheme to be used for savepointing and checkpointing. Long
> term, it would be nice if we could use the GCS FileSystem as a source and sink in Flink jobs
> as well.
> I also hope that implementing a Flink-native GCS FileSystem will simplify usage
> of GCS, because the Hadoop FileSystem brings in many unshaded dependencies.
> [1] https://github.com/GoogleCloudDataproc/hadoop-connectors

This message was sent by Atlassian Jira
