beam-commits mailing list archives

From "Jacob Marble (JIRA)" <>
Subject [jira] [Commented] (BEAM-2500) Add support for S3 as a Apache Beam FileSystem
Date Thu, 14 Sep 2017 18:46:00 GMT


Jacob Marble commented on BEAM-2500:

Multipart upload could get us around the content length requirement, but it's awkward. An
object can be up to 5TB and a multipart upload can have at most 10,000 parts, so to cover
the largest objects I would have to read about 500MB at a time into memory and ship those
chunks. Bad idea.
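
The roughly 500MB figure follows from dividing the maximum object size by the maximum part count. A minimal sketch of that arithmetic, assuming the published S3 limits (5 TiB per object, 10,000 parts per upload, 5 MiB minimum part size); the function name `min_part_size` is hypothetical and not part of Beam or any AWS SDK:

```python
# Assumed S3 multipart limits: 5 TiB max object size, 10,000 parts max,
# 5 MiB minimum size for every part except the last.
MAX_OBJECT_BYTES = 5 * 1024**4
MAX_PARTS = 10_000
MIN_PART_BYTES = 5 * 1024**2


def min_part_size(object_bytes: int) -> int:
    """Smallest uniform part size that fits object_bytes into MAX_PARTS parts."""
    if not 0 < object_bytes <= MAX_OBJECT_BYTES:
        raise ValueError("object size outside the range S3 accepts")
    # Ceiling division, so that MAX_PARTS parts always cover the whole object.
    per_part = -(-object_bytes // MAX_PARTS)
    return max(MIN_PART_BYTES, per_part)


if __name__ == "__main__":
    worst_case = min_part_size(MAX_OBJECT_BYTES)
    print(f"{worst_case} bytes per part (~{worst_case / 1024**2:.0f} MiB)")
```

Because a sink that does not know the final length up front has to size its parts for the largest possible object, every part buffer ends up in the hundreds of megabytes, which is the awkwardness described above.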

Still can't see how Beam can indicate content length to a FileSystem sink. I'll move on to
source stuff for a while.

> Add support for S3 as a Apache Beam FileSystem
> ----------------------------------------------
>                 Key: BEAM-2500
>                 URL:
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Luke Cwik
>            Priority: Minor
>         Attachments: hadoop_fs_patch.patch
> Note that this is for providing direct integration with S3 as an Apache Beam FileSystem.
> There is already support for using the Hadoop S3 connector by depending on the Hadoop
> File System module[1] and configuring HadoopFileSystemOptions[2] with an S3 configuration[3].
> 1:
> 2:
> 3:

This message was sent by Atlassian JIRA
