beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (BEAM-1494) GcsFileSystem should check content encoding when setting IsReadSeekEfficient
Date Tue, 09 May 2017 19:41:04 GMT


ASF GitHub Bot commented on BEAM-1494:

GitHub user dhalperi opened a pull request:

    [BEAM-1494] Correctly handle content-encoding in GcsFileSystem, fixing reading of such files in CompressedSource

    R: @jkff  thoughts?
    CC: @chamikaramj

You can merge this pull request into a Git repository by running:

    $ git pull b1494-gcs-content-encoding

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2998

commit 7ef0f8afc88b292724228fb3507e6d0c77c0b1aa
Author: Dan Halperin <>
Date:   2017-05-09T19:34:04Z

    FileBasedSource: isSplittable should not throw
    This is a legacy design from Dataflow 1.x that was a poor choice.
    All the information needed to know whether a source is splittable should
    be known at source construction time, and if runtime behavior is needed
    it should result in conservative choices, aka false.
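
The "conservative choice" described in this commit message can be sketched roughly as follows. This is only an illustration, not the code from the pull request: the class SplittableSketch and the SeekProbe interface are hypothetical names, and the real FileBasedSource may obtain its answer differently.

```java
import java.io.IOException;

/** Hypothetical stand-in for a file-based source; not Beam's FileBasedSource. */
final class SplittableSketch {

  /** Hypothetical runtime probe that may fail, e.g. a file metadata lookup. */
  interface SeekProbe {
    boolean isReadSeekEfficient() throws IOException;
  }

  /**
   * Never throws. If the runtime probe fails, the source is reported as
   * not splittable, which is the conservative answer.
   */
  static boolean isSplittable(SeekProbe probe) {
    try {
      return probe.isReadSeekEfficient();
    } catch (IOException e) {
      // Assumption: treating a failed lookup as "not splittable" is safe,
      // because an unsplit read is still correct, only less parallel.
      return false;
    }
  }
}
```

With this shape, a caller can ask whether to split without handling a checked exception, and a failed lookup costs parallelism rather than correctness.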

commit 59e8e0ec27dfc498dacaaf425548681ed07a2d31
Author: Dan Halperin <>
Date:   2017-05-09T19:36:10Z

    CompressedSource: only use delegate reader if the file is splittable
    Otherwise, it's likely compressed
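
A rough sketch of the decision this commit message describes, assuming the choice reduces to a single boolean. ReaderChoiceSketch and ReaderKind are hypothetical names for illustration; the real CompressedSource may also consult its configured decompression mode.

```java
/** Hypothetical illustration of the reader choice; not Beam's CompressedSource. */
final class ReaderChoiceSketch {

  /** The two kinds of reader the commit message distinguishes. */
  enum ReaderKind { DELEGATE, DECOMPRESSING }

  /**
   * Uses the delegate (plain) reader only when the file is splittable.
   * A non-splittable file is assumed to be compressed and is read
   * through a decompressing reader instead.
   */
  static ReaderKind chooseReader(boolean splittable) {
    return splittable ? ReaderKind.DELEGATE : ReaderKind.DECOMPRESSING;
  }
}
```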

commit b71f5dfed5b8e56dd01cca5a71e2fa72233ab363
Author: Dan Halperin <>
Date:   2017-05-09T19:36:53Z

    GcsFileSystem: mark content-encoded files as not seekable
    That is the truth (since they are actually compressed) and will result in correct data
    when reading from them in, e.g., TextIO
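
The check this commit message describes might look something like the sketch below. SeekabilitySketch is a hypothetical helper, not the GcsFileSystem code. It assumes the object's content encoding is available as a nullable string and that only "gzip", the value named in the issue, needs to be treated as non-seekable.

```java
/** Hypothetical helper; not the actual GcsFileSystem implementation. */
final class SeekabilitySketch {

  /**
   * Returns false for gzip content-encoded objects, since byte offsets in
   * the decoded stream do not correspond to offsets in the stored bytes.
   *
   * Assumption: a missing or empty content encoding means the object is
   * stored as-is, and encodings other than "gzip" are not considered here.
   */
  static boolean isReadSeekEfficient(String contentEncoding) {
    return !"gzip".equals(contentEncoding);
  }
}
```

Comparing with the string literal on the left keeps the check null-safe, so an object with no content encoding set is reported as seekable.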


> GcsFileSystem should check content encoding when setting IsReadSeekEfficient
> ----------------------------------------------------------------------------
>                 Key: BEAM-1494
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-extensions
>            Reporter: Pei He
>            Assignee: Daniel Halperin
> It is incorrect to set IsReadSeekEfficient to true for files with content encoding set to
> gzip. This is an inherited issue from GcsIOChannelFactory.

This message was sent by Atlassian JIRA