beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2708) Decompressing bzip2 files with multiple "streams" only reads the first stream
Date Thu, 03 Aug 2017 18:17:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113225#comment-16113225 ]

ASF GitHub Bot commented on BEAM-2708:
--------------------------------------

GitHub user chamikaramj opened a pull request:

    https://github.com/apache/beam/pull/3681

    [BEAM-2708] Adds support for reading concatenated bzip2 files

    Cherry-picking into 2.1.0 release branch.
    
    The corresponding fix for the Java SDK was already cherry-picked into the 2.1.0 branch. I think it's good to get the Python SDK fix in as well so that the SDKs are consistent.
    
    Adds support for reading concatenated bzip2 files
    
    Adds tests for concatenated gzip and bzip2 files.
    
    Removes test 'test_model_textio_gzip_concatenated' in 'snippets_test.py' since it's actually hitting 'DummyReadTransform' and not testing this feature.
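
For readers unfamiliar with concatenated bzip2 input, the general technique can be sketched with the Python 3 standard library alone. This is not the code in this pull request and does not use any Beam API; the function name `decompress_concatenated_bz2` is hypothetical. The idea is that one decompressor handles exactly one stream, so the reader must start a fresh decompressor on whatever bytes are left over.

```python
import bz2


def decompress_concatenated_bz2(data: bytes) -> bytes:
    """Decompress every bzip2 stream stored back to back in `data`.

    Hypothetical helper for illustration; not part of Beam.
    """
    chunks = []
    while data:
        # A BZ2Decompressor decodes a single stream and then stops.
        decompressor = bz2.BZ2Decompressor()
        chunks.append(decompressor.decompress(data))
        if not decompressor.eof:
            raise ValueError("input ends in the middle of a bzip2 stream")
        # Bytes that follow the end of this stream belong to the next one.
        data = decompressor.unused_data
    return b"".join(chunks)


# Two independently compressed streams, concatenated into one payload.
blob = bz2.compress(b"first\n") + bz2.compress(b"second\n")
result = decompress_concatenated_bz2(blob)
```

A streaming reader would apply the same loop to buffered chunks rather than to the whole payload in memory; the sketch keeps everything in memory for brevity.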


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chamikaramj/beam bzip2_python_cherrypick

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3681.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3681
    
----
commit d6516c69e61f2061005d01a9e36ee1e4137a1478
Author: chamikara@google.com <chamikara@google.com>
Date:   2017-08-03T05:49:33Z

    Adds support for reading concatenated bzip2 files.
    
    Adds tests for concatenated gzip and bzip2 files.
    
    Removes test 'test_model_textio_gzip_concatenated' in 'snippets_test.py' since it's actually hitting 'DummyReadTransform' and not testing this feature.

----


> Decompressing bzip2 files with multiple "streams" only reads the first stream
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-2708
>                 URL: https://issues.apache.org/jira/browse/BEAM-2708
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-extensions, sdk-py
>            Reporter: Pablo Estrada
>            Assignee: Chamikara Jayalath
>             Fix For: 2.1.0, 2.2.0
>
>
> I'm not sure which components to file this against. A user has observed that pbzip2 files are not being properly decompressed:
> https://stackoverflow.com/questions/45439117/google-dataflow-only-partly-uncompressing-files-compressed-with-pbzip2
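
The symptom in the issue title can be reproduced outside Beam with a small standalone check. The sketch below uses only the Python 3 standard library and made-up payloads. It assumes, as the issue title suggests, that the affected files hold several bzip2 streams back to back. A single decompressor object returns only the first stream and leaves the rest unread.

```python
import bz2

# Build a payload made of two separate bzip2 streams (illustrative data).
blob = bz2.compress(b"stream one\n") + bz2.compress(b"stream two\n")

# A single decompressor stops at the end of the first stream.
single = bz2.BZ2Decompressor()
first_only = single.decompress(blob)
leftover = single.unused_data  # the second stream, still compressed

# The module-level helper accepts multi-stream input (Python 3.3 and later).
everything = bz2.decompress(blob)

print(first_only)
print(len(leftover) > 0)
print(everything)
```

A reader that never looks at the leftover bytes silently drops everything after the first stream, which matches the partial decompression described above.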



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
