beam-commits mailing list archives

From "Ben Chambers (JIRA)" <>
Subject [jira] [Commented] (BEAM-2708) Support for pbzip2 in IO
Date Tue, 01 Aug 2017 20:33:00 GMT


Ben Chambers commented on BEAM-2708:

This looks to be a bug in the CompressedSource support for BZIP2. Specifically, we create
the stream with:

        return Channels.newChannel(
            new BZip2CompressorInputStream(Channels.newInputStream(channel)));

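This constructor defaults to `decompressConcatenated = false`, so only the first "stream"
within the `bz2` file is actually read. pbzip2 writes its output as several bzip2 streams
concatenated together, which would explain why those files are only partially decompressed.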

The fix is easy -- change that code to:

        return Channels.newChannel(
            new BZip2CompressorInputStream(Channels.newInputStream(channel), true));

But coming up with a test is a bit harder.
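
One way to approach a test, as a rough sketch and not the actual Beam test: build a
pbzip2-like input by compressing two chunks separately and appending the results, then read
it back with both flag values. This uses Commons Compress directly and does not go through
CompressedSource; the class and helper names (ConcatenatedBzip2Sketch, concatenatedBzip2,
decompress) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream;

// Hypothetical sketch; requires Apache Commons Compress on the classpath.
public class ConcatenatedBzip2Sketch {

  /** Compresses each part as its own bzip2 stream and appends the streams back to back. */
  static byte[] concatenatedBzip2(String... parts) throws IOException {
    ByteArrayOutputStream combined = new ByteArrayOutputStream();
    for (String part : parts) {
      ByteArrayOutputStream single = new ByteArrayOutputStream();
      // Closing the compressor finishes this bzip2 stream.
      try (BZip2CompressorOutputStream bz = new BZip2CompressorOutputStream(single)) {
        bz.write(part.getBytes(StandardCharsets.UTF_8));
      }
      combined.write(single.toByteArray());
    }
    return combined.toByteArray();
  }

  /** Reads everything the decompressor yields for the given flag value. */
  static String decompress(byte[] data, boolean decompressConcatenated) throws IOException {
    try (InputStream in =
        new BZip2CompressorInputStream(new ByteArrayInputStream(data), decompressConcatenated)) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = concatenatedBzip2("first\n", "second\n");
    // With the flag off, reading stops at the end of the first stream.
    System.out.print("flag=false: " + decompress(data, false));
    // With the flag on, all concatenated streams are read.
    System.out.print("flag=true:  " + decompress(data, true));
  }
}
```

A real test would presumably write bytes like these to a temporary `.bz2` file and assert that
reading it through CompressedSource returns the contents of both chunks.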

> Support for pbzip2 in IO
> ------------------------
>                 Key: BEAM-2708
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-extensions, sdk-py
>            Reporter: Pablo Estrada
>            Assignee: Ben Chambers
> I'm not sure which components to file this against. A user has observed that pbzip2 files
> are not being properly decompressed:

This message was sent by Atlassian JIRA
