commons-dev mailing list archives

From Lasse Collin <>
Subject Re: [compress] XZ support and inconsistencies in the existing compressors
Date Thu, 04 Aug 2011 18:00:35 GMT
On 2011-08-04 Stefan Bodewig wrote:
> On 2011-08-04, Lasse Collin wrote:
> > Using bits from the end of stream magic doesn't make sense, because
> > then one would be forced to finish the stream. Using the bits from
> > the block header magic means that one must add at least one more
> > block. This is fine if the application will want to encode at least
> > one more byte. If the application calls close() right after
> > flushing, then there's a problem unless .bz2 format allows empty
> > blocks. I get a feeling from the code that .bz2 would support empty
> > blocks, but I'm not sure at all.
> It should be possible to write some unit tests to see what works and
> to create some test archives for interop testing with native tools.

Maybe, if it is possible to even create such files.

Making flush() equivalent to finish() (except that one can continue
after flush()) with bzip2 sounds much easier and safer, even if it can
create its own problems too.
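
A rough illustration of that flush()-as-finish() idea, not code from this thread or from Commons Compress. The JDK has no bzip2 codec, so the sketch uses gzip from java.util.zip as a stand-in. The class names RestartingGzipOutputStream and FlushAsFinishSketch are hypothetical. Each flush() ends the current compressed stream, and the next write starts a new one, so the output is a series of concatenated streams:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class FlushAsFinishSketch {

    /** Hypothetical wrapper: flush() finishes the current gzip stream. */
    static class RestartingGzipOutputStream extends OutputStream {
        private final OutputStream out;
        private GZIPOutputStream current; // null while no compressed stream is open

        RestartingGzipOutputStream(OutputStream out) {
            this.out = out;
        }

        @Override
        public void write(int b) throws IOException {
            write(new byte[] {(byte) b}, 0, 1);
        }

        @Override
        public void write(byte[] buf, int off, int len) throws IOException {
            if (current == null) {
                // Opened lazily so that flush() followed directly by close()
                // does not need an empty trailing stream.
                current = new GZIPOutputStream(out);
            }
            current.write(buf, off, len);
        }

        @Override
        public void flush() throws IOException {
            if (current != null) {
                current.finish(); // writes the trailer, leaves 'out' open
                current = null;
            }
            out.flush();
        }

        @Override
        public void close() throws IOException {
            flush();
            out.close();
        }
    }

    /** Writes two chunks with a flush() in between; returns the raw bytes. */
    static byte[] compressWithFlush(String first, String second) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        RestartingGzipOutputStream out = new RestartingGzipOutputStream(sink);
        out.write(first.getBytes(StandardCharsets.UTF_8));
        out.flush(); // ends the first gzip stream
        out.write(second.getBytes(StandardCharsets.UTF_8));
        out.close(); // ends the second gzip stream
        return sink.toByteArray();
    }

    /** Decompresses everything; relies on the reader handling concatenated streams. */
    static String decompressAll(byte[] data) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                result.write(buf, 0, n);
            }
        }
        return new String(result.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = compressWithFlush("hello ", "world");
        System.out.println(decompressAll(data));
    }
}
```

The "own problems" show up directly: every flush() restarts the compressor, which costs compression ratio, and the output is only readable in full by a decoder that continues past the first stream. GZIPInputStream in recent JDKs does that; a decoder that stops after the first stream would return only the first chunk.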

> >>> (4) The decompressor streams don't support concatenated .gz
> >>>     and .bz2 files. This can be OK when compressed data is used
> >>>     inside another file format or protocol, but with regular
> >>>     (standalone) .gz and .bz2 files it is bad to stop after the
> >>>     first compressed stream and silently ignore the remaining
> >>>     compressed data.
> >>>
> >>>     Fixing this in BZip2CompressorInputStream should be relatively
> >>>     easy because it stops right after the last byte of the
> >>>     compressed stream.
> >> Is this <>?
> > Yes. I didn't check the suggested fix though.
> Would be nice if you'd find the time to do so.

It uses in.available() == 0. It duplicates the test for "BZh" magic
bytes and a little more from init() into complete(). I think this bug
can be fixed in a nicer way.
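
One reason to be wary of an in.available() == 0 test: available() only reports how many bytes can be read without blocking, so it can be 0 while more data is still on the way. The sketch below shows one possible alternative, probing with read() and pushing the byte back. This is not necessarily the nicer fix meant above, and the names NextStreamProbeSketch, ZeroAvailableStream and hasMoreData are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public class NextStreamProbeSketch {

    /** Simulates a source (e.g. a socket) whose available() is 0 although data remains. */
    static class ZeroAvailableStream extends FilterInputStream {
        ZeroAvailableStream(InputStream in) {
            super(in);
        }

        @Override
        public int available() {
            return 0;
        }
    }

    /** Returns true if at least one more byte can be read; that byte is pushed back. */
    static boolean hasMoreData(PushbackInputStream in) throws IOException {
        int b = in.read(); // blocks until a byte arrives or end of input is reached
        if (b == -1) {
            return false;
        }
        in.unread(b);
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] secondStreamStart = {'B', 'Z', 'h'};
        PushbackInputStream in = new PushbackInputStream(
                new ZeroAvailableStream(new ByteArrayInputStream(secondStreamStart)));

        System.out.println("available() == 0: " + (in.available() == 0));
        System.out.println("hasMoreData():    " + hasMoreData(in));
    }
}
```

After a successful probe the caller can go on to read and verify the "BZh" magic as usual, since the pushed-back byte is returned by the next read().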

Is there a need to have a bzip2 decompressor that does stop after the
first stream (like the current code does)? Maybe .zip needs it?

> We'll need standalone compressors for other formats as well (and we do
> need LZMA 8-).  Some of the options your code provides might be
> interesting for the ZIP package as well when we want to implement some
> of the other supported methods.

The .lzma format is legacy. While it may have some uses, people should
usually move to .xz and LZMA2.

The .zip format has LZMA marked as "Early Feature Specification". Some
minor details are a little bit weird. For example, it requires storing
the version of the LZMA SDK that was used for compression (what if you
don't use an unmodified LZMA SDK?).

What else needs LZMA? Do you plan .7z support?

> If you need help with publishing your package to a Maven repository -
> some of your users will ask for it sooner or later - I know where to
> find people who can help.


Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode
