commons-dev mailing list archives

From Damjan Jovanovic <damjan....@gmail.com>
Subject Re: [compress] Do we want 7z Archive*Stream-like classes
Date Tue, 01 Oct 2013 07:38:38 GMT
On Tue, Oct 1, 2013 at 6:09 AM, Stefan Bodewig <bodewig@apache.org> wrote:
> On 2013-09-30, Benedikt Ritter wrote:
>
>> 2013/9/30 Stefan Bodewig <bodewig@apache.org>
>
>>> I'm in no way as familiar with the format as Damjan is but IMHO it is
>>> feasible - but likely pretty memory hungry.  Even more so for the
>>> writing side.  Similar to zip some information is stored in a central
>>> place but in this case at the front of the archive.
>
>> just out of curiosity: is this memory problem related to Java or to 7z in
>> general?
>
> What Bernd said.
>
> Reading may be simpler, here you can store the meta-information from the
> start of the file in memory and then read entries as you go, ZipFile
> inside the zip package does something like this.

From what I remember:

The "meta-information" can be anywhere in the file, as can the
compressed files themselves. The 7-Zip tool seems to write the
meta-information at the end of the .7z file when multi-file archives
are created. Compressed file codecs, positions, lengths, and solid
compression details are stored only in the meta-information, so it's
not possible to write a streaming reader without O(n) memory in the
worst case.
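To make the reading problem concrete, here is a toy sketch (NOT the real
7z layout, and not Commons Compress API - the format and class name are
invented for illustration): entry data is written first, then a metadata
block, then a trailer holding the metadata block's offset. A reader has
to seek to the end before it can decode anything, which is exactly what
a pure forward stream cannot do without buffering the whole file:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Toy illustration (NOT the real 7z layout): entry data first, then a
// metadata block listing the entry's offset/length, then an 8-byte
// trailer holding the metadata block's offset.
public class TrailerDemo {

    static Path writeToyArchive(byte[] entry) throws IOException {
        Path p = Files.createTempFile("toy", ".bin");
        try (RandomAccessFile f = new RandomAccessFile(p.toFile(), "rw")) {
            f.write(entry);                    // entry data at offset 0
            long metaOffset = f.getFilePointer();
            f.writeLong(0L);                   // metadata: entry offset
            f.writeInt(entry.length);          // metadata: entry length
            f.writeLong(metaOffset);           // trailer: where metadata lives
        }
        return p;
    }

    static byte[] readToyArchive(Path p) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(p.toFile(), "r")) {
            f.seek(f.length() - 8);            // 1. find the trailer
            long metaOffset = f.readLong();
            f.seek(metaOffset);                // 2. load the metadata
            long entryOffset = f.readLong();
            int entryLength = f.readInt();
            f.seek(entryOffset);               // 3. only now read the entry
            byte[] data = new byte[entryLength];
            f.readFully(data);
            return data;
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = writeToyArchive("hello".getBytes("UTF-8"));
        System.out.println(new String(readToyArchive(p), "UTF-8")); // hello
    }
}
```

Loading the metadata once and then seeking to entries on demand is the
ZipFile-style approach Stefan mentions above.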

> When you consider writing you'll have to write metadata about all
> entries before you even start to write the first bytes of the first
> entry.  Either you build up everything in memory or you use a temporary
> output.  This is not without precedent in Compress, pack200 allows users
> to choose between two strategies that provide exactly those two options.

Writing also requires seeking or O(n) memory, as the initial header at
the beginning of the file contains the offset to the next header, and
we only know the size/contents/location of the next header once all
the files have been written.
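The "seek instead of buffer" writing strategy can be sketched like this
(again a toy format, not real 7z encoding, with invented names): reserve
space for the initial header, stream the entries out, append the end
header once everything is known, then seek back and patch the offset:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of seek-back writing (toy format, not real 7z): reserve 8
// bytes up front for the end-header offset, stream the entries,
// append the end header, then seek back and patch the placeholder.
public class SeekBackWriter {

    static Path write(byte[][] entries) throws IOException {
        Path p = Files.createTempFile("seekback", ".bin");
        try (RandomAccessFile f = new RandomAccessFile(p.toFile(), "rw")) {
            f.writeLong(0L);                       // placeholder for end-header offset
            long[] offsets = new long[entries.length];
            for (int i = 0; i < entries.length; i++) {
                offsets[i] = f.getFilePointer();   // record where each entry starts
                f.write(entries[i]);
            }
            long endHeader = f.getFilePointer();   // end header: entry table
            f.writeInt(entries.length);
            for (int i = 0; i < entries.length; i++) {
                f.writeLong(offsets[i]);
                f.writeInt(entries[i].length);
            }
            f.seek(0);                             // patch the initial header
            f.writeLong(endHeader);
        }
        return p;
    }

    static int countEntries(Path p) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(p.toFile(), "r")) {
            f.seek(f.readLong());                  // jump straight to the end header
            return f.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = write(new byte[][] { "a".getBytes(), "bb".getBytes() });
        System.out.println(countEntries(p)); // 2
    }
}
```

The alternative, per Stefan's pack200 comparison, is to buffer all entry
data (in memory or a temp file) until the header can be written first.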

Since we now have multiple archivers that require seeking, I suggest
we add a SeekableStream class or something along those lines. The
Commons Imaging project has the same problem to solve for images,
and it uses ByteSources, which can be backed by byte arrays, files, or
an InputStream wrapper that caches what has been read (so seeking is
efficient, while only as much is read from the InputStream as
necessary).
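A minimal sketch of that caching-wrapper idea (the class and method
names here are invented for illustration; this is not the Commons
Imaging ByteSource API): bytes are copied into a cache as they are
consumed, so backward seeks are served from the cache and the
underlying stream is only read as far as the furthest position touched:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of an InputStream wrapper that caches everything read so far,
// making backward seeks cheap while pulling from the underlying stream
// only on demand. Names are invented for illustration.
public class CachingSeekableStream {
    private final InputStream in;
    private final ByteArrayOutputStream cache = new ByteArrayOutputStream();
    private long pos = 0;

    public CachingSeekableStream(InputStream in) { this.in = in; }

    /** Move the read position; only forward seeks touch the source stream. */
    public void seek(long newPos) throws IOException {
        while (cache.size() < newPos) {        // fill the cache up to newPos
            int b = in.read();
            if (b < 0) throw new IOException("seek past EOF");
            cache.write(b);
        }
        pos = newPos;
    }

    /** Read one byte: from the cache if available, else from the source. */
    public int read() throws IOException {
        if (pos < cache.size()) {
            int b = cache.toByteArray()[(int) pos] & 0xFF; // simple, not fast
            pos++;
            return b;
        }
        int b = in.read();
        if (b >= 0) {
            cache.write(b);
            pos++;
        }
        return b;
    }

    public static void main(String[] args) throws IOException {
        CachingSeekableStream s = new CachingSeekableStream(
                new ByteArrayInputStream("abcdef".getBytes("UTF-8")));
        s.seek(4);
        System.out.println((char) s.read()); // e
        s.seek(1);                           // backward seek: served from cache
        System.out.println((char) s.read()); // b
    }
}
```

In the worst case (seeking near the end of the input) this still holds
O(n) bytes, but it reads no more than the caller actually needs.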

> Stefan
>

Damjan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

