commons-dev mailing list archives

From Stefan Bodewig <bode...@apache.org>
Subject [compress] Detecting LZMA standalone files
Date Mon, 10 Jun 2013 04:25:28 GMT
Hi,

when I added support for decompressing .lzma files I left out matches()
and you can only get an LZMACompressorInputStream from
CompressorStreamFactory if you use the version that explicitly specifies
the format.

The reason is that the old .lzma format doesn't have any sort of
signature at all.  I've been told that if you try to "unlzma" a plain
text file, the most likely outcome is an OutOfMemoryError.

The native XZ Utils, which is used for xz as well as lzma, contains a
heuristic that allows the xz command to guess the input format.  It
first checks whether the input is xz and, if not, whether the settings
that would make up the start of an LZMA stream don't look too strange.

We could do something similar by placing the LZMA check at the end of
the CompressorStreamFactory's autodetect method and performing the same
plausibility checks on the input.  This would still run the risk of
false positives and - maybe less likely - false negatives.  Do we want
to do something like this?
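
To make the idea concrete, here is a rough sketch of what such a
plausibility check could look like.  It assumes the usual 13-byte
.lzma header layout (one properties byte, a 4-byte little-endian
dictionary size, an 8-byte little-endian uncompressed size).  The
class and method names are made up for illustration and are not
existing Commons Compress API, and the thresholds (dictionary size of
the form 2^n or 2^n + 2^(n-1), known size below 256 GiB) are my
assumptions about what "doesn't look too strange" should mean, not a
verified copy of what xz does.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Commons Compress.
final class LzmaHeaderCheck {
    static final int HEADER_SIZE = 13;

    // Returns true if the first 13 bytes could plausibly start an .lzma stream.
    static boolean looksLikeLzma(byte[] header) {
        if (header == null || header.length < HEADER_SIZE) {
            return false;
        }
        // Properties byte encodes (pb * 5 + lp) * 9 + lc, so it must be < 225.
        int props = header[0] & 0xFF;
        if (props >= 9 * 5 * 5) {
            return false;
        }
        ByteBuffer buf = ByteBuffer.wrap(header, 1, 12).order(ByteOrder.LITTLE_ENDIAN);
        long dictSize = buf.getInt() & 0xFFFFFFFFL; // treat as unsigned 32-bit
        long uncompressedSize = buf.getLong();
        if (!isPlausibleDictSize(dictSize)) {
            return false;
        }
        // All bits set means "size unknown", which is valid.
        if (uncompressedSize == -1L) {
            return true;
        }
        // Assumed limit: reject known sizes of 256 GiB or more.
        return uncompressedSize >= 0 && uncompressedSize < (1L << 38);
    }

    // Assumed rule: accept only 2^n or 2^n + 2^(n-1).
    static boolean isPlausibleDictSize(long d) {
        if (d <= 0) {
            return false;
        }
        int bits = Long.bitCount(d);
        if (bits == 1) {
            return true;
        }
        // Two set bits that are adjacent, e.g. binary 11000...
        return bits == 2 && (d & (d >> 1)) != 0;
    }
}
```

The autodetect method would call something like
LzmaHeaderCheck.looksLikeLzma(signature) only after every format with a
real signature has been ruled out.  Plain text will usually fail the
dictionary size test, but arbitrary binary data can still pass, so
false positives remain possible.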

Stefan
