tika-dev mailing list archives

From Stefan Bodewig <bode...@apache.org>
Subject Re: [COMPRESS] zip-bomb prevention for Z?
Date Fri, 14 Apr 2017 13:53:26 GMT
On 2017-04-13, Allison, Timothy B. wrote:

> On TIKA-1631 [1], users have observed that a corrupt Z file can cause
> an OOM at Internal_.InternalLZWStream.initializeTable.


> Should we try to protect against this at the Tika level, or should we
> open an issue on commons-compress's JIRA?

If there is anything COMPRESS can do to detect and avoid the situation,
then please open an issue over here.
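One check that might be possible, sketched here under assumptions rather than taken from the Compress code: the third byte of a .Z header carries the maximum code width in its low five bits, and an LZW table sized as 2^bits grows out of hand if a corrupt header claims more than the 16 bits that compress(1) produces. The class name, method name and the limit constant below are all hypothetical.

```java
// Hypothetical guard, not part of Commons Compress: reject .Z headers whose
// declared maximum code width would imply an unreasonably large LZW table.
final class ZHeaderCheck {

    // Assumption: compress(1) never writes more than 16 bits per code.
    static final int MAX_SANE_CODE_BITS = 16;

    // Returns the declared maximum code width, or throws if the header is
    // not a .Z header or declares a width above the assumed limit.
    static int checkedMaxCodeBits(byte[] header) {
        if (header.length < 3
                || (header[0] & 0xff) != 0x1f
                || (header[1] & 0xff) != 0x9d) {
            throw new IllegalArgumentException("not a .Z header");
        }
        int bits = header[2] & 0x1f; // low five bits: maximum code width
        if (bits > MAX_SANE_CODE_BITS) {
            throw new IllegalArgumentException(
                "declared code width " + bits + " exceeds " + MAX_SANE_CODE_BITS);
        }
        return bits;
    }

    public static void main(String[] args) {
        // 0x90 = block mode flag (0x80) plus a 16 bit maximum code width
        byte[] ok = {0x1f, (byte) 0x9d, (byte) 0x90};
        System.out.println(checkedMaxCodeBits(ok));
    }
}
```

A check like this only covers the header; it would not bound the expanded size of an otherwise well-formed stream.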

> A second question, we're creating a stream with the
> CompressorStreamFactory when all we want to do is detect.  Is there a
> recommended way to detect the type of compressor without creating a
> stream?

This has never been a goal of the *StreamFactory classes, but it would be
pretty easy to add "guess" or "detect" methods to them, at least for the
formats built into Compress.
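To make the idea concrete, here is a minimal sketch of what such a "detect" method could look like. It is standalone code, not the Compress API: the class and method names are hypothetical, it only knows four signatures, and the returned names ("z", "gz", "bzip2", "xz") are assumed to mirror the factory's String constants. It peeks at the leading bytes via mark/reset, so no decompressing stream is ever created.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sniffer: names the compressor format from its magic bytes
// without constructing a decompressing stream.
final class FormatSniffer {

    private static final int MAX_SIGNATURE = 6; // longest signature checked (xz)

    // Returns a format name, or null if no known signature matches.
    // The stream must support mark/reset; it is rewound before returning.
    static String detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("mark/reset is required");
        }
        byte[] sig = new byte[MAX_SIGNATURE];
        int n = 0;
        in.mark(sig.length);
        try {
            int r;
            while (n < sig.length
                    && (r = in.read(sig, n, sig.length - n)) != -1) {
                n += r;
            }
        } finally {
            in.reset();
        }
        if (startsWith(sig, n, 0x1f, 0x9d)) return "z";
        if (startsWith(sig, n, 0x1f, 0x8b)) return "gz";
        if (startsWith(sig, n, 0x42, 0x5a, 0x68)) return "bzip2";
        if (startsWith(sig, n, 0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00)) return "xz";
        return null;
    }

    private static boolean startsWith(byte[] buf, int len, int... magic) {
        if (len < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((buf[i] & 0xff) != magic[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] gzipStart = {0x1f, (byte) 0x8b, 0x08, 0x00};
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(gzipStart));
        System.out.println(detect(in));
    }
}
```

Returning null for an unknown signature is one choice among several; throwing an exception would be closer to how the factory reports an unrecognised stream today.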

Since 1.13 we support extensions of the factories via ServiceLoader and
LZO for Java[2] uses it for example.  Looking into the code again, we
don't support autodetection for formats added via ServiceProvider
anyway, so this is no restriction for the Tika case.

If we wanted to add such a method, what would the return value be? One
of the String constants contained inside the *Factory classes,
likely. Tika would have to be prepared for new strings popping up when
using a newer version of Compress (1.14 will add "lz4-framed" for


> [1] https://issues.apache.org/jira/browse/TIKA-1631

[2] https://github.com/shevek/lzo-java
