tika-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-860) Make ZIP bomb detection configureable
Date Sat, 30 Jun 2012 16:23:43 GMT

    [ https://issues.apache.org/jira/browse/TIKA-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404538#comment-13404538 ]

Uwe Schindler commented on TIKA-860:
------------------------------------

OK!
                
> Make ZIP bomb detection configureable
> -------------------------------------
>
>                 Key: TIKA-860
>                 URL: https://issues.apache.org/jira/browse/TIKA-860
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Uwe Schindler
>
> The detection of ZIP bombs is nice, and the original issue says it is configurable, but
> I found no way to change the ParseContext of the AutoDetectParser to, e.g., allow deeper
> nesting levels. The SecureContentHandler instantiation is hardcoded and there is no point
> of intervention.
> In my case a simple ZIP of an Eclipse project: http://store.pangaea.de/Publications/AltaweelM_2011/Salinization.zip
> triggered the bomb detection, but it is of course no bomb. It's just because the JAR/WAR files
> in this project themselves contain other JAR files and class files :-) This overflows the nesting
> level of 10; maybe even the TIKA OSGi bundle triggers the bomb detection (not tested).
> In my case I would like to raise the nesting level, but there is no way to do so. My workaround
> was to simply filter out JAR files by using a custom DocumentSelector in my ParseContext. JAR
> files contain no metadata we are interested in for our own development; we had already removed,
> e.g., the CLASS file parsers from our TIKA config, so we have a very simple parser structure
> that only allows PDF, Office documents, TXT files, ...
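
The DocumentSelector workaround described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the reporter's actual code: the class name `EmbeddedArchiveFilter`, the method `shouldParse`, and the exact set of skipped extensions are assumptions made for the example. Only the name-based check is compiled here; the wiring into Tika is shown in a comment and assumes tika-core is on the classpath.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Name-based filter for embedded documents.
 * Hypothetical helper for illustration; it is not part of Tika.
 */
final class EmbeddedArchiveFilter {

    // Extensions to skip. The issue mentions JAR/WAR files and class files;
    // the exact list is an assumption.
    private static final Set<String> SKIPPED = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("jar", "war", "class")));

    private EmbeddedArchiveFilter() {
    }

    /** Returns true if the embedded document with this name should be parsed. */
    static boolean shouldParse(String resourceName) {
        if (resourceName == null) {
            return true; // no name available, so do not filter
        }
        int dot = resourceName.lastIndexOf('.');
        if (dot < 0 || dot == resourceName.length() - 1) {
            return true; // no extension to check
        }
        String ext = resourceName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return !SKIPPED.contains(ext);
    }

    // Possible wiring into Tika (not compiled here). DocumentSelector declares
    // a single method, select(Metadata); the resource-name key constant is
    // Metadata.RESOURCE_NAME_KEY in Tika 1.x and may differ in other versions:
    //
    //   ParseContext context = new ParseContext();
    //   context.set(DocumentSelector.class, metadata ->
    //       EmbeddedArchiveFilter.shouldParse(metadata.get(Metadata.RESOURCE_NAME_KEY)));
}
```

With a selector like this in the ParseContext, nested JAR/WAR files are never opened, so they do not add to the nesting depth that the bomb detection counts. The trade-off is that the filter relies on file names only, so an archive with a misleading or missing extension would still be parsed.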

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
