lucene-solr-user mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: Zip Bomb Exception in HTML File
Date Wed, 04 Jan 2017 19:23:53 GMT
This came up back in September [1] and [2].  Same trigger...crazy number of divs.  

I think we could modify the AutoDetectParser to enable configuration of maximum zip-bomb depth
via tika-config.

If there's any interest in this, re-open TIKA-2091, and I'll take a look.
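
For illustration only, a configurable nesting-depth check of the kind discussed above could be sketched as follows. This is not Tika's code: the class DepthLimitSketch, the handler DepthLimitHandler, the helper methods, and the limit of 100 are all hypothetical, and the sketch assumes well-formed markup so that the JDK's own SAX parser can read it.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DepthLimitSketch {

    /** Hypothetical handler: aborts parsing once element nesting exceeds maxDepth. */
    static class DepthLimitHandler extends DefaultHandler {
        private final int maxDepth;
        private int depth = 0;

        DepthLimitHandler(int maxDepth) {
            this.maxDepth = maxDepth;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            depth++;
            if (depth > maxDepth) {
                // Illustrative message only; not the text Tika emits.
                throw new SAXException("Nesting too deep: " + depth + " levels");
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            depth--;
        }
    }

    /** Builds a well-formed fragment with the given number of nested divs. */
    public static String nestedDivs(int levels) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < levels; i++) {
            sb.append("<div>");
        }
        sb.append("text");
        for (int i = 0; i < levels; i++) {
            sb.append("</div>");
        }
        return sb.toString();
    }

    /**
     * Returns true if the markup parses without exceeding maxDepth.
     * Any SAXException (including malformed input) is reported as false.
     */
    public static boolean parsesWithinLimit(String xml, int maxDepth) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)),
                           new DepthLimitHandler(maxDepth));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // 100 is an assumed limit chosen for the example.
        System.out.println(parsesWithinLimit(nestedDivs(50), 100));
        System.out.println(parsesWithinLimit(nestedDivs(200), 100));
        // Raising the limit lets the same 200-level document through.
        System.out.println(parsesWithinLimit(nestedDivs(200), 500));
    }
}
```

Making the limit a constructor argument is what would let a configuration file supply it; how that would be wired into AutoDetectParser and tika-config is left open here.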

Best,

            Tim

[1] http://git.net/ml/solr-user.lucene.apache.org/2016-09/msg00561.html
[2] https://issues.apache.org/jira/browse/TIKA-2091

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Wednesday, January 4, 2017 12:20 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Zip Bomb Exception in HTML File

You might get a more knowledgeable response from the Tika folks; that's really not something
Solr controls.


Best,
Erick

On Wed, Jan 4, 2017 at 8:50 AM,  <sn00py@ulysses-erp.com> wrote:
> I get the exception "<str name="msg">org.apache.tika.exception.TikaException:
> Zip bomb detected!</str>"
>
> when I try to parse an HTML file, and I think I know why:
> there are many <div> and <span> elements nested in cascade, more than 200 divs
> with spans inside each.
>
> Is it correct that there is this limit for HTML files?
