lucene-solr-user mailing list archives

From "Hanjan, Harinder" <Harinder.Han...@calgary.ca>
Subject Getting "zip bomb" exception while sending HTML document to solr
Date Thu, 05 Apr 2018 20:10:56 GMT
Hello!

I'm sending an HTML document to Solr, and Tika is throwing a "Zip bomb detected!" exception back. It looks like Tika has an arbitrary limit of 100 levels of XML element nesting (https://github.com/apache/tika/blob/9130bbc1fa6d69419b2ad294917260d6b1cced08/tika-core/src/main/java/org/apache/tika/sax/SecureContentHandler.java#L72-L75).
Luckily, the field that holds this limit (maxDepth) has a public setter, but I am not sure whether it can be set from Solr. Is it possible? If so, how would I set maxDepth to a higher value?
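
To make the limit concrete, here is a minimal sketch of how a nesting-depth guard of this kind can work, using only the JDK's built-in SAX parser. It is not Tika's code: `DepthLimitDemo`, `DepthLimitedHandler`, and `DepthLimitExceeded` are hypothetical names, and the default of 100 simply mirrors the limit linked above. In this sketch the check lives in a content handler that sees every start/end element event, so raising the limit means calling the setter on that handler instance before parsing starts.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DepthLimitDemo {

    /** Hypothetical exception signalling that the nesting limit was hit. */
    static class DepthLimitExceeded extends SAXException {
        DepthLimitExceeded(int depth) {
            super("Suspected zip bomb: " + depth + " levels of XML element nesting");
        }
    }

    /** Hypothetical handler: counts open elements and aborts at maxDepth. */
    static class DepthLimitedHandler extends DefaultHandler {
        private int maxDepth = 100; // mirrors the limit discussed above
        private int currentDepth = 0;

        void setMaxDepth(int maxDepth) {
            this.maxDepth = maxDepth;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            currentDepth++;
            if (currentDepth >= maxDepth) {
                throw new DepthLimitExceeded(currentDepth);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            currentDepth--;
        }
    }

    /** Builds a document made of `depth` nested <div> elements. */
    public static String nested(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) sb.append("<div>");
        for (int i = 0; i < depth; i++) sb.append("</div>");
        return sb.toString();
    }

    /** Returns true if parsing completes, false if the depth guard aborted it. */
    public static boolean parses(String xml, int maxDepth) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DepthLimitedHandler handler = new DepthLimitedHandler();
        handler.setMaxDepth(maxDepth); // the limit must be set before parsing
        try {
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return true;
        } catch (DepthLimitExceeded e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String doc = nested(150);
        System.out.println(parses(doc, 100)); // 150 levels trip the default limit
        System.out.println(parses(doc, 200)); // same document, limit raised
    }
}
```

Running `main` prints `false` and then `true`: the 150-level document is rejected under the default limit of 100 but parses once the limit is raised to 200. The open question above is whether Solr gives any way to reach the equivalent setter on the handler it uses.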

Thanks!

Here is the full stack trace:
2018-04-05 16:47:48.034 ERROR (qtp1654589030-15) [   x:aconn] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Zip bomb detected!
                at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)
                at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
                at ca.calgary.csc.wds.solr.GsaAconnRequestHandler.handleRequestBody(GsaAconnRequestHandler.java:84)
                at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
                at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
                at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)
                at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
                at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
                at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
                at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
                at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
                at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
                at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
                at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
                at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
                at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
                at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
                at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
                at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
                at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
                at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
                at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
                at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
                at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
                at org.eclipse.jetty.server.Server.handle(Server.java:534)
                at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
                at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
                at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
                at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
                at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
                at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
                at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
                at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
                at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
                at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
                at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.tika.exception.TikaException: Zip bomb detected!
                at org.apache.tika.sax.SecureContentHandler.throwIfCauseOf(SecureContentHandler.java:192)
                at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:138)
                at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
                ... 35 more
Caused by: org.apache.tika.sax.SecureContentHandler$SecureSAXException: Suspected zip bomb: 100 levels of XML element nesting
                at org.apache.tika.sax.SecureContentHandler.startElement(SecureContentHandler.java:234)
                at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
                at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
                at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
                at org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264)
                at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:255)
                at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:297)
                at org.apache.tika.parser.html.HtmlHandler.startElementWithSafeAttributes(HtmlHandler.java:251)
                at org.apache.tika.parser.html.HtmlHandler.startElement(HtmlHandler.java:167)
                at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
                at org.apache.tika.parser.html.XHTMLDowngradeHandler.startElement(XHTMLDowngradeHandler.java:60)
                at org.ccil.cowan.tagsoup.Parser.push(Parser.java:794)
                at org.ccil.cowan.tagsoup.Parser.rectify(Parser.java:1061)
                at org.ccil.cowan.tagsoup.Parser.stagc(Parser.java:1016)
                at org.ccil.cowan.tagsoup.HTMLScanner.scan(HTMLScanner.java:625)
                at org.ccil.cowan.tagsoup.Parser.parse(Parser.java:449)
                at org.apache.tika.parser.html.HtmlParser.parse(HtmlParser.java:135)
                at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
                at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
                at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
                ... 36 more
