tika-dev mailing list archives

From "Don (Jira)" <j...@apache.org>
Subject [jira] [Created] (TIKA-3017) OOM in XSLFSheet.java
Date Wed, 18 Dec 2019 12:32:00 GMT
Don created TIKA-3017:
-------------------------

             Summary: OOM in XSLFSheet.java
                 Key: TIKA-3017
                 URL: https://issues.apache.org/jira/browse/TIKA-3017
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.19
            Reporter: Don
         Attachments: OOM_Slide_18.pptx

When Tika parses the attached PowerPoint slide, it throws an OutOfMemoryError every time. The
slide is a scrubbed slide from a Microsoft PowerPoint deck. Unfortunately, I have no idea how
the slide was created. When you open the slide it looks totally blank; however, if you perform
a Select All while it is open in PowerPoint, you will see that the slide contains two items,
one inside the other. The person who created the slide deck is no longer available to give
details on how the slide was created. The two items appear to be text boxes, but I am not sure
this is the case, because if either one is removed and replaced with a text box using MS
PowerPoint, the OOM no longer happens. Also, if the slide is opened in LibreOffice and then
saved, the OOM does not happen. There seems to be something specific about whatever these
items really are and how they were created.

The following is the stack trace of the OOM when the slide is parsed by Tika:

{noformat}

Executor task launch worker for task 47360
 at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
 at java.util.Arrays.copyOf([JI)[J (Arrays.java:3308)
 at java.util.BitSet.ensureCapacity(I)V (BitSet.java:337)
 at java.util.BitSet.expandTo(I)V (BitSet.java:352)
 at java.util.BitSet.set(I)V (BitSet.java:447)
 at org.apache.poi.xslf.usermodel.XSLFSheet.registerShapeId(I)V (XSLFSheet.java:123)
 at org.apache.poi.xslf.usermodel.XSLFDrawing.<init>(Lorg/apache/poi/xslf/usermodel/XSLFSheet;Lorg/openxmlformats/schemas/presentationml/x2006/main/CTGroupShape;)V (XSLFDrawing.java:47)
 at org.apache.poi.xslf.usermodel.XSLFSheet.initDrawingAndShapes()V (XSLFSheet.java:214)
 at org.apache.poi.xslf.usermodel.XSLFSheet.getShapes()Ljava/util/List; (XSLFSheet.java:201)
 at org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.buildXHTML(Lorg/apache/tika/sax/XHTMLContentHandler;)V (XSLFPowerPointExtractorDecorator.java:110)
 at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (AbstractOOXMLExtractor.java:136)
 at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (OOXMLExtractorFactory.java:156)
 at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (OOXMLParser.java:110)
 at org.apache.tika.parser.CompositeParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (CompositeParser.java:280)
 at org.apache.tika.parser.CompositeParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (CompositeParser.java:280)
 at org.apache.tika.parser.AutoDetectParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (AutoDetectParser.java:143)
 at

{noformat}
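
The trace ends in java.util.BitSet.set(int), called from XSLFSheet.registerShapeId(int). A BitSet's backing array grows with the highest bit index set, so one possible explanation (an assumption, not confirmed from the attached file) is that one of the two shapes carries a very large shape id. The sketch below uses only the JDK to show how the storage requirement scales with the index; the class name, helper names, and the example id value are hypothetical and not taken from POI or Tika.

```java
import java.util.BitSet;

public class BitSetGrowth {

    /** Number of 64-bit words a BitSet needs in order to hold the given bit index. */
    static long wordsNeeded(int bitIndex) {
        return ((long) bitIndex >> 6) + 1;
    }

    /** Minimum bytes of backing storage needed once the given bit index is set. */
    static long bytesNeeded(int bitIndex) {
        return wordsNeeded(bitIndex) * Long.BYTES;
    }

    public static void main(String[] args) {
        // A typical small shape id fits in the default 64-bit allocation.
        BitSet ids = new BitSet();
        ids.set(18);
        System.out.println("bits allocated after set(18): " + ids.size());

        // Hypothetical worst case: a shape id near Integer.MAX_VALUE.
        // Only the requirement is computed here; the huge bit is not actually
        // set, so this demo itself does not exhaust the heap.
        int hugeId = Integer.MAX_VALUE;
        System.out.println("bytes needed for set(" + hugeId + "): " + bytesNeeded(hugeId));
    }
}
```

If the assumption holds, a single shape id close to Integer.MAX_VALUE would require at least 256 MiB for the BitSet alone, which could be enough to exhaust the heap of a worker with a modest memory limit.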



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
