tika-dev mailing list archives

From "Borja Serrano (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2904) Error parsing a Word document with a WMF image
Date Mon, 15 Jul 2019 14:20:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885275#comment-16885275 ]

Borja Serrano commented on TIKA-2904:
-------------------------------------

Thanks for the update, Tim. I will stay on 4.0.1 for the time being. I found your conversation
last week :). Do you want me to close the ticket now that I am sure you are aware of the issue?

> Error parsing a Word document with a WMF image
> ----------------------------------------------
>
>                 Key: TIKA-2904
>                 URL: https://issues.apache.org/jira/browse/TIKA-2904
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.21
>            Reporter: Borja Serrano
>            Priority: Major
>
> If you parse a document that contains a WMF image while depending on the newest version
> of Apache POI (4.1.0, which is marked as compatible), you get a NoSuchMethodError:
> {code:java}
> 2019-07-11 11:06:59 com.penman.web.configuration.CustomAsyncExceptionHandler [ERROR] Exception in async task message - org.apache.poi.hwmf.record.HwmfRecord.getRecordType()Lorg/apache/poi/hwmf/record/HwmfRecordType;
> java.lang.NoSuchMethodError: org.apache.poi.hwmf.record.HwmfRecord.getRecordType()Lorg/apache/poi/hwmf/record/HwmfRecordType;
> at org.apache.tika.parser.microsoft.WMFParser.parse(WMFParser.java:72) ~[tika-parsers-1.21.jar:1.21]
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.21.jar:1.21]
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.21.jar:1.21]
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[tika-core-1.21.jar:1.21]
> at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72) ~[tika-core-1.21.jar:1.21]
> at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:104) ~[tika-core-1.21.jar:1.21]
> at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:391) ~[tika-parsers-1.21.jar:1.21]
> at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedPart(AbstractOOXMLExtractor.java:264) ~[tika-parsers-1.21.jar:1.21]
> at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:206) ~[tika-parsers-1.21.jar:1.21]
> at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:139) ~[tika-parsers-1.21.jar:1.21]
> at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:201) ~[tika-parsers-1.21.jar:1.21]
> at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:110) ~[tika-parsers-1.21.jar:1.21]
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.21.jar:1.21]
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.21.jar:1.21]
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[tika-core-1.21.jar:1.21]
> {code}
> The problem comes from a change in Apache POI. As of 4.1.0, HwmfRecord.getRecordType()
> is no longer available and getWmfRecordType() must be used instead (the change was discussed
> in [http://apache-poi.1045710.n5.nabble.com/VOTE-Apache-POI-4-1-0-release-RC3-td5733174.html]).
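
The rename described above can be illustrated with a minimal sketch, assuming calling code has to tolerate either accessor name until the library versions are aligned. It resolves the method reflectively at run time: the newer name is tried first, then the older one. The class WmfRecordTypeCompat, the helper recordTypeOf, and the two stand-in record classes are hypothetical names introduced only for this illustration. They are not part of Tika or POI, and the sketch deliberately uses no POI types so that it runs on its own. It is not the fix applied in Tika.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class WmfRecordTypeCompat {

    // Accessor names taken from the issue text: newer name first, then the older one.
    private static final String[] ACCESSORS = {"getWmfRecordType", "getRecordType"};

    /** Returns the record type via whichever public accessor the runtime class exposes. */
    public static Object recordTypeOf(Object record) {
        for (String name : ACCESSORS) {
            try {
                Method accessor = record.getClass().getMethod(name);
                return accessor.invoke(record);
            } catch (NoSuchMethodException e) {
                // This accessor does not exist on the class; try the next name.
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("Could not call " + name, e);
            }
        }
        throw new IllegalStateException(
                "No known record-type accessor on " + record.getClass().getName());
    }

    // Hypothetical stand-in for a record class that only has the older accessor.
    public static class OldStyleRecord {
        public String getRecordType() { return "old"; }
    }

    // Hypothetical stand-in for a record class that only has the newer accessor.
    public static class NewStyleRecord {
        public String getWmfRecordType() { return "new"; }
    }

    public static void main(String[] args) {
        System.out.println(recordTypeOf(new OldStyleRecord())); // prints "old"
        System.out.println(recordTypeOf(new NewStyleRecord())); // prints "new"
    }
}
```

Resolving the method by name avoids linking against a signature that may be absent at run time, which is the condition that produces a NoSuchMethodError. Where it is an option, keeping the POI version on the classpath the same as the one the Tika release was built against is the simpler approach.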



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)