tika-dev mailing list archives

From "Nick C (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1945) Powerpoint parser doesn't extract text from diagrams
Date Sun, 10 Apr 2016 00:43:25 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233797#comment-15233797 ]

Nick C commented on TIKA-1945:
------------------------------

While looking into the code, I also noticed that AbstractOOXMLExtractor.getXHTML passes the
original content handler to handleEmbeddedParts and handleThumbnail instead of the
XHTMLContentHandler that is passed to buildXHTML. If that's a bug, I can create another JIRA ticket.
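To illustrate why the wiring could matter, here is a minimal sketch, assuming the usual decorator arrangement in which an XHTML handler wraps the caller's handler and adds its own processing before delegating. Every name below (Handler, RecordingHandler, XhtmlWrapper, emitEmbedded) is a hypothetical simplification, not Tika's or SAX's API, and the <p> wrapping merely stands in for whatever processing the real wrapper performs. A helper handed the raw handler bypasses the wrapper entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; NOT Tika's or SAX's real classes.
interface Handler {
    void text(String s);
}

// Underlying sink: records exactly what it receives.
class RecordingHandler implements Handler {
    final List<String> events = new ArrayList<>();
    public void text(String s) { events.add(s); }
}

// Plays the role of the XHTML wrapper: it adds its own processing
// (here, wrapping text in <p>...</p>) before delegating.
class XhtmlWrapper implements Handler {
    private final Handler delegate;
    XhtmlWrapper(Handler delegate) { this.delegate = delegate; }
    public void text(String s) { delegate.text("<p>" + s + "</p>"); }
}

public class HandlerWiringDemo {
    // Hypothetical stand-in for a helper such as handleEmbeddedParts.
    static void emitEmbedded(Handler h) { h.text("embedded"); }

    public static void main(String[] args) {
        RecordingHandler raw = new RecordingHandler();
        Handler xhtml = new XhtmlWrapper(raw);

        xhtml.text("body");   // routed through the wrapper
        emitEmbedded(raw);    // bypasses the wrapper (the pattern questioned above)
        emitEmbedded(xhtml);  // routed through the wrapper

        System.out.println(raw.events); // [<p>body</p>, embedded, <p>embedded</p>]
    }
}
```

Whether bypassing the wrapper causes a visible problem in Tika depends on what the real XHTMLContentHandler does with the events; the sketch only shows that the two call paths are not equivalent.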

> Powerpoint parser doesn't extract text from diagrams
> ----------------------------------------------------
>
>                 Key: TIKA-1945
>                 URL: https://issues.apache.org/jira/browse/TIKA-1945
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.12
>            Reporter: Nick C
>         Attachments: Diagram.pptx
>
>
> Attached is an example org chart from which Tika doesn't extract any text.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
