tika-dev mailing list archives

From "Daniel Bonniot de Ruisselet (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-946) Improve how the PPTX parser uses XSLF from POI
Date Tue, 11 Sep 2012 12:44:08 GMT

    [ https://issues.apache.org/jira/browse/TIKA-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452973#comment-13452973 ]

Daniel Bonniot de Ruisselet commented on TIKA-946:

Is it also within the scope of this task for the output to reflect the slide structure
(one <div> element per slide)?
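
To make the question concrete, here is a minimal sketch of what "one <div> per slide" output could look like. It is not Tika's or POI's actual code: the class name SlideDivSketch, the toXhtml helper, and the class="slide" attribute are all hypothetical, and the slide text is assumed to have been extracted already.

```java
import java.util.List;

public class SlideDivSketch {
    // Escape the characters that are special in XHTML text content.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wrap each slide's text in its own <div>, preserving slide order.
    // The class="slide" attribute is an assumption made for illustration.
    static String toXhtml(List<String> slideTexts) {
        StringBuilder out = new StringBuilder();
        for (String text : slideTexts) {
            out.append("<div class=\"slide\">")
               .append("<p>").append(escape(text)).append("</p>")
               .append("</div>\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two hypothetical slides; each ends up in a separate <div>.
        System.out.print(toXhtml(List.of("Title slide", "Q&A")));
    }
}
```

A real parser would presumably emit these elements as SAX events rather than building a string, but the per-slide grouping would be the same.
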
> Improve how the PPTX parser uses XSLF from POI
> ----------------------------------------------
>                 Key: TIKA-946
>                 URL: https://issues.apache.org/jira/browse/TIKA-946
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.2
>            Reporter: Nick Burch
> One last bit from TIKA-757 and TIKA-805: the way PPTX files are currently parsed
> using XSLF from Apache POI still relies on a couple of low-level parts.
> We should avoid the need to go from the usermodel XMLSlideShow to the low-level XSLFSlideShow
> to do the text extraction (this occurs in XSLFPowerPointExtractorDecorator).
> We should also update the usermodel slide support to extract the slide names from
> docProps/app.xml, so that these can easily be included in the text output (in XSLFPowerPointExtractor).

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
