tika-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TIKA-1840) No way to link slide notes to slide in PPT output.
Date Sun, 21 May 2017 15:40:10 GMT

     [ https://issues.apache.org/jira/browse/TIKA-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-1840:
------------------------------------
    Fix Version/s:     (was: 1.15)
                   1.16

> No way to link slide notes to slide in PPT output.
> --------------------------------------------------
>
>                 Key: TIKA-1840
>                 URL: https://issues.apache.org/jira/browse/TIKA-1840
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.11
>            Reporter: Sam H
>            Assignee: Chris A. Mattmann
>             Fix For: 1.16
>
>
> I'm integrating Apache Tika into my project, and I want to extract text from PowerPoint slides in both PPT and PPTX formats.
> I've noticed that with the PPT format, the slide notes are all aggregated at the end of the XML output, and there is no way to identify which note belongs to which slide.
> I began looking at the code and found the following:
> {code}
> // TODO Find the Notes for this slide and extract inline
> {code}
> in [HSLFExtractor.java|https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java] on line 140.
> I would like to implement this part and contribute it.
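A minimal sketch of what the TODO asks for, written in Java to match HSLFExtractor: each slide's notes are emitted directly after that slide instead of after the last slide. The `Slide` and `Notes` records, the `render` method, and the `slide`/`slide-notes` div classes are hypothetical stand-ins, not Tika or Apache POI API. It assumes each notes page carries the id of the slide it belongs to. In a real extractor the per-slide lookup would go through POI's HSLF usermodel (for example `HSLFSlide.getNotes()` in recent POI versions). Requires Java 16+ for records.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: Slide and Notes are illustrative stand-ins,
 * not Apache POI or Tika classes.
 */
public class InlineNotesSketch {

    /** A slide with an id and its body text (hypothetical model). */
    record Slide(int id, String text) {}

    /** A notes page that records which slide it belongs to (hypothetical model). */
    record Notes(int slideId, String text) {}

    /**
     * Renders each slide followed immediately by its own notes,
     * instead of appending all notes after the last slide.
     */
    static String render(List<Slide> slides, List<Notes> notes) {
        // Index notes by the id of the slide they reference,
        // so the order of the notes list does not matter.
        Map<Integer, Notes> notesBySlide = new HashMap<>();
        for (Notes n : notes) {
            notesBySlide.put(n.slideId(), n);
        }

        StringBuilder out = new StringBuilder();
        for (Slide s : slides) {
            out.append("<div class=\"slide\"><p>")
               .append(escape(s.text()))
               .append("</p></div>\n");
            Notes n = notesBySlide.get(s.id());
            if (n != null) {
                // Emit this slide's notes inline, directly after the slide.
                out.append("<div class=\"slide-notes\"><p>")
                   .append(escape(n.text()))
                   .append("</p></div>\n");
            }
        }
        return out.toString();
    }

    /** Minimal XML escaping for text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // Notes are deliberately listed out of slide order.
        List<Slide> slides = List.of(new Slide(1, "Intro"), new Slide(2, "Results"));
        List<Notes> notes = List.of(new Notes(2, "Mention the caveats"), new Notes(1, "Say hello"));
        System.out.print(render(slides, notes));
    }
}
```

Indexing the notes by slide id first keeps the pass over the slides linear and tolerates notes that appear in a different order than the slides, as in the `main` example.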



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
