tika-dev mailing list archives

From "Andreas Beeker (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1748) Upgrade to POI 3.13-final when available
Date Tue, 29 Sep 2015 20:54:04 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935840#comment-14935840 ]

Andreas Beeker commented on TIKA-1748:
--------------------------------------

at HSLFExtractor:
- as the extraction is specific to HSLF, I would use the HSLF classes and not the common sl classes.
- in my patch (TIKA-1707) the method textRunsToText wraps paragraphs in divs. I don't know
how the post-processing of the output works, but I wanted to point out that you aren't dealing
with just a list of text runs, but with separate paragraphs, each containing text runs.
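
To illustrate the paragraph point, here is a minimal, self-contained sketch of wrapping
each paragraph's text runs in its own div. It does not use the POI or Tika classes: a
paragraph is modelled as a plain list of strings, and the names ParagraphDivSketch,
paragraphsToDivs and escape are hypothetical, chosen only for this example. The real
textRunsToText in the patch may well look different.

```java
import java.util.Arrays;
import java.util.List;

public class ParagraphDivSketch {

    // Hypothetical stand-in model: each paragraph is an ordered list of text-run strings.
    // One <div> is emitted per paragraph, so paragraph boundaries survive in the output.
    static String paragraphsToDivs(List<List<String>> paragraphs) {
        StringBuilder out = new StringBuilder();
        for (List<String> runs : paragraphs) {
            out.append("<div>");
            for (String run : runs) {
                out.append(escape(run));
            }
            out.append("</div>");
        }
        return out.toString();
    }

    // Minimal escaping of the characters that would break the markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        List<List<String>> slide = Arrays.asList(
                Arrays.asList("Hello, ", "world"),   // paragraph 1, two runs
                Arrays.asList("a < b"));             // paragraph 2, one run
        System.out.println(paragraphsToDivs(slide));
    }
}
```

Flattening all runs into a single list would lose the boundary between the two paragraphs;
keeping the nesting is what allows one div per paragraph.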

at PowerPointParserTest:
- the above handling leads to a change in the tests

at XSLFPowerPointExtractorDecorator:
- use XSLF classes instead of common sl
- the new extractTable method is better

Andi.


> Upgrade to POI 3.13-final when available
> ----------------------------------------
>
>                 Key: TIKA-1748
>                 URL: https://issues.apache.org/jira/browse/TIKA-1748
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: TIKA-1748.patch
>
>
> Upgrade.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
