tika-dev mailing list archives

From "Tyler Palsulich (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1318) Use of Deprecated Word6Extractor.getParagraphText() Method
Date Tue, 03 Jun 2014 22:23:02 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017218#comment-14017218 ]

Tyler Palsulich commented on TIKA-1318:
---------------------------------------

How does WordToHtmlConverter work with HWPFOldDocument?

> Use of Deprecated Word6Extractor.getParagraphText() Method
> ----------------------------------------------------------
>
>                 Key: TIKA-1318
>                 URL: https://issues.apache.org/jira/browse/TIKA-1318
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.5
>            Reporter: Tyler Palsulich
>            Priority: Minor
>              Labels: deprecation
>             Fix For: 1.6
>
>
> org.apache.tika.parser.microsoft.WordExtractor.parseWord6() uses the deprecated Word6Extractor.getParagraphText()
> method. getParagraphText() is supposed to return a String[] with an element for each paragraph
> in the text. The replacement is getText(), which leaves the separation of paragraphs, cells, etc.
> implementation-specific. I'm not sure, at this point, how the POI WordExtractor separates them.
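
For illustration only, here is a minimal sketch of how a String[] of paragraphs might be recovered from a single getText()-style String. It assumes paragraphs are separated by line breaks (\r\n, \n, or \r), which the issue text does not confirm; the class name ParagraphSplitSketch and the helper splitParagraphs are hypothetical and are not part of Tika or POI.

```java
import java.util.ArrayList;
import java.util.List;

public class ParagraphSplitSketch {

    // Hypothetical helper: split one block of extracted text into paragraphs.
    // ASSUMPTION: paragraphs are delimited by line breaks. The separator that
    // POI's getText() actually emits is not stated in the issue.
    static String[] splitParagraphs(String text) {
        List<String> paragraphs = new ArrayList<>();
        // Match \r\n first, then a lone \n, then a lone \r.
        for (String piece : text.split("\\r?\\n|\\r")) {
            // Drop empty or whitespace-only pieces left by consecutive breaks.
            if (!piece.trim().isEmpty()) {
                paragraphs.add(piece);
            }
        }
        return paragraphs.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "First paragraph\r\nSecond paragraph\n\nThird paragraph\n";
        for (String paragraph : splitParagraphs(text)) {
            System.out.println(paragraph);
        }
    }
}
```

If the real separator turns out to be something else (for example, a different delimiter for table cells), only the regular expression in splitParagraphs would need to change.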



--
This message was sent by Atlassian JIRA
(v6.2#6252)
