tika-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2347) Underlined text is not decorated as such when extracting from word documents
Date Sat, 16 Sep 2017 23:20:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169132#comment-16169132 ]

ASF GitHub Bot commented on TIKA-2347:
--------------------------------------

darkdreamingdan commented on issue #173: Fix for TIKA-2347 Adds underline extraction from word documents
URL: https://github.com/apache/tika/pull/173#issuecomment-330000650
 
 
   Could you also add strikethrough support? It's the same thing, just using the `<strike>` XHTML element. We have our own branch for this code, but it would be good to unify our PRs.
   
   Also, any news on this getting merged?  
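The request is to emit strikethrough the same way as underline, i.e. as one more XHTML decoration tag wrapped around the text. The sketch below shows that mapping under stated assumptions. It is not Tika's actual Word extractor code. `RunDecorator` and its `Run` class are hypothetical stand-ins for a formatted text run, such as one read through Apache POI. The real parser also tracks formatting state across consecutive runs instead of closing the tags after every run.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Tika's implementation): wraps one text run in
 * XHTML decoration tags according to its character formatting flags.
 */
public class RunDecorator {

    /** Hypothetical stand-in for a formatted run of text. */
    public static final class Run {
        final String text;
        final boolean bold, italic, underline, strike;

        public Run(String text, boolean bold, boolean italic,
                   boolean underline, boolean strike) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
            this.underline = underline;
            this.strike = strike;
        }
    }

    /** Opens tags in a fixed order (b, i, u, strike) and closes them in reverse. */
    public static String toXhtml(Run run) {
        List<String> tags = new ArrayList<>();
        if (run.bold) tags.add("b");
        if (run.italic) tags.add("i");
        if (run.underline) tags.add("u");       // the decoration TIKA-2347 reports as missing
        if (run.strike) tags.add("strike");     // the decoration requested in this comment

        StringBuilder sb = new StringBuilder();
        for (String tag : tags) {
            sb.append('<').append(tag).append('>');
        }
        sb.append(escape(run.text));
        for (int i = tags.size() - 1; i >= 0; i--) {
            sb.append("</").append(tags.get(i)).append('>');
        }
        return sb.toString();
    }

    /** Minimal escaping of XML special characters in text content. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(toXhtml(new Run("underline", false, false, true, false)));
        System.out.println(toXhtml(new Run("both", true, false, true, true)));
    }
}
```

Running `main` prints `<u>underline</u>` and then `<b><u><strike>both</strike></u></b>`. The tags close in reverse order, so the output stays well-formed when a run carries several decorations.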
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Underlined text is not decorated as such when extracting from word documents
> ----------------------------------------------------------------------------
>
>                 Key: TIKA-2347
>                 URL: https://issues.apache.org/jira/browse/TIKA-2347
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.0, 1.14
>            Reporter: Stuart Hendren
>
> When extracting from .doc and .docx files, bold and italic text decoration is extracted;
> however, underlining is not. This can be demonstrated in WordParserTest or OOXMLParserTest
> (change the test file to .docx) with the following test case.
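> {code:title=WordParserTest.java|borderStyle=solid}
>     @Test
>     public void testTextDecoration() throws Exception {
>         // getXML() and XMLResult come from Tika's TikaTest base class;
>         // assertTrue is statically imported from org.junit.Assert
>         XMLResult result = getXML("testWORD_various.doc");
>         String xml = result.xml;
>         assertTrue(xml.contains("<b>Bold</b>"));       // passes: bold is extracted
>         assertTrue(xml.contains("<i>italic</i>"));     // passes: italic is extracted
>         assertTrue(xml.contains("<u>underline</u>"));  // fails: underline is not emitted
>     }
> {code}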



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
