tika-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1181) RTFParser not keeping HTML font colors and underscore tags.
Date Mon, 07 Oct 2013 14:17:42 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788171#comment-13788171 ]

Uwe Schindler commented on TIKA-1181:
-------------------------------------

Other parsers, such as the OpenOffice one, do not preserve colors either.

> RTFParser not keeping HTML font colors and underscore tags.
> -----------------------------------------------------------
>
>                 Key: TIKA-1181
>                 URL: https://issues.apache.org/jira/browse/TIKA-1181
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.4
>         Environment: Windows server 2008
>            Reporter: Leo
>              Labels: RTFParser
>
> Hi,
> I'm having problems with this code. It does not put the font colors and underline tags ("<u></u>") in the HTML generated from the RTF string. Is there anything I can do to put them there?
> Code:
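> import java.io.ByteArrayInputStream;
> import java.io.InputStream;
> import java.io.StringWriter;
> import javax.xml.transform.OutputKeys;
> import javax.xml.transform.sax.SAXTransformerFactory;
> import javax.xml.transform.sax.TransformerHandler;
> import javax.xml.transform.stream.StreamResult;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.ParseContext;
> import org.apache.tika.parser.rtf.RTFParser;
>
> // rtfString holds the RTF document to convert.
> InputStream in = new ByteArrayInputStream(rtfString.getBytes("UTF-8"));
> RTFParser parser = new RTFParser();
> Metadata metadata = new Metadata();
>
> // Serialize the parser's SAX events to an XHTML string.
> StringWriter sw = new StringWriter();
> SAXTransformerFactory factory =
>         (SAXTransformerFactory) SAXTransformerFactory.newInstance();
> TransformerHandler handler = factory.newTransformerHandler();
> handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
> handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "no");
> handler.setResult(new StreamResult(sw));
> parser.parse(in, handler, metadata, new ParseContext());
> String xhtml = sw.toString();
>
> // Turn line breaks in the output into HTML line breaks.
> xhtml = xhtml.replaceAll("\r\n", "<br>\r\n");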
> Thanks for looking at it.
> Leo
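
The serialization step in the quoted code can be tried in isolation. The following is a minimal JDK-only sketch, not Tika code: the class name SaxSerializationSketch, the method serialize, and the hand-written SAX events are all hypothetical stand-ins for whatever a parser emits. It illustrates that a TransformerHandler writes only the elements it receives, so a "<u>" tag can appear in the output string only if the event producer sends one.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class; it does not use or imitate RTFParser.
final class SaxSerializationSketch {

    // Serializes a small, hand-written stream of SAX events to a string.
    // emitUnderline controls whether the producer sends a <u> element.
    static String serialize(boolean emitUnderline) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "no");
        // Assumption for readability: drop the XML declaration.
        handler.getTransformer().setOutputProperty(
                OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter sw = new StringWriter();
        handler.setResult(new StreamResult(sw));

        AttributesImpl noAttrs = new AttributesImpl();
        char[] text = "hello".toCharArray();

        // The serializer can only write what these calls tell it about.
        handler.startDocument();
        handler.startElement("", "p", "p", noAttrs);
        if (emitUnderline) {
            handler.startElement("", "u", "u", noAttrs);
        }
        handler.characters(text, 0, text.length);
        if (emitUnderline) {
            handler.endElement("", "u", "u");
        }
        handler.endElement("", "p", "p");
        handler.endDocument();
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(false)); // no <u> was sent, none is written
        System.out.println(serialize(true));  // <u> was sent, so it is written
    }
}
```

Under these assumptions, changing the output properties on the transformer would not bring back formatting that the event producer never reported.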



--
This message was sent by Atlassian JIRA
(v6.1#6144)
