tika-dev mailing list archives

From "Leo (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-1181) RTFParser not keeping HTML font colors and underscore tags.
Date Mon, 07 Oct 2013 03:59:42 GMT
Leo created TIKA-1181:
-------------------------

             Summary: RTFParser not keeping HTML font colors and underscore tags.
                 Key: TIKA-1181
                 URL: https://issues.apache.org/jira/browse/TIKA-1181
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.4
         Environment: Windows server 2008
            Reporter: Leo


Hi,

I'm having problems with this code. The HTML it produces from the RTF string does not contain
the font colors or the underline tags ("<u></u>"). Is there anything I can do to include them?

Code:
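import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.rtf.RTFParser;

// rtfString holds the RTF document as a String
InputStream in = new ByteArrayInputStream(rtfString.getBytes("UTF-8"));

RTFParser parser = new RTFParser();
Metadata metadata = new Metadata();

// Serialize the SAX events emitted by the parser into a string
StringWriter sw = new StringWriter();
SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
TransformerHandler handler = factory.newTransformerHandler();
handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "no");
handler.setResult(new StreamResult(sw));

parser.parse(in, handler, metadata, new ParseContext());

String xhtml = sw.toString();

// Turn line breaks into HTML line breaks
xhtml = xhtml.replaceAll("\r\n", "<br>\r\n");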

Thanks for looking at it.
Leo



--
This message was sent by Atlassian JIRA
(v6.1#6144)
