tika-dev mailing list archives

From "Niall Pemberton" <niall.pember...@gmail.com>
Subject How is WriteOutContentHandler supposed to work?
Date Tue, 20 Nov 2007 07:14:36 GMT
Apologies if this is a stupid question, but I don't understand
WriteOutContentHandler[1] - shouldn't it be implementing the
startElement(), endElement() etc. methods?

For example, ExcelParserTest[2] outputs the following for testEXCEL.xls:

Simple Excel documentSample Excel Worksheet - Numbers and their
Squares Number Square 1.0 1.0 2.0 4.0 3.0 9.0 4.0 16.0 5.0 25.0 6.0
36.0 7.0 49.0 8.0 64.0 9.0 81.0 10.0 100.0 11.0 121.0 12.0 144.0 13.0
169.0 14.0 196.0 15.0 225.0 Written and saved in Microsoft Excel X for
Mac Service Release 1.

...but I would have thought it should be something like:

<html>
<head>
<title>Simple Excel document</title>
</head>
<body>
<p>Sample Excel Worksheet - Numbers and their Squares Number Square
1.0 1.0 2.0 4.0 3.0 9.0 4.0 16.0 5.0 25.0 6.0 36.0 7.0 49.0 8.0 64.0
9.0 81.0 10.0 100.0 11.0 121.0 12.0 144.0 13.0 169.0 14.0 196.0 15.0
225.0 Written and saved in Microsoft Excel X for Mac Service Release
1.</p>
</body>
</html>
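
To make the question concrete, here is a minimal sketch of the difference I mean. It assumes two made-up handlers, TextOnlyHandler and MarkupHandler, built on the standard SAX DefaultHandler; neither is Tika code, and the tiny event stream in render() is invented for illustration. The first overrides only characters(), so element boundaries vanish and the title runs straight into the text; the second also overrides startElement() and endElement() and so keeps the tags.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class HandlerSketch {

    /** Hypothetical handler: only characters() is overridden, so tags are dropped. */
    static class TextOnlyHandler extends DefaultHandler {
        protected final Writer out;

        TextOnlyHandler(Writer out) {
            this.out = out;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            try {
                out.write(ch, start, length);
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }

    /** Hypothetical handler: also writes element tags (attributes and escaping omitted). */
    static class MarkupHandler extends TextOnlyHandler {
        MarkupHandler(Writer out) {
            super(out);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            write("<" + qName + ">");
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            write("</" + qName + ">");
        }

        private void write(String s) throws SAXException {
            try {
                out.write(s);
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }

    /** Fires a small, made-up event stream (a title, then one paragraph) at a handler. */
    public static String render(boolean markup) throws SAXException {
        StringWriter w = new StringWriter();
        ContentHandler h = markup ? new MarkupHandler(w) : new TextOnlyHandler(w);
        h.startDocument();
        element(h, "title", "Simple Excel document");
        element(h, "p", "Sample Excel Worksheet");
        h.endDocument();
        return w.toString();
    }

    private static void element(ContentHandler h, String name, String text) throws SAXException {
        h.startElement("", name, name, new AttributesImpl());
        h.characters(text.toCharArray(), 0, text.length());
        h.endElement("", name, name);
    }

    public static void main(String[] args) throws SAXException {
        // prints: Simple Excel documentSample Excel Worksheet
        System.out.println(render(false));
        // prints: <title>Simple Excel document</title><p>Sample Excel Worksheet</p>
        System.out.println(render(true));
    }
}
```

The text-only variant reproduces the same kind of run-together output as the test ("Simple Excel documentSample Excel Worksheet"), which is why I expected the handler to implement the element callbacks as well.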

Niall

[1] http://incubator.apache.org/tika/xref/org/apache/tika/sax/WriteOutContentHandler.html
[2] http://incubator.apache.org/tika/xref-test/org/apache/tika/parser/microsoft/ExcelParserTest.html
