tika-dev mailing list archives

From Nick Burch <nick.bu...@alfresco.com>
Subject Re: How to Convert Doc or Docx File to HTML?
Date Sun, 29 Jan 2012 13:42:38 GMT
On Sun, 29 Jan 2012, abc wrote:
> I need to convert doc/docx files to HTML. I was able to convert doc to HTML
> using Apache POI, but I am unable to convert docx to HTML. Some people
> suggested that I use the XWPFWordExtractorDecorator class, which converts
> docx to HTML. I was able to reuse the XWPFWordExtractorDecorator class.

The answer remains the same as you were given when you asked this on the 
POI dev list, and when you asked this on StackOverflow. You really don't 
need to ask the identical question a third time elsewhere, when you've 
already been given the answer twice!

As a reminder, you need to pass in a suitable ContentHandler, and you need 
to call the parser rather than internal classes:
http://mail-archives.apache.org/mod_mbox/poi-dev/201201.mbox/%3Calpine.DEB.2.00.1201291240390.18298%40urchin.earth.li%3E
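For illustration only, here is a minimal sketch of what "a suitable ContentHandler" could look like, built with nothing but the JDK's SAX and TrAX classes. The class name HtmlHandlerSketch and the helpers htmlHandler and demo are hypothetical, not part of Tika or POI. The demo feeds a few SAX events in by hand as a stand-in for the parser; the Tika call in the comment assumes the four-argument Parser.parse method and is not compiled here.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.AttributesImpl;

public class HtmlHandlerSketch {

    // Hypothetical helper: builds a ContentHandler that serializes the
    // SAX events it receives as HTML into the given Writer.
    static ContentHandler htmlHandler(Writer out) throws TransformerConfigurationException {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
        handler.setResult(new StreamResult(out));
        return handler;
    }

    // Stand-in for the parser: emits a tiny document as SAX events.
    // With Tika on the classpath, the handler would instead be passed to
    // the parser (assumed usage, not compiled in this sketch):
    //   new AutoDetectParser().parse(inputStream, handler,
    //                                new Metadata(), new ParseContext());
    static String demo() throws Exception {
        StringWriter out = new StringWriter();
        ContentHandler handler = htmlHandler(out);
        AttributesImpl noAttributes = new AttributesImpl();
        char[] text = "Hello".toCharArray();

        handler.startDocument();
        handler.startElement("", "html", "html", noAttributes);
        handler.startElement("", "body", "body", noAttributes);
        handler.startElement("", "p", "p", noAttributes);
        handler.characters(text, 0, text.length);
        handler.endElement("", "p", "p");
        handler.endElement("", "body", "body");
        handler.endElement("", "html", "html");
        handler.endDocument();

        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

The point of the design is that the calling code never touches the format-specific internal classes: the parser produces SAX events, and whichever handler is passed in decides the output format.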

Nick
