tika-dev mailing list archives

From Nick Burch <nick.bu...@alfresco.com>
Subject Re: Convert file before Tika processes it?
Date Thu, 21 Jun 2012 17:07:30 GMT
On Wed, 20 Jun 2012, 122jxgcn wrote:
> Hi, I'm currently working on getting Tika to properly process a custom
> file type (*.hwp). I have a binary executable that converts an hwp file
> into an xml file. I'm not sure how to include this binary so that when
> Tika encounters an hwp file, it automatically converts it to xml using
> the binary and passes the document to XMLParser. Any suggestions?

I'd suggest you write a custom parser for your file format, which first 
calls out to your conversion program, then feeds the resulting XML 
directly to Tika's XMLParser.
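
As a rough sketch of the "call out to your program" step, here is a 
JDK-only helper. The class name ExternalConverter is made up for 
illustration, and it assumes your converter takes the input path as its 
last argument and writes the XML to stdout, which may not match your 
binary. In a real Tika parser this logic would sit inside your 
parse(InputStream, ContentHandler, Metadata, ParseContext) method, and 
the plain SAX step in extractText would be replaced by handing the 
converted stream to org.apache.tika.parser.xml.XMLParser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: not part of Tika.
class ExternalConverter {
    private final List<String> command;

    // command is the converter invocation without the input path,
    // e.g. List.of("/path/to/your/converter") (placeholder path).
    ExternalConverter(List<String> command) {
        this.command = List.copyOf(command);
    }

    // Spools the stream to a temp file, runs the converter on it,
    // and returns whatever the converter wrote to stdout.
    byte[] convert(InputStream in) throws IOException, InterruptedException {
        Path input = Files.createTempFile("convert-in", ".hwp");
        Path output = Files.createTempFile("convert-out", ".xml");
        try {
            Files.copy(in, input, StandardCopyOption.REPLACE_EXISTING);
            List<String> cmd = new ArrayList<>(command);
            cmd.add(input.toString()); // assumption: input path is the last argument
            // Redirecting stdout to a file avoids blocking on a full pipe.
            Process process = new ProcessBuilder(cmd)
                    .redirectOutput(output.toFile())
                    .redirectError(ProcessBuilder.Redirect.DISCARD)
                    .start();
            int exit = process.waitFor();
            if (exit != 0) {
                throw new IOException("Converter exited with status " + exit);
            }
            return Files.readAllBytes(output);
        } finally {
            Files.deleteIfExists(input);
            Files.deleteIfExists(output);
        }
    }

    // Stand-in for the XMLParser step: collects the character content
    // of the converted XML with a plain SAX handler.
    static String extractText(byte[] xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml), handler);
        return text.toString();
    }
}
```

Going through a temp file is only needed if your converter wants a file 
path rather than stdin. If it can read from stdin you could pipe the 
stream in directly instead.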

The website has a good guide on writing your own custom parsers:
    http://tika.apache.org/1.1/parser_guide.html

Nick
