tika-dev mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: Convert file before Tika processes it?
Date Thu, 21 Jun 2012 13:17:15 GMT
+1, great solution, Jukka!

Cheers,
Chris

On Jun 21, 2012, at 8:08 AM, Jukka Zitting wrote:

> Hi,
> 
> On Thu, Jun 21, 2012 at 4:35 AM, 122jxgcn <ywpark90@gmail.com> wrote:
>> Hi, I'm currently working on getting Tika to properly process a custom file
>> type (*.hwp). I have a binary executable that converts an hwp file into an
>> XML file. I'm not sure how to include this binary so that when Tika
>> encounters an hwp file, it automatically converts it to XML using the
>> binary and passes the document to XMLParser.
> 
> The best approach would be for you to write a custom Parser class for
> this file type. That class would call your executable to convert the
> file to XML and would then invoke the standard XMLParser on the
> result.
> 
> BR,
> 
> Jukka Zitting
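
For illustration, the convert-then-parse flow Jukka describes could be sketched roughly as below. This is only a sketch under stated assumptions, not a Tika implementation: it uses just the JDK (ProcessBuilder plus a SAX parser) so that it runs standalone, the class name ConvertThenParse and the method extractText are hypothetical, and the converter is assumed to accept "<input> <output>" as its last two arguments, which may not match the real hwp converter. In an actual custom parser, the same steps would live in the parse() method of a class implementing org.apache.tika.parser.Parser, with the converted stream handed to org.apache.tika.parser.xml.XMLParser instead of a bare SAX handler.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: converts a file with an external tool, then parses the XML.
class ConvertThenParse {

    // converterCommand is the executable plus any fixed flags; the input and
    // output paths are appended as the last two arguments (an assumption about
    // the converter's command line).
    static String extractText(Path input, List<String> converterCommand)
            throws IOException, InterruptedException, SAXException, ParserConfigurationException {
        Path xml = Files.createTempFile("converted-", ".xml");
        try {
            List<String> command = new ArrayList<>(converterCommand);
            command.add(input.toString());
            command.add(xml.toString());

            // Run the converter and discard its console output so it cannot block.
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            int exit = process.waitFor();
            if (exit != 0) {
                throw new IOException("Converter exited with status " + exit);
            }

            // Collect all character data from the converted XML.
            StringBuilder text = new StringBuilder();
            DefaultHandler handler = new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            };
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            try (InputStream in = Files.newInputStream(xml)) {
                factory.newSAXParser().parse(in, handler);
            }
            return text.toString();
        } finally {
            // Always clean up the temporary XML file.
            Files.deleteIfExists(xml);
        }
    }
}
```

A call might look like ConvertThenParse.extractText(Path.of("report.hwp"), List.of("/path/to/converter")), where both the file name and the converter path are placeholders. Writing the converter output to a temporary file keeps the sketch simple; a converter that can write to stdout could be streamed directly instead.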


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

