tika-dev mailing list archives

From Doug Carter <dcar...@mercycorps.org>
Subject Tika command line performance
Date Fri, 15 Jan 2010 19:07:05 GMT

Hi all,

This may be off-topic for this list, but I need to start somewhere.

I need a command line utility to do document format conversion, in a
batch mode environment. The batch process is a combination of steps, one
of which is the actual format conversion which is currently being done
by a collection of Linux binary converters like wvWare, pdftohtml, etc.

I've put a shell script wrapper around the tika jar:

  java -jar tika-app.jar [infile] > [outfile]
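Here is a minimal sketch of a wrapper like this in batch form. It assumes the inputs sit in one directory and that tika-app.jar is in the current directory, or at the path given in a hypothetical TIKA_JAR variable. The function name convert_dir and the <name>.html output naming are illustrative only and are not part of Tika. The loop simply repeats the command above for each file:

```shell
#!/bin/sh
# Hypothetical batch wrapper around tika-app (convert_dir, TIKA_JAR and the
# .html naming are assumptions for illustration, not part of Tika).
# convert_dir IN_DIR OUT_DIR converts every regular file in IN_DIR and writes
# the result to OUT_DIR/<name>.html, reporting failures without stopping.
convert_dir() {
  in_dir=$1
  out_dir=$2
  status=0
  mkdir -p "$out_dir" || return 1
  for f in "$in_dir"/*; do
    [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
    out="$out_dir/$(basename "$f").html"
    # Same command as in the post: a new JVM is started for every file.
    if ! java -jar "${TIKA_JAR:-tika-app.jar}" "$f" > "$out"; then
      echo "conversion failed: $f" >&2
      status=1
    fi
  done
  return $status
}
```

Usage: `convert_dir /data/incoming /data/converted`. Because the loop launches `java` once per document, any JVM start-up cost is paid once per file. Converting many files inside a single long-running JVM would avoid that repeated cost, so that is one direction worth exploring.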

This works OK, but as you would imagine, it is much slower than a native
Linux binary.

Does anyone know of a way to improve the performance in a setup like
this? I know it goes against the whole philosophy of Java, but is there
a way to compile the Tika jar byte code into a native Linux binary? I've
taken a look at gcj, but it doesn't look like a simple re-compile.

Any ideas would be greatly appreciated.
