tika-dev mailing list archives

From Luke Nezda <lne...@gmail.com>
Subject Re: Tika command line performance
Date Sat, 16 Jan 2010 00:10:03 GMT
Maybe you could try Nailgun
<http://martiansoftware.com/nailgun/index.html>. If I understand
correctly, it's a small C client that talks over a socket to a simple Java
server, which keeps the JVM running between calls. I've never actually used it,
but it sounds like you have a use case where it could help (assuming JVM
start-up overhead is the slowest part).
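
A quick way to check that assumption before trying Nailgun is to time a bare JVM launch against a full conversion. The helper below is a minimal sketch, not part of Tika or Nailgun. The function name `elapsed_ms` is hypothetical, and it assumes GNU `date`, whose `%N` format prints nanoseconds:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the wall-clock time, in milliseconds,
# taken by the given command. Assumes GNU date (%N = nanoseconds).
elapsed_ms() {
  local start end
  start=$(date +%s%N)
  "$@" > /dev/null 2>&1   # discard the command's own output
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}
```

For example, compare `elapsed_ms java -version` (JVM start-up only) with `elapsed_ms java -jar tika-app.jar sample.doc` on a typical input (`sample.doc` is a placeholder). If the two numbers are close, most of the time goes into starting the JVM, which is the cost a persistent-JVM approach like Nailgun is meant to avoid.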

Good luck,
- Luke

On Fri, Jan 15, 2010 at 1:07 PM, Doug Carter <dcarter@mercycorps.org> wrote:

>
> Hi all,
>
> This may be off-topic for this list, but I need to start somewhere.
>
> I need a command line utility to do document format conversion, in a
> batch mode environment. The batch process is a combination of steps, one
> of which is the actual format conversion which is currently being done
> by a collection of Linux binary converters like wvWare, pdftohtml, etc.
>
> I've put a shell script wrapper around the tika jar:
>
>  java -jar tika-app.jar [infile] > [outfile]
>
> This works OK, but as you would imagine, it is much slower than
> a native Linux binary.
>
> Does anyone know of a way to improve the performance in a setup like
> this? I know it goes against the whole philosophy of Java, but is there
> a way to compile the Tika jar bytecode into a native Linux binary? I've
> taken a look at gcj, but it doesn't look like a simple re-compile.
>
> Any ideas would be greatly appreciated.
>
> TIA,
>
> Doug
>
>
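
A batch wrapper like the one described above might look roughly like the sketch below. It assumes the input files are passed as arguments and that each result should go to `<file>.txt` next to the input. The function name `convert_all` and the `TIKA_CMD` variable are hypothetical, added only so the converter command can be swapped out (for example, for a persistent-JVM client) without editing the loop:

```shell
#!/usr/bin/env bash
# Hypothetical batch wrapper: runs the converter once per input file
# and writes the result to <file>.txt next to the input.
# TIKA_CMD is an illustrative assumption; it defaults to the command
# line from the original message.
convert_all() {
  local cmd="${TIKA_CMD:-java -jar tika-app.jar}"
  local f
  for f in "$@"; do
    # $cmd is deliberately unquoted so "java -jar tika-app.jar" splits
    # into separate words. Each iteration starts a fresh JVM.
    $cmd "$f" > "$f.txt" || echo "conversion failed: $f" >&2
  done
}
```

Because every iteration launches a new JVM, the start-up cost is paid once per file. That per-file overhead is what makes this setup slower than a native converter for large batches.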
