tika-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Multilingual Tika
Date Sat, 05 Nov 2011 00:22:21 GMT

With Tika 1.0 almost done (how cool is that!), I think it's time to
start looking forward to what we'll be doing during the 1.x cycle. One
thing I've had in mind for a long time is to make Tika more easily
usable in programming languages other than Java.

The tika-app jar already helps with that and I know there are people
using Tika in .NET with IKVM, but it would be nice to see tighter
Tika integration with languages like Python, Ruby, JavaScript, Perl
and PHP as well. Could we, for example, make a Ruby Gem out of Tika?

The Tika facade class provides a pretty nice set of basic
functionality that should be reasonably easy to port to other
languages. More advanced Tika constructs like the SAX event mechanism
or things like the ParseContext are probably trickier to port, so as a
first step I'd be interested in looking at simply providing a basic
set of Tika.py, Tika.rb, Tika.js, Tika.pm and Tika.php bindings (plus
whatever else people may be interested in) that just reflect the key
functionality found in Tika.java.
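
As a rough illustration of what such a binding might look like, here
is a minimal sketch of a Tika.py that simply shells out to the
tika-app jar. The jar location, the class layout and the Python method
names are assumptions made for illustration only; the sketch relies
just on the --detect and --text options of tika-app and mirrors the
detect and parseToString methods of the Tika facade.

```python
import subprocess


class Tika:
    """Hypothetical Tika.py facade that delegates to the tika-app jar."""

    def __init__(self, jar_path="tika-app.jar", java="java"):
        # Both defaults are assumptions; point jar_path at a real tika-app jar.
        self.jar_path = jar_path
        self.java = java

    def _command(self, option, path):
        # Build the command line: java -jar <tika-app jar> <option> <file>
        return [self.java, "-jar", self.jar_path, option, path]

    def _run(self, option, path):
        # Run tika-app and return whatever it wrote to standard output.
        result = subprocess.run(
            self._command(option, path),
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout

    def detect(self, path):
        """Return the detected media type of the file (like Tika.detect)."""
        return self._run("--detect", path).strip()

    def parse_to_string(self, path):
        """Return the extracted plain text (like Tika.parseToString)."""
        return self._run("--text", path)
```

With something like this, Tika("/path/to/tika-app.jar").detect("doc.pdf")
would start one JVM per call. That is slow, which is one reason a
tighter integration than wrapping the command line would be worth
exploring.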

Anyone interested in joining such an effort? Any pointers to existing
work along similar lines?


Jukka Zitting
