tika-dev mailing list archives

From "Bertrand Delacretaz" <bdelacre...@apache.org>
Subject Re: Questions
Date Sat, 30 Jun 2007 09:19:35 GMT
On 6/30/07, Grant Ingersoll <gsingers@apache.org> wrote:

> ...My main concern w/ extracting Nutch is all the dependencies on
> Hadoop, etc.  But it does seem like the shortest path for me....

I've mentioned Tika to a few colleagues lately, and one thing that
comes up often is that there are many document/format parsing
libraries around, which should ideally be usable as Tika plugins with
as few changes as possible.

But these libraries' dependencies are all over the place, and
probably conflict with each other in many cases.

It might be good to take that into account in the design of Tika, and
use solid classloading and isolation mechanisms. OSGi comes to mind,
assuming it doesn't bloat the whole thing.
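
To make the isolation idea a bit more concrete, here is a minimal sketch
using only plain JDK classloaders. It is not OSGi and not a proposal for
Tika's actual design; the names PluginIsolation, loaderFor and canSee are
made up for illustration. Each plugin gets its own URLClassLoader whose
parent is the JDK-only loader, so two plugins could bundle conflicting
versions of the same dependency without seeing each other's classes.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical names throughout; a sketch, not Tika's or OSGi's API.
final class PluginIsolation {

    // One loader per plugin. The parent is the JDK's platform (or extension)
    // loader rather than the application loader, so a plugin sees JDK classes
    // and its own jars, but not the host's classpath or another plugin's jars.
    static URLClassLoader loaderFor(URL... pluginJars) {
        ClassLoader jdkOnly = ClassLoader.getSystemClassLoader().getParent();
        return new URLClassLoader(pluginJars, jdkOnly);
    }

    // Reports whether the given loader can resolve a class by name.
    static boolean canSee(ClassLoader loader, String className) {
        try {
            loader.loadClass(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Empty jar lists keep the sketch runnable; a real plugin would pass
        // the URLs of its own jar and its bundled dependencies.
        try (URLClassLoader pdfPlugin = loaderFor();
             URLClassLoader officePlugin = loaderFor()) {
            // JDK classes stay visible through the parent loader.
            System.out.println(canSee(pdfPlugin, "java.lang.String"));
            // Classes from the host application's classpath are hidden.
            System.out.println(canSee(pdfPlugin, "PluginIsolation"));
            // Each plugin has a distinct loader, hence a distinct namespace.
            System.out.println(pdfPlugin != officePlugin);
        }
    }
}
```

The catch with this approach is that the host and the plugins still need
some shared interface to talk through, which would have to live in a
loader visible to both sides. That wiring is the part OSGi handles
explicitly with its import/export declarations.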

