tika-dev mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Re: external parsers
Date Wed, 13 Jun 2007 13:38:20 GMT

On 6/13/07, Philipp Koch <philipp.koch@gmail.com> wrote:
> I am currently also doing metadata extraction from various file
> formats and was attracted by the introduction of the Tika
> project. I found a very interesting image metadata extractor library
> which is shipped under the Apache License, but the project itself is
> not hosted at Apache (see http://www.fightingquaker.com/sanselan/).

Looks nice!

> Would it make sense to ask the owner(s) of such projects to
> move to Apache, to also make sure that such useful libs
> will be maintained and that development will continue?

It's up to the external project community to decide if they want to
become an Apache project. We can of course mention the Incubator and
offer to help if they want to bring the project to Apache, but I
wouldn't want to go on a crusade to turn all our dependencies into
Apache projects.

I think the primary criteria for selecting which external libraries to
use as default parsers in Tika (a plugin interface should of course
allow any other libraries to be used instead of the defaults if
needed) would be code quality, licensing, and active maintenance. All
of these are typically well handled by Apache projects, but there's no
inherent reason why external projects couldn't meet these criteria
just as well as, or even better than, Apache projects.
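
To make the "defaults plus plugin interface" idea concrete, here is a
rough sketch of how it might look. Everything in it is an assumption
for illustration only: the MetadataParser interface, the
ParserRegistrySketch class, and the "parser" metadata key are
hypothetical names, not an existing or planned Tika API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class ParserRegistrySketch {

    /** Hypothetical plugin interface; any library could be wrapped behind it. */
    public interface MetadataParser {
        Map<String, String> parse(byte[] content);
    }

    // Maps a media type (e.g. "image/png") to the parser that handles it.
    private final Map<String, MetadataParser> parsers = new HashMap<>();

    public ParserRegistrySketch() {
        // Assumed default: a stand-in for whatever library ships as the default.
        parsers.put("image/png", content -> Map.of("parser", "default-image"));
    }

    /** Registers a parser, replacing any default for the same media type. */
    public void register(String mediaType, MetadataParser parser) {
        parsers.put(mediaType, parser);
    }

    /** Returns the parser for the media type, if one is registered. */
    public Optional<MetadataParser> lookup(String mediaType) {
        return Optional.ofNullable(parsers.get(mediaType));
    }

    public static void main(String[] args) {
        ParserRegistrySketch registry = new ParserRegistrySketch();
        // A user swaps in a different library for PNG images.
        registry.register("image/png", content -> Map.of("parser", "custom-image"));
        registry.lookup("image/png")
                .ifPresent(p -> System.out.println(p.parse(new byte[0])));
    }
}
```

The point of the sketch is only that the default parser is an entry in
a table rather than a hard dependency, so replacing it requires no
changes to the calling code.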

So, once we have our act together (a working codebase and an
architectural roadmap) I think we should start contacting various
parser projects about cooperation. We should explain what we are
trying to do and, for each parser library we depend on, preferably
have someone who follows the mailing lists of both Tika and that
library. While building those bridges we could also mention the
option of bringing external projects into Apache, but that
definitely shouldn't be a precondition for cooperation.

> PS: I don't know if this is the right place for such questions...

Good as any. :-)


Jukka Zitting
