tika-dev mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Re: Metadata use by Apache Java projects
Date Mon, 19 Nov 2007 16:54:01 GMT

[Responding just on tika-dev@. I guess Jeremias follows all these
forums, and can summarize in the end...]

On Nov 19, 2007 11:26 AM, Jeremias Maerki <dev@jeremias-maerki.ch> wrote:
> Every one of these projects has its own means to represent metadata in
> memory. Wouldn't it make sense to have a common approach?


> Sanselan and Tika have both chosen a very simple approach but is it
> versatile enough for the future? While the simple Map<String, String[]> in
> Tika allows for multiple authors, for example, it doesn't support
> language alternatives for things such as dc:title or dc:description.

IMHO it would be good to have a more flexible metadata model in Tika.
Better yet if it's a standard used across multiple projects. Best if
we don't need to implement it in Tika. :-)
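
To make the limitation concrete, here is a minimal sketch in Java. The
first method uses the Map<String, String[]> shape mentioned above; the
second is only one possible richer shape, not a proposal for an actual
Tika API. The class name, method names and sample values are all
hypothetical. The "x-default" key is borrowed from the way XMP marks the
default entry of a language alternative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MetadataSketch {

    // Flat model from the thread: each key maps to several plain strings.
    static Map<String, String[]> flatModel() {
        Map<String, String[]> metadata = new LinkedHashMap<>();
        // Multiple authors fit naturally.
        metadata.put("dc:creator", new String[] {"First Author", "Second Author"});
        // Two titles can be stored, but nothing records the language of each.
        metadata.put("dc:title", new String[] {"Annual Report", "Jahresbericht"});
        return metadata;
    }

    // Hypothetical richer model: property -> (language tag -> text).
    static Map<String, Map<String, String>> langAltModel() {
        Map<String, String> titles = new LinkedHashMap<>();
        titles.put("x-default", "Annual Report");
        titles.put("en", "Annual Report");
        titles.put("de", "Jahresbericht");

        Map<String, Map<String, String>> metadata = new LinkedHashMap<>();
        metadata.put("dc:title", titles);
        return metadata;
    }

    // Looks up the text for a language, falling back to the default entry.
    static String localized(Map<String, Map<String, String>> metadata,
                            String property, String lang) {
        Map<String, String> alternatives = metadata.get(property);
        if (alternatives == null) {
            return null;
        }
        String value = alternatives.get(lang);
        return value != null ? value : alternatives.get("x-default");
    }

    public static void main(String[] args) {
        System.out.println(flatModel().get("dc:title").length);
        System.out.println(localized(langAltModel(), "dc:title", "de"));
        System.out.println(localized(langAltModel(), "dc:title", "fr"));
    }
}
```

With the flat map a caller can only guess which title is the German
one; with the nested map the language tag is part of the data, and an
unknown language falls back to the default entry.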

> My questions:
> - Any interest in converging on a unified model/approach?


> - If yes, where shall we develop this? As part of Tika (although it's
> still in incubation)? As a separate project (maybe as Apache Commons
> subproject)? If more than XML Graphics uses this, XML Graphics is
> probably not the right home.
> - Is Adobe's XMP toolkit interesting for adoption (!=incubation)? Is
> the JempBox or XML Graphics Commons approach more interesting?

If there already exists acceptably licensed good code outside the ASF,
then I would prefer using that instead of reinventing the wheel within
the foundation.


Jukka Zitting
