tika-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: Using standard XMP schemas for image and audio metadata
Date Mon, 09 Feb 2009 17:41:20 GMT

On Mon, Feb 9, 2009 at 3:11 PM, Jonathan Koren <jonathan@soe.ucsc.edu> wrote:
> On Feb 8, 2009, at 10:59 AM, Jukka Zitting wrote:
>> Note that I'm only proposing that we change the keys of the six
>> metadata entries I listed.
> But why only those six?

Because they are useful pieces of metadata that are already accurately
defined in the respective XMP schemas. For example, I didn't propose
changing the MIDI metadata key "patches", as AFAIK there is no
standard schema that covers that piece of information.

> You're not proposing to support all of XMP, just the bare minimum that you
> need this week.  At some point you're going to want to add more metadata
> and then you're going to have to deal with the ontology mismatch problem.

I'm not proposing that we try to map all the metadata we support into
the XMP schemas. All I'm trying to do is avoid using custom keys for
information where a well defined and widely used standard alternative
already exists.

If there's an ontology mismatch, then we can use custom keys. But I
don't see why we should invent new keys when standard alternatives
with the exact same semantics already exist.

A Tika-specific client shouldn't care whether the metadata key is
"width", "tiff:ImageWidth", "xyzzy" or even "the return value of
javax.imageio.ImageReader.getWidth(0)"; it should just use a constant
like Metadata.IMAGE_WIDTH.
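
A minimal sketch of that idea, not Tika's actual code: MetadataSketch is a
hypothetical stand-in for the Metadata class, and the value given to
IMAGE_WIDTH is an assumption based on the key discussed in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Tika's Metadata class (illustration only).
final class MetadataSketch {
    // Assumed value: the XMP TIFF schema key named in this thread.
    // Clients refer to the constant, so the string can change later
    // without touching any client code.
    static final String IMAGE_WIDTH = "tiff:ImageWidth";

    private final Map<String, String> values = new HashMap<>();

    // Store a value under the given key, replacing any previous value.
    void set(String key, String value) {
        values.put(key, value);
    }

    // Return the stored value, or null if the key is absent.
    String get(String key) {
        return values.get(key);
    }
}
```

A client would then write metadata.get(MetadataSketch.IMAGE_WIDTH) and never
spell out the key string itself.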

The metadata key "tiff:ImageWidth" is well documented and makes life
easier when your application needs to interact with existing XMP
infrastructure (or other metadata tools that already know how to
import XMP metadata), and I don't see why the key would be any worse
than the alternatives.

> You create a new class that takes the raw key-value pairs that are stored in
> Tika::Metadata and translates them to something else. Call it Metadata2XMP
> or whatever.  That can be packaged within Tika as a convenient class
> that does least common denominator mapping in a well defined way.

Having such a mapping class within Tika is an alternative, but as
discussed in the Dublin Core thread [1] in December, I'm not sure if
it's worth the added complexity. My proposal covers the use case with
much less extra code or documentation.
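
For comparison, the mapping-class alternative might look roughly like the
sketch below. Everything in it is hypothetical: the class name, the custom
keys "width" and "height", and the choice to pass unmapped keys through
unchanged are assumptions for illustration, not an existing Tika API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical translator from custom keys to XMP keys (illustration only).
final class Metadata2XmpSketch {
    // Assumed custom keys on the left; XMP TIFF schema properties on the right.
    private static final Map<String, String> KEY_MAP = Map.of(
            "width", "tiff:ImageWidth",
            "height", "tiff:ImageLength");

    // Copy the raw pairs, renaming every key that has a known XMP
    // equivalent and leaving all other keys (e.g. "patches") as they are.
    static Map<String, String> toXmp(Map<String, String> raw) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : raw.entrySet()) {
            String xmpKey = KEY_MAP.getOrDefault(entry.getKey(), entry.getKey());
            result.put(xmpKey, entry.getValue());
        }
        return result;
    }
}
```

Even in this small form, the table of key pairs has to be maintained and
documented separately from the parsers that produce the raw keys.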

[1] http://markmail.org/message/zjsjslaelx6acf6z


Jukka Zitting
