tika-dev mailing list archives

From Joerg Ehrlich <jehrl...@adobe.com>
Subject RE: [metadata] Input on reorganization of Metadata interfaces
Date Tue, 08 May 2012 12:35:55 GMT

-----Original Message-----
From: Nick Burch [mailto:nick.burch@alfresco.com] 
Sent: Friday, 4 May 2012 23:34
To: dev@tika.apache.org
Subject: RE: [metadata] Input on reorganization of Metadata interfaces

On Fri, 4 May 2012, Joerg Ehrlich wrote:
>>>> The keys will always link to properties of other namespace interfaces like:
>>>> String Title = DublinCore.Title.getName(); String Author = 
>>>> DublinCore.Creator.getName();
>>> Won't that break existing parsers and consumers though? As Title will 
>>> suddenly change from being "title" to "dc:title", won't it?
>> If they are not using the Tika constants themselves but their values 
>> instead, then yes.
> That'll break things like Alfresco then. (We do the mapping from Tika metadata to Alfresco
> metadata on the strings, rather than by Metadata constants, so it's more flexible and easier
> for users to extend). I suspect Alfresco isn't the only consumer of Tika's metadata that
> does the same thing. Anything that uses tika-cli will likewise be string based, not Metadata
> Constant based
>> Thinking about it, I am actually not sure whether we really need to 
>> have the prefixes in the names anymore if the new keys are properties 
>> instead of strings. Then we could implement other means to identify 
>> the namespace for a property, by storing it in the property for 
>> example :)
> I think the current ones that have a prefix are easier and cleaner to understand than
> the un-prefixed ones. If we're going to be basing the keys explicitly on a standard, I think
> we ought to make that explicit wherever we can, including in the key names. It will be
> a faff for people to change over, and for us to handle in the mean time, but I think if we're
> going to be making a change of this scale we should take the chance to do it all properly

I am not sure that putting prefixes into the strings is the proper way. As you said above,
clients depend on those strings, and it is not a good thing to depend on namespace prefixes
instead of the actual namespaces, because prefixes are just variables that anyone can choose
as they like.
The only reason I did not touch the prefix concept was that I didn't want to change yet
another part of Tika :) But if you ask me, I would try to keep the prefixes out of the names.
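
To illustrate the point about prefixes being arbitrary, here is a minimal sketch in Java. It assumes a hypothetical `Property` class that stores the namespace URI next to the local name; the class, its methods, and the `NamespacedKeySketch` wrapper are made up for illustration and are not Tika's actual API. The key's identity is the (namespace, local name) pair, and a prefix is applied only when the key is rendered as a string.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch only; these are NOT Tika's actual classes.
final class NamespacedKeySketch {

    /** A metadata key identified by namespace URI plus local name, with no prefix baked in. */
    static final class Property {
        private final String namespaceUri;
        private final String localName;

        Property(String namespaceUri, String localName) {
            this.namespaceUri = Objects.requireNonNull(namespaceUri);
            this.localName = Objects.requireNonNull(localName);
        }

        String namespaceUri() { return namespaceUri; }

        String localName() { return localName; }

        /** Renders the key using whatever prefix the caller has bound to the namespace. */
        String qualifiedName(Map<String, String> prefixByNamespace) {
            String prefix = prefixByNamespace.get(namespaceUri);
            return prefix == null ? localName : prefix + ":" + localName;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Property)) {
                return false;
            }
            Property p = (Property) other;
            return namespaceUri.equals(p.namespaceUri) && localName.equals(p.localName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(namespaceUri, localName);
        }
    }

    // Namespace URI of the Dublin Core element set.
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final Property DC_TITLE = new Property(DC_NS, "title");

    public static void main(String[] args) {
        Map<String, String> oneClient = new LinkedHashMap<>();
        oneClient.put(DC_NS, "dc");

        Map<String, String> anotherClient = new LinkedHashMap<>();
        anotherClient.put(DC_NS, "dublincore");

        // Same property, different prefixes: the identity does not change.
        System.out.println(DC_TITLE.qualifiedName(oneClient));     // dc:title
        System.out.println(DC_TITLE.qualifiedName(anotherClient)); // dublincore:title
        System.out.println(DC_TITLE.qualifiedName(new LinkedHashMap<>())); // title
    }
}
```

In this sketch two clients can bind different prefixes to the same namespace and still agree on which property they mean, because equality is defined on the namespace URI and the local name rather than on the rendered string.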

What would be your proposal then how to handle this transition may it be with prefixes or

