tika-dev mailing list archives

From Nick Burch <nick.bu...@alfresco.com>
Subject RE: [metadata] Input on reorganization of Metadata interfaces
Date Fri, 04 May 2012 21:34:26 GMT
On Fri, 4 May 2012, Joerg Ehrlich wrote:
>>> The keys will always link to properties of other namespace interfaces like:
>>> String Title = DublinCore.Title.getName(); String Author =
>>> DublinCore.Creator.getName();
>
>> Won't that break existing parsers and consumers though? As Title will 
>> suddenly change from being "title" to "dc:title", won't it?
>
> If they are not using the Tika constants themselves but their values 
> instead, then yes.

That'll break things like Alfresco then. (We do the mapping from Tika 
metadata to Alfresco metadata on the strings, rather than by Metadata 
constants, so it's more flexible and easier for users to extend.) I 
suspect Alfresco isn't the only consumer of Tika's metadata that does the 
same thing. Anything that uses tika-cli will likewise be string based, not 
Metadata constant based.
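
To illustrate the breakage, here is a minimal sketch of a string-keyed 
mapping of the sort described above. It is not Alfresco's actual code: the 
class name, the "consumer:title" and "consumer:author" targets and the 
mapToConsumer helper are all made up for the example, and plain maps stand 
in for Tika's Metadata object. Only the "title" to "dc:title" change comes 
from this thread.

```java
import java.util.HashMap;
import java.util.Map;

public class StringKeyMappingSketch {
    // Hypothetical consumer-side mapping, keyed on the literal string
    // values of the metadata keys rather than on the Metadata constants.
    static final Map<String, String> CONSUMER_MAPPING = new HashMap<>();
    static {
        CONSUMER_MAPPING.put("title", "consumer:title");
        CONSUMER_MAPPING.put("author", "consumer:author");
    }

    // Copies every extracted value whose key has a mapping entry;
    // keys without an entry are silently dropped.
    static Map<String, String> mapToConsumer(Map<String, String> extracted) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> entry : extracted.entrySet()) {
            String target = CONSUMER_MAPPING.get(entry.getKey());
            if (target != null) {
                result.put(target, entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Key value as it is today.
        Map<String, String> before = new HashMap<>();
        before.put("title", "Report");

        // Key value if Title becomes DublinCore.Title.getName().
        Map<String, String> after = new HashMap<>();
        after.put("dc:title", "Report");

        System.out.println(mapToConsumer(before)); // {consumer:title=Report}
        System.out.println(mapToConsumer(after));  // {}
    }
}
```

Nothing fails loudly in the second case. The title just stops being 
mapped until someone adds a "dc:title" entry to the mapping.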

> Thinking about it, I am actually not sure whether we really need to have 
> the prefixes in the names anymore if the new keys are properties instead 
> of strings. Then we could implement other means to identify the 
> namespace for a property, by storing it in the property for example :)

I think the current ones that have a prefix are easier and cleaner to 
understand than the un-prefixed ones. If we're going to be basing the keys 
explicitly on a standard, I think we ought to make that explicit wherever 
we can, including in the key names. It will be a faff for people to change 
over, and for us to handle in the meantime, but I think if we're going to 
be making a change of this scale, we should take the chance to do it all 
properly.

Nick
