tika-dev mailing list archives

From Antoni Mylka <antoni.my...@gmail.com>
Subject Re: [metadata] roadmap proposal available on the wiki
Date Thu, 26 Apr 2012 21:23:30 GMT
2012/04/25 Joerg Ehrlich wrote:
> Hi,
>
> I have put a proposal of a roadmap for the metadata features in Tika on the wiki:
> http://wiki.apache.org/tika/MetadataRoadmap
>
> The proposal is based on a discussion around this topic I have had with Jukka.
> Please review and feel free to edit the wiki for the discussion. I will also update the
> wiki according to the discussion.
>
> BTW, how do I attach an image on that wiki? The documentation mentions the "attachment"
> link, which I am not able to find.

My 2c.

The proposal is great. At last, after five years, there is a way to squeeze
some sort of semantics into Tika metadata that actually looks doable without
having to rewrite the library from scratch.

The roadmap seems clear on the todos required from the coding POV. The 
XMP data model, while more limited than full RDF, will likely be enough.
The roadmap doesn't give much detail about the intended vocabularies.
Dublin Core is great, but what else? Joerg? What other kinds of metadata
information would you like to extract with Tika, and what vocabularies 
would you like to use to express them?

At Adobe, you'll likely want Tika to transparently get the XMP metadata
from the docs (using whatever vocabularies you use to express whatever
info you need) into your metadata-processing software, which already
"understands" the semantics of those XMP properties and values. What
data would you like to have Tika transform to common vocabularies, and
which vocabularies would those be?

Antoni Myłka
antoni.mylka@gmail.com
