tika-dev mailing list archives

From Joerg Ehrlich <jehrl...@adobe.com>
Subject RE: A plan to improve the metadata property definitions
Date Wed, 23 May 2012 17:19:28 GMT
Hi Nick,

On Tue, 22 May 2012, Joerg Ehrlich wrote:
>> Thanks, this looks like a great step forward. It definitely helps to 
>> clean up the current metadata usage. But I still have no real idea how 
>> to represent structured properties with the current Property/Metadata 
>> setup going forward.
>
>The only thing the current setup won't support is Structured Properties.
>(That hasn't changed). That will need more work, but hopefully it'll be easier now we've
>moved more things to be Property based.
>
>Are you able to come up with a good, simple example for using a structured property? That'd
>provide us with something to ponder, and to use when testing out possible solutions

Sure, there are plenty of examples. One comes from EXIF, and it would be easy to map onto a
flat property list:
<exif:Flash rdf:parseType="Resource">
    <exif:Fired>False</exif:Fired>
    <exif:Return>0</exif:Return>
    <exif:Mode>0</exif:Mode>
    <exif:Function>False</exif:Function>
    <exif:RedEyeMode>False</exif:RedEyeMode>
</exif:Flash>
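
To make "flat property list" concrete, here is a rough sketch of how such a struct could be flattened. It is only an illustration: the class FlatStructSketch, the flatten method and the '/' separator in the keys are my own assumptions, not anything in the current Tika Property/Metadata code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika: flattens one struct into flat key/value pairs.
class FlatStructSketch {

    // Joins the struct name and each field name with '/' (an assumed convention),
    // giving keys such as "exif:Flash/exif:Fired".
    static Map<String, String> flatten(String structName, Map<String, String> fields) {
        Map<String, String> flat = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            flat.put(structName + "/" + field.getKey(), field.getValue());
        }
        return flat;
    }

    public static void main(String[] args) {
        // Field values taken from the exif:Flash example above.
        Map<String, String> flash = new LinkedHashMap<>();
        flash.put("exif:Fired", "False");
        flash.put("exif:Return", "0");
        flash.put("exif:Mode", "0");
        flash.put("exif:Function", "False");
        flash.put("exif:RedEyeMode", "False");

        flatten("exif:Flash", flash)
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Each field ends up as an ordinary string-valued entry, which is why this case fits a simple key/value model.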

Face detection, by contrast, would be more complicated:
<mwg-rs:Regions rdf:parseType="Resource">
      <mwg-rs:AppliedToDimensions stDim:w="4288" stDim:h="2848" stDim:unit="pixel"/>
      <mwg-rs:RegionList>
        <rdf:Bag>
          <rdf:li rdf:parseType="Resource">
            <mwg-rs:Area stArea:x="0.5" stArea:y="0.5" stArea:w="0.06" stArea:h="0.09"
                         stArea:unit="normalized"/>
            <mwg-rs:Type>Face</mwg-rs:Type>
            <mwg-rs:Title>John Doe</mwg-rs:Title>
          </rdf:li>
	...
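
A rough sketch of why this case is harder: once a property holds a list of structs, a flat representation needs an index in every key. The class RegionListSketch, the flatten method, the 1-based index and the '/' separator are all assumptions made for illustration (loosely following XMP path notation); none of this is existing Tika API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Tika API: flattens a list of structs into indexed keys.
class RegionListSketch {

    // Produces keys such as "mwg-rs:Regions/mwg-rs:RegionList[1]/mwg-rs:Type".
    // The 1-based index and the '/' separator are assumed conventions.
    static Map<String, String> flatten(String listPath, List<Map<String, String>> items) {
        Map<String, String> flat = new LinkedHashMap<>();
        for (int i = 0; i < items.size(); i++) {
            for (Map.Entry<String, String> field : items.get(i).entrySet()) {
                flat.put(listPath + "[" + (i + 1) + "]/" + field.getKey(), field.getValue());
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        // Only the simple fields of the region above; the nested Area struct
        // would need yet another level of keys.
        Map<String, String> face = new LinkedHashMap<>();
        face.put("mwg-rs:Type", "Face");
        face.put("mwg-rs:Title", "John Doe");

        List<Map<String, String>> regions = new ArrayList<>();
        regions.add(face);

        flatten("mwg-rs:Regions/mwg-rs:RegionList", regions)
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

A client that wants "all regions" then has to reassemble the structs by parsing the keys, which is the kind of work a real structured property type would avoid.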

Properties that offer language alternatives are also interesting. They are arrays, but each
item is qualified with a language.
For example, the title property as defined by IPTC:
<dc:title>
    <rdf:Alt>
        <rdf:li xml:lang="en-us">title</rdf:li>
        <rdf:li xml:lang="de-de">titel</rdf:li>
    </rdf:Alt>
</dc:title>
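
As a sketch of what a client would want from such a property, the snippet below models the alternatives as an ordered map from language tag to value and picks the best match. The class LangAltSketch, the resolve method and the fallback order (exact tag, then same primary language, then first item) are assumptions for illustration, not Tika or XMP toolkit behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not Tika API: a language-alternative property as an ordered map.
class LangAltSketch {

    // Returns the part of a language tag before the first '-', e.g. "de" for "de-de".
    static String primary(String tag) {
        int dash = tag.indexOf('-');
        return dash < 0 ? tag : tag.substring(0, dash);
    }

    // Assumed lookup order: exact tag, then same primary language, then the first item.
    // Returns null when there are no alternatives at all.
    static String resolve(Map<String, String> alternatives, String requestedLang) {
        String wanted = requestedLang.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> item : alternatives.entrySet()) {
            if (item.getKey().toLowerCase(Locale.ROOT).equals(wanted)) {
                return item.getValue();
            }
        }
        for (Map.Entry<String, String> item : alternatives.entrySet()) {
            if (primary(item.getKey().toLowerCase(Locale.ROOT)).equals(primary(wanted))) {
                return item.getValue();
            }
        }
        return alternatives.isEmpty() ? null : alternatives.values().iterator().next();
    }

    public static void main(String[] args) {
        // Values taken from the dc:title example above.
        Map<String, String> title = new LinkedHashMap<>();
        title.put("en-us", "title");
        title.put("de-de", "titel");

        System.out.println(resolve(title, "de-de"));
        System.out.println(resolve(title, "de-AT"));
        System.out.println(resolve(title, "fr-fr"));
    }
}
```

The point is that the language qualifier has to survive extraction; a plain multi-valued string property would lose it.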

As soon as Tika starts reading more metadata from assets (like XMP) and maps more than the
current simple properties, you will have to deal with such structured information. For XMP
data, Tika could of course also just pass it through as a blob without parsing it and let
the client deal with it :)

Regards
Jörg

