tika-dev mailing list archives

From Martin Desruisseaux <martin.desruisse...@geomatys.com>
Subject Re: ISO 19115 as a metadata model for Tika?
Date Wed, 04 Nov 2015 19:33:39 GMT
Hello Chris

On 03/11/15 19:02, Mattmann, Chris A (3980) wrote:
> I think having some specific patches of how this would look
> would help to take it less away from the abstract and more
> into the concrete area. I encourage you to try it out MartinD,
> and see if there is a good overlap there.

I attached to TIKA-443 a demo extracting some
org.apache.tika.metadata.DublinCore properties from an
org.opengis.metadata.Metadata object. However, this is not a patch that
can be included in Tika, since I do not know how to integrate those
properties into Tika (I would leave this work to volunteers).

This demo tries to give some tips about only one aspect of the
discussion: adding an ISO 19115 parser to Tika. There is another aspect
of the discussion that this demo does not cover: whether the Tika
metadata model should be extended to support the richness of more
complex models like ISO 19115.

More specifically, if one looks at the demo, one can see that there are
many nested loops. An "Identification" object can contain many
"Citation" objects, which in turn can contain many "ResponsibleParty"
objects, etc. For this demo I just mapped, e.g., the title of the first
"Identification" instance to the Dublin Core "title" property, then
broke out of the loop. Obviously information is lost, so the question is
whether it is a goal for Tika to capture that information, or whether it
is considered too specific.
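
To make the information loss concrete, here is a minimal Java sketch of the "map the first match, then break" approach. It is not the demo attached to TIKA-443. It assumes hypothetical, heavily simplified records (`IsoMetadata`, `Identification`, `Citation`) in place of the real GeoAPI interfaces, and a plain `Map` keyed by `"dc:title"` in place of Tika's `Metadata` object.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified stand-ins for the GeoAPI interfaces
// (org.opengis.metadata.Metadata, Identification, Citation). The real
// interfaces expose collections through getters; records keep this self-contained.
record Citation(String title) {}
record Identification(List<Citation> citations) {}
record IsoMetadata(List<Identification> identifications) {}

class FirstMatchMapper {
    /**
     * Copies only the first non-null citation title into a flat map under
     * a "dc:title" key (an assumed key name), then stops: every other
     * title is silently dropped.
     */
    static Map<String, String> toDublinCore(IsoMetadata md) {
        Map<String, String> flat = new LinkedHashMap<>();
        search:
        for (Identification id : md.identifications()) {
            for (Citation c : id.citations()) {
                if (c.title() != null) {
                    flat.put("dc:title", c.title());
                    break search; // break out of both loops after the first match
                }
            }
        }
        return flat;
    }
}
```

With two Identification entries carrying three titles between them, only the first title survives in the returned map; the other two leave no trace in the flat result.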

If Tika chooses to capture such information, then a tree structure will
become necessary. The next question would then be how to do that,
whether a "tree structure" and a "flat structure" should coexist, etc.
But we do not need to answer those questions now (a simple ISO 19115
parser mapping to the current Dublin Core properties could be done).
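
One possible way, among others, for a tree and a flat structure to coexist is to flatten the tree into path keys with per-sibling indexes, so that repeated elements do not overwrite each other. The sketch below is purely illustrative: `Node` and `PathFlattener` are hypothetical types, not part of Tika or GeoAPI, and the `name[i]` path syntax is an arbitrary choice.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical generic tree node: a name, an optional text value, and children.
// Not a Tika or GeoAPI type; it only illustrates the structural question.
record Node(String name, String value, List<Node> children) {
    static Node leaf(String name, String value) {
        return new Node(name, value, List.of());
    }

    static Node branch(String name, Node... children) {
        return new Node(name, null, List.of(children));
    }
}

class PathFlattener {
    /** Flattens a tree into "a/b[0]/c[1]" style keys so that no value is lost. */
    static Map<String, String> flatten(Node root) {
        Map<String, String> out = new LinkedHashMap<>();
        walk(root, root.name(), out);
        return out;
    }

    private static void walk(Node node, String path, Map<String, String> out) {
        if (node.value() != null) {
            out.put(path, node.value());
        }
        // Count siblings with the same name so repeated elements get distinct indexes.
        Map<String, Integer> seen = new HashMap<>();
        for (Node child : node.children()) {
            int index = seen.merge(child.name(), 1, Integer::sum) - 1;
            walk(child, path + "/" + child.name() + "[" + index + "]", out);
        }
    }
}
```

For a tree with two identification elements, the title of the second one's first citation ends up under a key like `metadata/identification[1]/citation[0]/title[0]`. Nothing is lost and the tree can be rebuilt from the keys, but the keys are verbose and no longer match the simple Dublin Core property names.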

    Martin


