tika-dev mailing list archives

From Chris Mattmann <chris.mattm...@jpl.nasa.gov>
Subject Re: Metadata use by Apache Java projects
Date Tue, 20 Nov 2007 17:22:29 GMT
Hi Antoni,

> Chris Mattman has written that it's necessary to
> strike a balance between functionality and over-bloating. From my own
> experience i can say that it is VERY difficult :).

Well, from my own experience I can tell you that it *is* difficult, but
certainly doable.

I've been working with different forms of metadata (Dublin Core, ISO 11179,
RDF, OWL, etc.), been involved in international standards organizations
(CCSDS, ISO) that are developing metadata standards, and worked on several
projects that deal with metadata (Object Oriented Data Technology [OODT],
Semantic Web for Earth and Environmental Terminology [SWEET]) in different
domains (earth science, planetary science, space science, cancer
research, etc.) for almost 7 years now.

Sure, there are a lot of standards, and people can talk about coming up with
a one-size-fits-all, cookie-cutter library for these capabilities.
However, I think it's important to understand that developing such a library
(rather than striking the balance) is, in my mind, the most difficult problem
to tackle. I think that in the end, all we can do as software developers, as
people who are trying to standardize metadata, is to try and develop core
libraries and functions that others can build upon for their own needs. I
don't think the Tika folks should be in the business of trying to develop
high-capability metadata libraries, because in the end, just as everyone is
saying, those need to be tailored to a specific use case or domain. On the
other hand, I think it's a much more attainable goal to come up with a
simple, easy-to-use metadata library that folks who need higher-level
capability (inference, multi-language support, representation, etc.) can
build upon for their own needs. In other words, someone shouldn't have to
rewrite the ability to have metadata keys with multiple values associated with
them, with ways to map between the keys, etc. However, it's reasonable that
someone may need to rewrite the ability to represent metadata in RDF (versus
OWL), to rewrite the ability to do language translation (e.g., using XMP
versus Adobe's toolkit), that type of thing.
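
To make the "simple core" idea concrete, here is a rough sketch of the kind of
baseline I mean: keys with multiple values, plus a way to map one key name onto
another. The class name SimpleMetadata and its methods (mapKey, add, getAll,
getFirst) are hypothetical names made up for illustration; this is not Tika's
actual API, just one possible shape under those assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not an existing Tika class.
final class SimpleMetadata {
    // Canonical key -> values, in the order they were added.
    private final Map<String, List<String>> values = new LinkedHashMap<>();
    // Alias key -> canonical key (single-level mapping only).
    private final Map<String, String> aliases = new HashMap<>();

    // Treat 'alias' as another name for 'canonical' from now on.
    // Values already stored under 'alias' are not migrated in this sketch.
    void mapKey(String alias, String canonical) {
        aliases.put(alias, canonical);
    }

    // Append one more value under the key (or under its canonical key).
    void add(String key, String value) {
        values.computeIfAbsent(resolve(key), k -> new ArrayList<>()).add(value);
    }

    // All values for the key, or an empty list if there are none.
    List<String> getAll(String key) {
        List<String> found = values.get(resolve(key));
        return found == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(found);
    }

    // First value for the key, or null if there are none.
    String getFirst(String key) {
        List<String> all = getAll(key);
        return all.isEmpty() ? null : all.get(0);
    }

    // Follow an alias to its canonical key; unknown keys map to themselves.
    private String resolve(String key) {
        return aliases.getOrDefault(key, key);
    }
}
```

With something like this in place, a caller could map "dc:creator" onto
"author" once and then read or write through either name. RDF or OWL
serialization, inference, and so on would sit in separate layers built on top,
which is the split I'm arguing for.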

In any case, I'm happy to participate in any standardization efforts wearing
my Tika hat, with the understanding that whatever gets developed needs to
"fit in" the right place, be architected for extensibility, and have
cognizance of what was done previously, what the gaps are, and why the gaps
should be addressed.



Chris Mattmann, Ph.D.
Cognizant Development Engineer
Early Detection Research Network Project
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.
