tika-dev mailing list archives

From Joerg Ehrlich <jehrl...@adobe.com>
Subject XMP conversion module for Tika
Date Thu, 28 Jun 2012 10:14:57 GMT

As discussed in earlier threads, I have created a new Tika module ("tika-xmp") which converts
Tika Metadata to the XMP data model, and I have added the patch to TIKA-756.
The patch also integrates the converter with tika-app, hooking it up to the "-y"
output option and thereby providing XMP output for the Tika CLI.
Any client interested in using XMP as a metadata container can use the new module; other clients
are not affected.

The module's API extends the tika-core Metadata class but also offers the possibility to work
directly with the XMP data model. This approach was chosen to ease usage for existing
Tika clients while still providing the ability to work with the XMP data model if required.
The Metadata information from Tika can be converted either by mimetype-specific converters,
which convert everything for their respective file formats, or by a generic converter, which
converts all fully qualified properties that use prefixes from registered namespaces.
All standard namespaces such as DC, IPTC, and EXIF are already supported, and clients can
also register custom ones. For that purpose the API also exposes the namespace registry functionality
of the XMPCore component.
The patch also includes converters for the office file formats, allowing conversion
of all properties for those file formats.
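
To illustrate the generic conversion rule described above, here is a minimal, self-contained sketch in plain Java. It is not the tika-xmp API: the class GenericConversionSketch and its methods are hypothetical names made up for this illustration, and the metadata is modelled as a simple map instead of the Tika Metadata class. It only shows the idea that a property is carried over when its name has the form "prefix:name" and the prefix belongs to a registered namespace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from tika-xmp or XMPCore.
final class GenericConversionSketch {
    // Maps a namespace prefix to its namespace URI.
    private final Map<String, String> namespaces = new LinkedHashMap<String, String>();

    GenericConversionSketch() {
        // Two standard namespaces, pre-registered for the sake of the example.
        namespaces.put("dc", "http://purl.org/dc/elements/1.1/");
        namespaces.put("exif", "http://ns.adobe.com/exif/1.0/");
    }

    // Lets a client add a custom namespace, mirroring the registry idea from the text.
    void registerNamespace(String prefix, String uri) {
        namespaces.put(prefix, uri);
    }

    // Keeps only fully qualified properties whose prefix is registered.
    Map<String, String> convert(Map<String, String> metadata) {
        Map<String, String> converted = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            String name = entry.getKey();
            int colon = name.indexOf(':');
            if (colon <= 0) {
                continue; // not of the form "prefix:name", so it is skipped
            }
            String prefix = name.substring(0, colon);
            if (namespaces.containsKey(prefix)) {
                converted.put(name, entry.getValue());
            }
        }
        return converted;
    }
}
```

With this sketch, a property such as "dc:title" is converted out of the box, an unqualified name such as "Content-Type" is skipped, and a property with a custom prefix is only picked up after its namespace has been registered (the prefix "acme" and the URI "http://example.com/ns/acme/1.0/" would be one made-up example).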

I have provided two patches to TIKA-756.
The "tika-xmp.patch" provides the extra Tika module based on the current Tika source.
The "tika-xmp_dependsOn_TIKA929changes.patch" contains the same tika-xmp module as offered
by the other patch, but depends on the patch from TIKA-929 being applied first.
The recommendation is to resolve TIKA-929 and then use the latter patch.

Please note: The tika-xmp module provided by the patches uses the XMPCore library available
in the Maven Central repository.
Unfortunately, the current version 5.1.0 was accidentally compiled for JDK 1.7, which is
not compatible with Tika. We are in the process of uploading an updated version of the library,
5.1.1, which will solve that problem. The patch in TIKA-756 can only be applied
once XMPCore 5.1.1 is available in Maven Central.
I will update the issues once that has happened.
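
For anyone who wants to check which Java version a library was compiled for, the following is a small sketch, not part of the patch. Every class file starts with the magic number 0xCAFEBABE, followed by a two-byte minor version and a two-byte major version; major version 51 corresponds to Java 7. The class name ClassVersionSketch is made up for this example, and reading the bytes of a class out of the jar is left to the caller.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of Tika or XMPCore.
final class ClassVersionSketch {
    // Returns the class file major version (for example 50 for Java 6, 51 for Java 7).
    static int majorVersion(byte[] classBytes) {
        // ByteBuffer reads big-endian by default, which matches the class file format.
        ByteBuffer buffer = ByteBuffer.wrap(classBytes);
        if (classBytes.length < 8 || buffer.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a Java class file");
        }
        buffer.getShort();                // minor_version, not needed here
        return buffer.getShort() & 0xFFFF; // major_version as an unsigned value
    }
}
```

A JVM refuses to load classes whose major version is higher than the one it supports, which is why a library built for Java 7 cannot be used from a build that targets an older Java version.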


Jörg Ehrlich | Computer Scientist | XMP Technology | Adobe Systems | joerg.ehrlich@adobe.com
| work: +49(40)306360
