tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-482) Refactor image and jpeg parsers for access to MetadataExtractor API
Date Fri, 03 Sep 2010 15:04:32 GMT

    [ https://issues.apache.org/jira/browse/TIKA-482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905930#action_12905930 ]

Nick Burch commented on TIKA-482:

Thanks for this patch.

I've applied it with a few tweaks in r992319.

The two main changes were:
* Different name for the Exif parser class - ImageMetadataExtractor seemed a better name than
* Original and default dates done slightly differently. This was with TIKA-504 in mind, but
we should maybe think about which is the right set of date related properties to map onto

I'll keep this issue open for now, as it looks from your Git repo that you've some more cool
new refactorings to come along shortly!

> Refactor image and jpeg parsers for access to MetadataExtractor API
> -------------------------------------------------------------------
>                 Key: TIKA-482
>                 URL: https://issues.apache.org/jira/browse/TIKA-482
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 0.7
>            Reporter: Staffan Olsson
>         Attachments: TIKA-451-DublinCore_and_TIKA-482.patch
> When I added support for more image metadata in TIKA-472, I realized
> the current design had some restrictions:
>  * I could not access the typed getters from Metadata Extractor, such
> as getDate (to format ISO dates) and getStringArray (for keywords).
>  * The handler function was called one field at a time, which prevents
> logic where one field depends on the value of another (there are, for
> example, record versions and fields that specify encoding).
> See attached patch. It refactors TiffExtractor to MetadataExtractorExtractor.
> The patch also includes the date fix, see https://issues.apache.org/jira/browse/TIKA-451#action_12898794
> We can later add more Extractors using other libraries, and map to parsers based on format.
> For example we already use ImageIO in ImageParser so maybe there should be an ImageIOExtractor.
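
To illustrate the second restriction in the quoted description, here is a minimal Java sketch of why a handler that sees one field at a time cannot honour a field that specifies encoding, while a handler that receives all fields together can. Everything in it is an assumption made for illustration: the class name, the tag ids and the map-based field store are hypothetical and are not taken from the patch, from Tika, or from the Metadata Extractor library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Tika or Metadata Extractor code.
class ExtractorSketch {
    // Made-up tag ids, for illustration only.
    static final int TAG_ENCODING = 1;
    static final int TAG_CAPTION = 2;

    // Field-at-a-time style: each value is decoded in isolation, so the
    // caption cannot take the encoding field into account and falls back
    // to a fixed default charset.
    static Map<String, String> extractFieldAtATime(Map<Integer, byte[]> fields) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<Integer, byte[]> field : fields.entrySet()) {
            if (field.getKey() == TAG_CAPTION) {
                out.put("caption", new String(field.getValue(), StandardCharsets.ISO_8859_1));
            }
        }
        return out;
    }

    // Whole-directory style: all fields are available together, so the
    // encoding field can be read first and used to decode the caption.
    static Map<String, String> extractWholeDirectory(Map<Integer, byte[]> fields) {
        Map<String, String> out = new LinkedHashMap<>();
        byte[] encoding = fields.get(TAG_ENCODING);
        Charset charset = (encoding == null)
                ? StandardCharsets.ISO_8859_1
                : Charset.forName(new String(encoding, StandardCharsets.US_ASCII));
        byte[] caption = fields.get(TAG_CAPTION);
        if (caption != null) {
            out.put("caption", new String(caption, charset));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, byte[]> fields = new HashMap<>();
        fields.put(TAG_ENCODING, "UTF-8".getBytes(StandardCharsets.US_ASCII));
        fields.put(TAG_CAPTION, "caf\u00e9".getBytes(StandardCharsets.UTF_8));
        System.out.println(extractFieldAtATime(fields));
        System.out.println(extractWholeDirectory(fields));
    }
}
```

With a UTF-8 caption, only the whole-directory variant decodes the accented character correctly, which is the kind of cross-field logic the description says the one-field-at-a-time handler prevents.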

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
