tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-451) Inconsistent date format for Metadata.CREATION_DATE and Metadata.LAST_MODIFIED
Date Wed, 07 Jul 2010 22:25:51 GMT

    [ https://issues.apache.org/jira/browse/TIKA-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886119#action_12886119 ]

Jukka Zitting commented on TIKA-451:

I would only do property type checks in type-specific setters like setDate() or setInteger().
I'd allow the generic set() method with a string argument to always succeed. That way parsing
a document doesn't break even if some of its metadata fields don't match the format we expect.

Similarly, I'd avoid throwing any exceptions from metadata getters. The type-specific getters
should probably treat a malformed metadata value as if it were missing, and the generic get()
method should return it as-is.
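
As a rough illustration of the split described above, here is a minimal sketch. LenientMetadata,
its Type enum and the declared-types map are hypothetical names made up for this example, not
Tika's actual Metadata API, and the ISO 8601 pattern is just one possible choice of date format.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical stand-in for illustration only; not Tika's Metadata class.
class LenientMetadata {
    enum Type { DATE, INTEGER }

    private final Map<String, Type> declared;   // assumed registry of typed properties
    private final Map<String, String> values = new HashMap<>();

    LenientMetadata(Map<String, Type> declared) {
        this.declared = declared;
    }

    // SimpleDateFormat is not thread-safe, so build a fresh one per call.
    private static SimpleDateFormat iso8601() {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        f.setLenient(false);
        return f;
    }

    // Property type check, done only by the type-specific setters.
    private void check(String name, Type expected) {
        if (declared.get(name) != expected) {
            throw new IllegalArgumentException(name + " is not declared as " + expected);
        }
    }

    // Generic setter: always succeeds and stores the string untouched.
    void set(String name, String value) {
        values.put(name, value);
    }

    // Generic getter: returns the stored string as-is (null if absent).
    String get(String name) {
        return values.get(name);
    }

    void setDate(String name, Date date) {
        check(name, Type.DATE);
        values.put(name, iso8601().format(date));
    }

    void setInteger(String name, int value) {
        check(name, Type.INTEGER);
        values.put(name, Integer.toString(value));
    }

    // Type-specific getters never throw: a malformed value reads as missing.
    Date getDate(String name) {
        String raw = values.get(name);
        if (raw == null) {
            return null;
        }
        try {
            return iso8601().parse(raw);
        } catch (ParseException e) {
            return null;
        }
    }

    Integer getInteger(String name) {
        String raw = values.get(name);
        if (raw == null) {
            return null;
        }
        try {
            return Integer.valueOf(raw.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

With this shape, a parser that only has a date.toString()-style string can still call set(), and
a caller using getDate() simply sees the value as absent while get() returns the raw string.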

> Inconsistent date format for Metadata.CREATION_DATE and Metadata.LAST_MODIFIED
> ------------------------------------------------------------------------------
>                 Key: TIKA-451
>                 URL: https://issues.apache.org/jira/browse/TIKA-451
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata, parser
>    Affects Versions: 0.7
>            Reporter: Nick Burch
>            Assignee: Nick Burch
>            Priority: Minor
> Currently, the PDF Parser does   calendar.getTime().toString()   which means dates end
> up in your local timezone, and are hard to parse
> The Open Document parsers output in iso 8601 format, which avoids these two problems
> The poi ole2 based parsers also output in date.toString() format, with the same timezone/parsing problems
> We should probably select one format, and update the parsers to all output in it
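
To make the difference between the two formats concrete, here is a small sketch.
DateFormatComparison and iso8601Utc() are hypothetical names for this example, and the UTC
pattern shown is only one ISO 8601 variant, not necessarily the one any Tika parser emits.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for illustration; not part of Tika.
class DateFormatComparison {
    // Formats a date as ISO 8601 in UTC, independent of the JVM's default timezone.
    static String iso8601Utc(Date date) {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(date);
    }

    public static void main(String[] args) {
        Date date = new Date(0L);
        // Date.toString() uses the JVM's default timezone, so this line varies by machine.
        System.out.println(date.toString());
        // The ISO 8601 form is the same everywhere and sorts lexicographically.
        System.out.println(iso8601Utc(date));
    }
}
```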

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
