tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-451) Inconsistent date format for Metadata.CREATION_DATE and Metadata.LAST_MODIFIED
Date Tue, 06 Jul 2010 17:07:50 GMT

    [ https://issues.apache.org/jira/browse/TIKA-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885604#action_12885604 ]

Jukka Zitting commented on TIKA-451:
------------------------------------

See page 11 of http://www.adobe.com/devnet/xmp/pdfs/XMPSpecificationPart2.pdf for the ISO
8601 subset used by XMP. I think that matches our needs pretty well.
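
As a rough illustration of why a fixed ISO 8601 form is easier to work with than Date.toString(), here is a minimal sketch. It assumes a UTC timestamp with a "Z" suffix is an acceptable member of the subset (check the spec page referenced above for the exact grammar). The class and method names are hypothetical and not part of Tika.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.Date;

// Hypothetical helper, not Tika code: converts between java.util.Date and
// a UTC ISO 8601 string of the form YYYY-MM-DDThh:mm:ssZ.
final class IsoDateSketch {

    /** Formats a Date as a UTC ISO 8601 timestamp. */
    static String toIso8601(Date date) {
        return DateTimeFormatter.ISO_INSTANT.format(date.toInstant());
    }

    /** Parses a timestamp produced by toIso8601 back into a Date. */
    static Date fromIso8601(String text) {
        return Date.from(Instant.parse(text));
    }

    public static void main(String[] args) {
        Date date = Date.from(Instant.parse("2010-07-06T17:07:50Z"));
        // Date.toString() renders in the JVM's default time zone,
        // so this line differs from machine to machine.
        System.out.println(date);
        // The ISO 8601 form is the same everywhere and round-trips cleanly.
        System.out.println(toIso8601(date));
    }
}
```

The same instant always yields the same string regardless of where the parser runs, which is the property the issue is asking for.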

One of my forward-looking ideas behind introducing the Property class was to use it for these
kinds of type-safe value conversions. We could add Property.setDate(Metadata, Date) and Property.getDate(Metadata)
methods that could also take advantage of the static value type information included in the
Property constants. For example, an integer property constant could throw an exception (or
use some predefined conversion rule) when you attempt to get its value as a date. For added
compile-time type safety, we could even add explicit DateProperty, IntegerProperty, etc. subclasses
for specific kinds of metadata properties.
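
To make the subclass idea more concrete, here is a minimal, self-contained sketch. Everything in it is hypothetical: a plain Map<String, String> stands in for Metadata, TypedProperty is an invented base class rather than the existing Property class, and the "Creation-Date" and "Page-Count" keys are only illustrative. Dates are stored as UTC ISO 8601 strings.

```java
import java.time.Instant;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the Tika API: typed property constants that
// own the conversion between Java values and stored metadata strings.
final class TypedPropertySketch {

    /** A named property that knows how to encode and decode its value type. */
    abstract static class TypedProperty<T> {
        final String name;

        TypedProperty(String name) { this.name = name; }

        abstract String encode(T value);
        abstract T decode(String text);

        void set(Map<String, String> metadata, T value) {
            metadata.put(name, encode(value));
        }

        /** Returns the decoded value, or null if the property is absent. */
        T get(Map<String, String> metadata) {
            String text = metadata.get(name);
            return text == null ? null : decode(text);
        }
    }

    static final class DateProperty extends TypedProperty<Date> {
        DateProperty(String name) { super(name); }
        // Instant.toString() produces an ISO 8601 timestamp in UTC.
        @Override String encode(Date value) { return value.toInstant().toString(); }
        @Override Date decode(String text) { return Date.from(Instant.parse(text)); }
    }

    static final class IntegerProperty extends TypedProperty<Integer> {
        IntegerProperty(String name) { super(name); }
        @Override String encode(Integer value) { return Integer.toString(value); }
        // Throws NumberFormatException if the stored text is not an integer.
        @Override Integer decode(String text) { return Integer.valueOf(text); }
    }

    static final DateProperty CREATION_DATE = new DateProperty("Creation-Date");
    static final IntegerProperty PAGE_COUNT = new IntegerProperty("Page-Count");

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        CREATION_DATE.set(metadata, Date.from(Instant.parse("2010-07-06T17:07:50Z")));
        PAGE_COUNT.set(metadata, 11);
        // PAGE_COUNT.set(metadata, new Date());  // would not compile
        System.out.println(metadata.get("Creation-Date"));
        System.out.println(PAGE_COUNT.get(metadata));
    }
}
```

With this shape, asking an integer property for a date is rejected by the compiler instead of failing at runtime, and every parser that goes through DateProperty writes the same date format.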

> Inconsistent date format for Metadata.CREATION_DATE and Metadata.LAST_MODIFIED
> ------------------------------------------------------------------------------
>
>                 Key: TIKA-451
>                 URL: https://issues.apache.org/jira/browse/TIKA-451
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata, parser
>    Affects Versions: 0.7
>            Reporter: Nick Burch
>            Priority: Minor
>
> Currently, the PDF parser calls calendar.getTime().toString(), which means dates end up
> in your local timezone and are hard to parse.
> The OpenDocument parsers output in ISO 8601 format, which avoids these two problems.
> The POI OLE2-based parsers also output in date.toString() format, with the same
> timezone/parsing problems.
> We should probably select one format and update all the parsers to output in it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

