tika-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1369) Date parsing and thread safety in ImageMetadataExtractor
Date Mon, 06 Oct 2014 12:14:34 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160234#comment-14160234 ]

ASF GitHub Bot commented on TIKA-1369:
--------------------------------------

GitHub user vilmospapp opened a pull request:

    https://github.com/apache/tika/pull/17

    TIKA-1369 Avoid ThreadLocal usage from Memory Leak

    Hi @chrismattmann ,
    
    Based on our discussion in https://github.com/apache/tika/pull/15, I've added the ThreadLocal cleanup part, so in theory it should not suffer from the scenario that @grossws mentioned.
    
    Cheers,
    Vilmos

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vilmospapp/tika TIKA-1369-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tika/pull/17.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17
    
----
commit f95fad94619946ef1d4fe7cf407deab6317ad2fd
Author: Vilmos Papp <papp.gyorgy.vilmos@gmail.com>
Date:   2014-10-06T12:10:14Z

    TIKA-1369 Avoid ThreadLocal usage from Memory Leak

----


> Date parsing and thread safety in ImageMetadataExtractor
> --------------------------------------------------------
>
>                 Key: TIKA-1369
>                 URL: https://issues.apache.org/jira/browse/TIKA-1369
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.5
>         Environment: OS X 10.9.4 Java 7_60
>            Reporter: John Gibson
>            Assignee: Chris A. Mattmann
>            Priority: Critical
>             Fix For: 1.7
>
>
> The {{ImageMetadataExtractor}} uses a static instance of {{SimpleDateFormat}}. This is not thread safe.
> {code:title=ImageMetadataExtractor.java}
>     static class ExifHandler implements DirectoryHandler {
>         private static final SimpleDateFormat DATE_UNSPECIFIED_TZ = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
>         ...
>         public void handleDateTags(Directory directory, Metadata metadata)
>                 throws MetadataException {
>             // Date/Time Original overrides value from ExifDirectory.TAG_DATETIME
>             Date original = null;
>             if (directory.containsTag(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL)) {
>                 original = directory.getDate(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL);
>                 // Unless we have GPS time we don't know the time zone so date must be set
>                 // as ISO 8601 datetime without timezone suffix (no Z or +/-)
>                 if (original != null) {
>                     String datetimeNoTimeZone = DATE_UNSPECIFIED_TZ.format(original); // Same time zone as Metadata Extractor uses
>                     metadata.set(TikaCoreProperties.CREATED, datetimeNoTimeZone);
>                     metadata.set(Metadata.ORIGINAL_DATE, datetimeNoTimeZone);
>                 }
>             }
>        ...
> {code}
> This is not the first time that SDF has caused problems: see TIKA-495 and TIKA-864. In the discussion there, the idea of using alternative thread-safe (and faster) formatters from either Joda-Time or Commons Lang was dismissed because it would add too many dependencies. Given that Tika already has a fairly long list of dependencies for parsing content, adding one more JAR to make sure things don't break is probably a good idea.
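
A minimal sketch of a dependency-free alternative to the shared static formatter, in the spirit of the ThreadLocal cleanup mentioned in the pull request above. The class name `PerThreadDateFormat` and its methods are hypothetical and are not taken from the Tika patch; only the date pattern comes from the quoted code.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper, not part of Tika: each thread gets its own
// SimpleDateFormat, so no instance is ever shared between threads.
final class PerThreadDateFormat {
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            new ThreadLocal<SimpleDateFormat>() {
                @Override
                protected SimpleDateFormat initialValue() {
                    // Same pattern as DATE_UNSPECIFIED_TZ in the quoted code
                    return new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
                }
            };

    private PerThreadDateFormat() {
    }

    // Formats with the calling thread's private formatter instance.
    static String format(Date date) {
        return FORMAT.get().format(date);
    }

    // Drops the calling thread's formatter. In pooled-thread environments
    // (for example servlet containers) this keeps the value from outliving
    // the work that created it.
    static void cleanup() {
        FORMAT.remove();
    }
}
```

A caller would invoke `PerThreadDateFormat.format(date)` while parsing and `PerThreadDateFormat.cleanup()` in a `finally` block once parsing on that thread is done. After `cleanup()`, the next `format` call simply creates a fresh formatter.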
> In addition, because no timezone or locale is specified by either Tika's formatter or the call to com.drew.metadata.Directory, it can wreak havoc during randomized testing. Given that the timezone is unknown, why not just default it to UTC and let the caller guess the timezone? As it stands I have to reparse all of the dates into UTC to get stable behavior across timezones.
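
The UTC suggestion above could look roughly like the following sketch, which pins both the time zone and the locale on the formatter so the output no longer depends on JVM defaults. `UtcDateFormat` and `formatUtc` are hypothetical names, not Tika API. The sketch covers only the formatting side; the `Date` returned by `com.drew.metadata.Directory` would still be parsed using whatever zone that library applies.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of Tika.
final class UtcDateFormat {
    private UtcDateFormat() {
    }

    // Creates a new formatter per call, so nothing is shared across threads,
    // and fixes locale and zone explicitly instead of using JVM defaults.
    static String formatUtc(Date date) {
        SimpleDateFormat format =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(date);
    }
}
```

With this approach `UtcDateFormat.formatUtc(new Date(0L))` yields `1970-01-01T00:00:00` regardless of the default time zone of the machine running the tests. Creating a formatter per call costs some allocation; it could be combined with a per-thread instance if that proves too slow.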



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
