tika-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2722) Don't call Date.toString (Possible issue with JDK 11)
Date Wed, 05 Sep 2018 16:33:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604643#comment-16604643 ]

Uwe Schindler commented on TIKA-2722:

bq.  I reported it to Oracle using their normal channel for reporting bugs. 

Once you get the internal ID, send it to Rory; that helps speed things up, especially as this is shortly
before the release. IMHO that's a real bug and should be fixed before the release! Not sure about
their internal priorities :-)

> Don't call Date.toString (Possible issue with JDK 11)
> -----------------------------------------------------
>                 Key: TIKA-2722
>                 URL: https://issues.apache.org/jira/browse/TIKA-2722
>             Project: Tika
>          Issue Type: Bug
>         Environment: Tika 1.18, JDK 11 with locale set to "ar-EG".  
>            Reporter: David Smiley
>            Priority: Major
> I'm troubleshooting [a test failure in Apache Lucene/Solr|https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/22799/], in the "extracting"
> contrib, that occurs on JDK 11 with locale "ar-EG". JDK 8 and 9 pass; I don't know about
> JDK 10. It has to do with extracting date metadata from a PDF, particularly the created date,
> but perhaps other dates too.
> I stepped through the code into Tika and I think I've found where the troublesome
> code is. First, note PDFParser line 271: {{addMetadata(metadata, "created", info.getCreationDate());}}.
> That addMetadata overload calls toString() on a Date. IMO that's asking for trouble,
> since its output is Locale-dependent. That's okay for display to a user, but not
> for machine-to-machine information exchange. In the case of the test, it yielded this odd-looking
> date string:
> Thu Nov 13 18:35:51 GMT+٠٥:٠٠ 2008
> I pasted that in and it looks consistent with what I see in IntelliJ and in the Jenkins logs;
> hopefully it will post correctly to JIRA, though I won't be certain until after I click "Create".
> The odd part is the hours and minutes of the offset relative to GMT, which are rendered in Arabic-Indic digits.
> Perhaps this problem also indicates a JDK 11 bug? Nevertheless, I think Tika should
> avoid calling Date.toString().
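
One way to avoid Date.toString() for machine-readable metadata is to format the Date as an ISO-8601 instant with java.time. The sketch below is a minimal illustration under that assumption, not Tika's actual fix: the class name IsoDateExample, the toIso8601 helper and the sample instant are all hypothetical, and the sample is not the creation date from the test PDF.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;

public class IsoDateExample {

    // Hypothetical helper (not part of Tika's API): renders a Date as an
    // ISO-8601 UTC timestamp such as "2008-11-13T13:35:51Z".
    // DateTimeFormatter.ISO_INSTANT always writes ASCII digits and a
    // trailing "Z", regardless of the default Locale.
    static String toIso8601(Date date) {
        return DateTimeFormatter.ISO_INSTANT.format(date.toInstant());
    }

    public static void main(String[] args) {
        // Simulate the failing environment by switching the default locale.
        Locale.setDefault(Locale.forLanguageTag("ar-EG"));

        // Illustrative instant only: 13:35:51 UTC is 18:35:51 at GMT+05:00.
        Date created = Date.from(Instant.parse("2008-11-13T13:35:51Z"));

        // Implicit Date.toString(): output may vary by JDK version and locale.
        System.out.println("toString(): " + created);

        // Stable, machine-readable form for metadata exchange.
        System.out.println("ISO-8601:   " + toIso8601(created));
    }
}
```

With this approach the second line should read "2008-11-13T13:35:51Z" whatever the default locale, while the first line is the kind of output the report shows changing between environments.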

This message was sent by Atlassian JIRA
