tika-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2722) Don't call Date.toString (Possible issue with JDK 11)
Date Wed, 05 Sep 2018 15:38:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604578#comment-16604578

Hudson commented on TIKA-2722:

UNSTABLE: Integrated in Jenkins build tika-2.x-windows #307 (See [https://builds.apache.org/job/tika-2.x-windows/307/])
TIKA-2722 -- cleanup Calendar metadata in PDFParser (tallison: rev 5c7dafada7caee90635283d16b04d403eef7e047)
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java

> Don't call Date.toString (Possible issue with JDK 11)
> -----------------------------------------------------
>                 Key: TIKA-2722
>                 URL: https://issues.apache.org/jira/browse/TIKA-2722
>             Project: Tika
>          Issue Type: Bug
>         Environment: Tika 1.18, JDK 11 with locale set to "ar-EG".  
>            Reporter: David Smiley
>            Priority: Major
> I'm troubleshooting [a test failure in Apache Lucene/Sor|https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/22799/] "extracting"
contrib that occurs in JDK 11 with locale "ar-EG".  JDK 8 & 9 passes; I don't know about
JDK 10. It has to do with extracting date metadata from a PDF, particularly the created date
but perhaps others too.
> I stepped through the code into Tika and I think I've found out where the troublesome
code is.  First note PDFParser line 271: {{addMetadata(metadata, "created", info.getCreationDate());}}.
 That addMetadata overload variant will call toString on a Date.  IMO that's asking for trouble
since the output of that is Locale-dependent.  I think that's okay to show to a user but not
for machine-to-machine information exchange.  In the case of the test, it yielded this odd
looking date string:
> Thu Nov 13 18:35:51 GMT+٠٥:٠٠ 2008
> I pasted that in and it looks consistent with what I see in IntelliJ and in Jenkins logs;
hopefully will post correctly to JIRA.  The odd part is the hour & minutes relative to
GMT.  I won't be certain until after I click "Create".
> Perhaps this problem is also indicative of a JDK 11 bug?  Nevertheless I think Tika should
avoid calling Date.toString().

This message was sent by Atlassian JIRA

View raw message