tika-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: svn commit: r1616295 [1/2] - in /tika/trunk: ./ tika-app/src/main/java/org/apache/tika/cli/ tika-app/src/test/java/org/apache/tika/cli/ tika-core/src/main/java/org/apache/tika/detect/ tika-core/src/main/java/org/apache/tika/io/ tika-core/src/main/java
Date Thu, 07 Aug 2014 12:39:52 GMT
>       static class ExifHandler implements DirectoryHandler {
> -        private static final SimpleDateFormat DATE_UNSPECIFIED_TZ = new
> SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
> +        private static final SimpleDateFormat DATE_UNSPECIFIED_TZ = new
> SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.getDefault());
> That looks to be formatting to ISO 8601 format, so should probably be
> using a standard locale not the system default - ISO 8601 is the same
> everywhere!

This is exactly what I meant on the review board and in the issue: this should be Locale.ROOT,
which means the formatting is language-independent. If your computer is in Thailand, you have a
big problem with the default locale: it may not even use ASCII digits anymore!

In general, locales and charsets should always be specified explicitly:
- (Error) messages that are written in English should be formatted with
String.format(Locale.ENGLISH, ...).
- Upper/lowercasing for things like comparisons or lookups in hash maps should almost always be
done with Locale.ROOT.
- Charsets should always be given explicitly, especially when we read resources from our own JAR
file; here we should prefer UTF-8.
- Reading from or writing to the console is the only place where you should use
Charset.defaultCharset().
Unrelated, but also important: the SimpleDateFormat above should definitely not be used by
multiple threads. SimpleDateFormat is not thread-safe!
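
One common way around the thread-safety problem is to give each thread its own formatter
instead of sharing a static one. The following is only a sketch of that idea, not the actual
Tika fix; the class name ThreadSafeDates is hypothetical, and the UTC time zone is again an
assumption made for reproducible output:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical class, for illustration only (not Tika code).
class ThreadSafeDates {

    // Each thread lazily creates and then reuses its own SimpleDateFormat instance.
    private static final ThreadLocal<SimpleDateFormat> DATE_UNSPECIFIED_TZ =
        new ThreadLocal<SimpleDateFormat>() {
            @Override
            protected SimpleDateFormat initialValue() {
                SimpleDateFormat format =
                    new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.ROOT);
                format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: UTC
                return format;
            }
        };

    // Safe to call from any thread: no formatter state is shared between threads.
    static String format(Date date) {
        return DATE_UNSPECIFIED_TZ.get().format(date);
    }
}
```

On Java 8 and later, the immutable java.time.format.DateTimeFormatter is thread-safe and may
be a simpler choice where it is available.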

