commons-dev mailing list archives

From Benedikt Ritter <brit...@apache.org>
Subject Re: [ALL] Character set in dist area? (e.g. RELEASE-NOTES.txt)
Date Mon, 26 Sep 2016 07:15:39 GMT
Hello Stian,

Stian Soiland-Reyes <stain@apache.org> wrote on Mon, 26 Sep 2016 at 01:45:

> As I mentioned in the BeanUtils vote, its RELEASE-NOTES.txt was in
> character set ISO-8859-1 instead of, say, UTF-8 (to represent the name
> "Tommy Tynjä").
>
> However, the RELEASE-NOTES are special in that they go into git/svn and
> thus the release zip/tar.gz, but we also copy them into the dist
> download area - see for instance
>
>
> http://www.apache.org/dist/commons/collections/RELEASE-NOTES-4.0.txt
>
> which (if you search for COLLECTIONS-8) should correctly say, with
> Norwegian O-slash:
>
> > Thanks to Rune Peter Bjørnstad.
>
> but instead might (as in my Chromium browser) be shown incorrectly in
> "WTF8":
>
> > Thanks to Rune Peter BjÃ¸rnstad.
>
>
> This is because the file (at least from www.apache.org) is served as just:
>
> Content-Type: text/plain
>
> i.e. implicitly character set ISO 8859-1 (Latin 1).
>
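
A small sketch of where the garbled form comes from, assuming the file is
stored as UTF-8 and the client falls back to ISO-8859-1 because no charset
is sent. The variable names are only illustrative:

```python
# Illustration only: the name is taken from the release notes quoted above.
name = "Bjørnstad"

# How the name is stored in a UTF-8 file: "ø" becomes the two bytes C3 B8.
utf8_bytes = name.encode("utf-8")

# How a client reads those bytes when it assumes ISO-8859-1:
# each byte is turned into one character, so "ø" shows up as two.
misread = utf8_bytes.decode("iso-8859-1")

print(misread)
```

Reversing the two steps (encode as ISO-8859-1, decode as UTF-8) gives the
original name back, so the bytes on the server are fine and only the
declared charset is missing.
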
>
> (Different mirrors might have a different AddDefaultCharset set -
> http://www.apache.org/info/how-to-mirror.html does not mandate any)
>
>
> I think we should cater correctly for any non-Latin-1 names in our
> release notes. People should be thanked by their real names; not
> everyone wants to legally change their name to an ASCII-compatible
> version (says the former "Stian Søiland").
>
>
> So I had a look at the immediate files in dist, and found these
> non-ASCII text files:
>
> stain@biggiebuntu:~/src/95/commons$ find . -type f | grep -v .svn |
> xargs file | grep -v ASCII
>
> ./bcel/RELEASE-NOTES.txt: UTF-8 Unicode text
> ./email/RELEASE-NOTES.txt: UTF-8 Unicode text
> ./codec/RELEASE-NOTES.txt: ISO-8859 text, with CRLF line terminators
> ./logging/RELEASE-NOTES.txt: UTF-8 Unicode text
> ./cli/RELEASE-NOTES.txt: ISO-8859 text
> ./beanutils/RELEASE-NOTES.txt: C++ source, ISO-8859 text
> ./collections/RELEASE-NOTES.txt: UTF-8 Unicode text
> ./collections/RELEASE-NOTES-4.0.txt: UTF-8 Unicode text
> ./compress/RELEASE-NOTES.txt: UTF-8 Unicode text
> ./lang/RELEASE-NOTES.txt: ISO-8859 text
>
>
> I propose we add a default commons/.htaccess which sets something like:
>
>     AddCharset UTF-8 .txt .html
>
> ...and convert the ISO-8859 ones to UTF-8 (after checking manually that
> they are Latin 1 and not one of the other Latin variants). We should fix
> both dist and git/svn to avoid regressions.
>
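
The conversion step could be sketched roughly as below. The helper name
to_utf8 is hypothetical, and the code simply assumes that anything which is
not valid UTF-8 is Latin 1. It cannot tell Latin 1 apart from the other
ISO-8859 variants, so the manual check is still needed:

```python
def to_utf8(raw: bytes) -> bytes:
    """Return raw re-encoded as UTF-8.

    Assumption: input that is not valid UTF-8 is ISO-8859-1 (Latin 1).
    """
    try:
        # Valid UTF-8 (which includes plain ASCII) is returned unchanged.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Latin 1 maps every byte to a character, so this decode never fails.
        return raw.decode("iso-8859-1").encode("utf-8")
```

Reading each RELEASE-NOTES.txt in binary mode, passing the bytes through the
helper and writing the result back would leave the files that are already
UTF-8 untouched.
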
>
> Various .htaccess files are already in operation across dist (I
> found at least 20, including under httpd), so I think this should be
> OK.
>
>
> For the BeanUtils 1.9.3 release I thus added such an .htaccess - then
> we can see if that breaks anything on the mirrors. So far so good:
>
> stain@biggiebuntu:~/src/95$ curl -s -I
> http://www.apache.org/dist/commons/beanutils/RELEASE-NOTES.txt | grep
> Content-Type
> Content-Type: text/plain; charset=utf-8
>
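
To check several files or mirrors, the charset can be pulled out of the
header value with Python's standard library. A minimal sketch: the helper
name charset_of is made up, and the ISO-8859-1 fallback mirrors the
assumption above about what clients do when no charset is sent:

```python
from email.message import Message


def charset_of(content_type: str, default: str = "iso-8859-1") -> str:
    """Return the charset declared in a Content-Type header value.

    Falls back to `default` when the header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset, or None.
    return msg.get_content_charset() or default
```

With the header from the curl output above this gives "utf-8"; for a bare
"text/plain" it returns the fallback.
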
>
>
> Views..?
>

Thank you for the thorough analysis. I agree with your proposal to add
an .htaccess file to get the charset right.

Thank you,
Benedikt


>
> --
> Stian Soiland-Reyes
> http://orcid.org/0000-0001-9842-9718
>
