www-apache-bugdb mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dirk-Willem van Gulik <di...@webweaving.org>
Subject Re: general/4492: UTF-8 encoding URL's at IE5 won't work with special directory-names and apache
Date Sun, 30 May 1999 12:30:00 GMT
The following reply was made to PR general/4492; it has been noted by GNATS.

From: Dirk-Willem van Gulik <dirkx@webweaving.org>
To: Ralf Weinand <postmaster@zebra.inka.de>
Cc: apbugs@hyperreal.org
Subject: Re: general/4492: UTF-8 encoding URL's at IE5 won't work with special
 directory-names and apache
Date: Sun, 30 May 1999 14:00:27 +0200 (CEST)

 On 29 May 1999, Ralf Weinand wrote:
 
 > After the Installation of IE5 (german version) i got Problems with my websites
 > Some links, that i created via javascript won't work.
 > But javascript isn't the problem.
  
 > non english URL's are standardly encoded in the UTF-8 mode, so i can't
 > reach the sites with special words. when i disable the utf-8 in the
 > IE5-properties (deep inside), all will work.
  
 > i searched a while about the UTF-8 Meaning, but i do nor Know, whether
 > UTF-8 is a standard real planned for the Internet.
 > this .txt file will not be reached with IE5 and the standard-installation
 
 Although perhaps too technical; this is not really a server problem; but
 one having to do with the way IE5 implements some of their
 internationalization and localization. And some of that is plain wrong,
 wrong and wrong. Sorry. But there is a way round it; see the end of this
 longish msg.
 
 As for apache; apache can deal with UTF8 files just fine; they are send
 out exactly as they are; but you should make sure that the Charset is
 set right of course. See www.w3.org/International for more information.
 
 As for UTF8 inside a URI; there are some rules all URI's are to adhere to,
 and what characters they may contain. Unfortunately your ringel-ss or sz
 is not one of them, nor are say chinese characters. This page explains
 it in detail:
 
 	http://www.w3.org/International/O-URL-and-ident.html
 
 In short the rules are
 
 	0.	the URI is an octed stream with no real meaning,
 		i.e. just a sequence of numbers.
 	1.	the URI is an octed stream with no real meaning,
 		i.e. just a sequence of numbers.
 	2.	the URI is an octed stream with no real meaning,
 		i.e. just a sequence of numbers.
 	3.	the URI is an octed stream with no real meaning,
 		i.e. just a sequence of numbers.
 	4.	the URI is an octed stream with no real meaning,
 		i.e. just a sequence of numbers.
 	5.	the URI is an octed stream with no real meaning,
 		i.e. just a sequence of numbers.
 	6. 	any special character (i.e. not a-z, 0-9 and a few
 		more) is to be encoded as a '%xy' where x and y are 
 		hex numbers 0..9a..f.
 	7.	the URI is an octed stream with no real meaning,
 		i.e. just a sequence of numbers.
 
 To confuse matters; that sequence of numbers just _HAPPENS_ (but this
 is entirly coincedental and of no substance) to look like a human
 readable string when you look up the numbers in an ASCII table. But
 you should completely forget this :-)
 
 What now follows is an incredible simplification of the real story. But
 it might help. The 'solution' for your problem is at the end. I hope.
 
 What generally happens is that a user enters a URL in the bar of the
 browser. The browser, together with the OS then translates this into
 a valid octed-string, as per RFC2396 according to localization rules.
 I.e. the user can actually type in strange char's, such as the sz,
 the ae, ij and many others needed in dutch, danish, chinese, german
 and so on... but the browser; helped by the OS (which has details on
 what the user meant when it typed in the string) is to translate those
 to a simple octed string.
 
 This string then goes to the server. The apache server decodes part
 of this string; but basically passes it on the the OS which then tries
 to work out what file you have. If the OS understands UTF8 coded file
 names you are usually all right. But obviously there is a big i18n
 problem here.
 
 But... in an HTML, regardless of the charset it is written in, wether
 it is in chinese, german or greek; the URI's, i.e. the bits between
 the href="...." quotes are _NOT_ in the charset of that page; but
 are to be treated as an octed stream; and send on the wire exactly
 like that. So even though one would type in the browser window's
 location bar
 
 	http://www.teddy-online.de/Teddys/Gro_/Teddy-schwarz.txt
 
 (where the '_' is the Beta shaped german 'sz' char), you would code it in
 the HTML as
 
 	<a href="/images/Teddys/Gro%df/Teddy-schwarz.txt">
 
 i.e. use a 'hex' escape instead of the ringel-ess/sz. The same applies
 for javascript _AND_ for java; despite the fact that all code, comments
 and displayable strings in java are in UTF8, you are to threat the URIs
 strictly as octed strings if you encode them directly.
 
 Hope this helps,
 
 Dw.
 
 
 
 

Mime
View raw message