ws-users mailing list archives

From John Wilson <...@wilson.co.uk>
Subject Re: accented characters in XMLRPC arguments
Date Wed, 10 Dec 2003 20:00:33 GMT

On 10 Dec 2003, at 18:58, Tom Bradford wrote:

> Sorry, but this is untrue.  The XML document prolog accepts an 
> 'encoding' directive for a reason, and that is so that it can properly 
> parse a document that uses a specific character set.  Characters like 
> extended latin, katakana, cyrillic, and such can all be represented by 
> UTF-8 encoding without expressing them as entities.
>
> The problem is that the Apache XML-RPC library, even though it 
> supports the ability to force the XML document prolog's encoding, has 
> a bug in the XMLWriter class when it comes to characters above 0xFF, 
> so anything other than the basic latin set will throw your error, even 
> though according to the XML spec, those characters are perfectly legal 
> for a document.


Nice to see you on the list, Tom.

What I would propose is that the default encoding remain ISO 8859-1 
(so we don't break the non-UTF-x-aware implementations which exist 
today) and that *only* UTF-8 and UTF-16 be allowed as alternate 
encodings. You can't support arbitrary encodings unless you know the 
mapping of Unicode code points onto the encoding's character set 
(i.e. you have to know which characters to escape).

We would also fix the XMLWriter to do the proper escaping when using 
ISO 8859-1 encoding (writing characters outside that set as numeric 
character references) and to do no such escaping otherwise.
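
To make the proposal concrete, here is a rough sketch of the escaping 
rule in Java. It is not the actual XMLWriter code: the class name 
XmlEscapeSketch and the method escape are hypothetical, and the sketch 
assumes the only accepted encodings are ISO-8859-1, UTF-8 and UTF-16. 
Markup characters are always escaped; code points above 0xFF are 
written as numeric character references only for ISO-8859-1. It does 
not check for characters that are illegal in XML altogether.

```java
// Hypothetical helper illustrating the proposed rule; not part of Apache XML-RPC.
final class XmlEscapeSketch {

    static String escape(String text, String encoding) {
        boolean latin1 = encoding.equalsIgnoreCase("ISO-8859-1");
        boolean unicode = encoding.equalsIgnoreCase("UTF-8")
                || encoding.equalsIgnoreCase("UTF-16");
        if (!latin1 && !unicode) {
            // Assumption from the proposal: no other encodings are allowed,
            // because we would not know which characters need escaping.
            throw new IllegalArgumentException("Unsupported encoding: " + encoding);
        }

        StringBuilder out = new StringBuilder(text.length());
        // Walk by code point so surrogate pairs yield a single reference.
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default:
                    if (latin1 && cp > 0xFF) {
                        // Not representable in ISO-8859-1: decimal character reference.
                        out.append("&#").append(cp).append(';');
                    } else {
                        // Representable in the chosen encoding: write as-is.
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }
}
```

With this rule, a katakana KA (U+30AB) would go out as &#12459; under 
ISO-8859-1 and as the literal character under UTF-8 or UTF-16, while 
an e-acute (U+00E9) is written literally in all three.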

Comments?



John Wilson
The Wilson Partnership
http://www.wilson.co.uk

