ws-users mailing list archives

From Tom Bradford <bradf...@dbxmlgroup.com>
Subject Re: accented characters in XMLRPC arguments
Date Wed, 10 Dec 2003 18:58:41 GMT
Sorry, but this is untrue.  The XML document prolog accepts an 
'encoding' declaration for a reason: so that the parser can properly 
read a document that uses a specific character set.  Extended Latin, 
Katakana, Cyrillic, and similar characters can all appear directly in a 
UTF-8 encoded document without being expressed as entities.
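
As a rough illustration (a minimal sketch using only the JDK's built-in 
parser, not the Apache XML-RPC library), the class below parses a UTF-8 
document whose text contains an accented Latin letter, Katakana, and 
Cyrillic, with no entities anywhere.  The class name, method names, and 
sample document are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; not part of any XML-RPC library.
final class Utf8PrologDemo {

    // Builds a small document that declares UTF-8 in its prolog and
    // contains raw non-ASCII characters (written here as Java escapes
    // so the source file itself stays ASCII).
    static byte[] sample() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<string>\u00E0 \u30AB\u30BF\u30AB\u30CA "
                + "\u043A\u0438\u0440\u0438\u043B\u043B\u0438\u0446\u0430</string>";
        // The bytes must really be UTF-8 to match the declaration.
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Parses the bytes and returns the text content of the root element.
    static String rootText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("document did not parse", e);
        }
    }

    public static void main(String[] args) {
        // What the console shows depends on its own encoding.
        System.out.println(rootText(sample()));
    }
}
```

The parser reads the declared encoding from the prolog and decodes the 
bytes accordingly, so the characters come back unchanged.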

The problem is that the Apache XML-RPC library, even though it lets you 
force the encoding named in the XML document prolog, has a bug in 
the XMLWriter class when it comes to characters above 0xFF, so anything 
outside the basic Latin set will trigger the error you are seeing, even 
though according to the XML spec those characters are perfectly legal 
in a document.
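
For comparison, here is a sketch of how a writer could treat such 
characters.  It is not the library's XMLWriter code and does not claim 
to show what that class does; XmlTextEscaper and escape() are 
hypothetical names.  The idea is to write a character as-is whenever 
the target encoding can represent it, and to fall back to a numeric 
character reference only when it cannot.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not the Apache XML-RPC XMLWriter.
final class XmlTextEscaper {

    // Escapes element text for a document that will be written in the
    // given target encoding.  This sketch does not filter characters
    // that XML 1.0 forbids outright (most control characters, unpaired
    // surrogates).
    static String escape(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String s = new String(Character.toChars(cp));
            i += Character.charCount(cp);
            switch (cp) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default:
                    if (encoder.canEncode(s)) {
                        // Legal in this encoding: emit it directly.
                        out.append(s);
                    } else {
                        // Not representable: numeric character reference.
                        out.append("&#").append(cp).append(';');
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "\u00E0 \u30AB <";
        System.out.println(escape(text, StandardCharsets.US_ASCII));
        System.out.println(escape(text, StandardCharsets.UTF_8));
    }
}
```

With UTF-8 as the target, only the markup characters are escaped; with 
US-ASCII, every non-ASCII character becomes a "&#...;" reference.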

Some will argue that XML-RPC only accepts US-ASCII encoding, but as of 
the most recent version of the specification, Dave Winer completely 
removed that requirement and now refers to all character strings simply 
as 'strings'.

--
Tom Bradford - http://www.tbradford.org/
CTO - The dbXML Group - http://www.dbxml.com/
Project Labrador - http://www.dbxml.com/labrador/


Cristiano Fugazza wrote:
> Hi, I'm just a newbie with xml-rpc but yours is a common problem with 
> xml processing: no special characters, such as accented ones, can be 
> handled with xml without expressing them as entities.
>     I mean that the accented "a" should become "&agrave;" or "&aacute;" 
> depending on the kind of accent. It is best to express them directly in 
> Unicode (with a sequence "&#...;").
>     To do so, simply pass your data through JTidy (search Google, Tidy is 
> a must for xml processing), configuring it to produce numerical entities.
> 
> Cheers,
> 
> Cristiano
> 
> 
> 
