axis-java-dev mailing list archives

From "Shankar Unni (JIRA)" <>
Subject [jira] Commented: (AXIS-2025) Illegal XML characters in String arguments and return values cause XML exceptions in Axis calls
Date Tue, 28 Jun 2005 16:43:12 GMT

Shankar Unni commented on AXIS-2025:

> With less jumping up and down and more providing of a patch, I would think that this
> is a valid bug and be inclined to apply the patch.

I apologize. It's been a bit frustrating dealing with this bug.

Yes, Axis is a SOAP processor, and not an RPC mechanism. And yes, SOAP lays out rules for
what is a valid message. The question is what happens when you try to implement an RPC mechanism
on top of SOAP - the bug is in *that* layer, which the Axis library also supports.

I don't entirely agree that a String is bad simply because it contains an occasional binary
character. That would be a totally novel definition of String for just about any language.
Every language has rules about what it allows in Strings, and in every case (C, C++, Java,
VB, Pascal), unprintable characters are allowed subject to certain (minimally restrictive)
rules (e.g. usually, no NULs, etc.).

Also, it's not like the entire String is, e.g., the contents of some binary file, or something
like that (which would, of course, be more appropriately handled as an attachment - we do
have certain pieces of data that are truly "binary", and we do encode them as byte[] for proper
handling). Sometimes, it's just a String that has an occasional "unprintable" character in
it (think: "ESC" or "BEL"(^G)).

(In fact, one of the situations where we ran into this was a SOAP interface to a systems monitoring
layer that read the contents of log files and sent back events with the log messages in them.
And these log messages often contain ESC and ^G. It's absurd to make every String in this
API an attachment; and the use of these so-called "unprintable" characters is entirely valid
in the context - they *are* legal things to spit out on screens, and they are legal to put
into strings).

So yes, it's a larger issue - *if* you're building an RPC layer on top of the SOAP infrastructure,
and given the restrictions in the SOAP layer's use of <xsd:string>, how do you safely
transport Strings in general?
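
To make the restriction concrete: as far as I can tell, the limit comes from the Char production of XML 1.0, which allows only #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. XML 1.0 applies the same rule to numeric character references, so writing &#3; does not make the character legal either. A minimal sketch of a pre-flight check follows; the class and method names (XmlCharCheck, isLegalXml10, firstIllegalIndex) are made up for illustration and are not part of Axis.

```java
// Hypothetical helper, not an Axis API: finds characters that XML 1.0 forbids.
final class XmlCharCheck {

    // True if the code point matches the XML 1.0 "Char" production.
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Index of the first char that cannot appear in XML 1.0 text, or -1 if none.
    static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isLegalXml10(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        String msg = "bad: " + (char) 3 + ".";
        System.out.println(firstIllegalIndex(msg));        // index of the (char)3
        System.out.println(firstIllegalIndex("ok\tline")); // tab is legal
    }
}
```

A check like this only detects the problem before the call is composed; it does not by itself say how such a String should be transported.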

This is definitely an interoperability issue. But in the meantime, is there some other type
that can be used to transport such strings? For instance, is it possible to use a custom type
to map Strings into SOAP (i.e. avoid <xsd:string>)? Would such a thing be portable?
I could see a custom mapping to some base64-type representation for the string body, but both
sides need to agree that it's a String, and have it be mapped back to a String upon decoding.
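
A rough sketch of what I mean by a base64-type representation, assuming both ends agree on UTF-8 and on the convention itself. The class name Base64StringCodec is hypothetical, and java.util.Base64 is used only for brevity (it requires a Java 8 or later runtime); this is not an Axis type mapping, just the encode/decode step such a mapping would need.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not an Axis API: wraps a String as base64 text so that
// the value placed on the wire contains only XML-safe characters.
final class Base64StringCodec {

    // Encode the String's UTF-8 bytes as base64 (output uses A-Z, a-z, 0-9, +, /, =).
    static String encode(String raw) {
        return Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    // Reverse of encode(): base64 text back to the original String.
    static String decode(String wire) {
        return new String(Base64.getDecoder().decode(wire), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "bad char: " + (char) 3 + ".";
        String wire = encode(original);
        System.out.println(wire);                           // safe to place in element content
        System.out.println(decode(wire).equals(original));  // round trip check
    }
}
```

The obvious cost is that the receiving side sees an opaque blob unless it knows to decode it, which is exactly the portability question above.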

Even if ready-made solutions are not available, I would greatly appreciate some hints or suggestions
on how this can be *reasonably* handled - something that doesn't involve trying to hunt down
every string in every interface and possibly convert them to attachments, or hand-encoding
and decoding every string everywhere. (I.e., reasonable workarounds / suggestions would be
a great help!)

> Illegal XML characters in String arguments and return values cause XML exceptions in Axis calls
> -----------------------------------------------------------------------------------------------
>          Key: AXIS-2025
>          URL:
>      Project: Apache Axis
>         Type: Bug
>   Components: Serialization/Deserialization
>     Versions: 1.2
>  Environment: All (but reproduced on WinXP).
> Axis 1.1 and 1.2
>     Reporter: Shankar Unni
>     Assignee: Venkat Reddy
>  Attachments: Axis1.1badmsgAPI.log, Axis1.1echoAPI.log, Axis1.2badmsgAPI.log, Axis1.2echoAPI.log
> Arguments and return values of Java type String are incorrectly handled if they contain non-printing illegal ASCII characters.
> Example 1: bad return values:
> - - - - - - - - - - - - - - -
> E.g. the string 
>   "bad char: " + (char)3 + "."
> Trivial example:
> foo.jws:
>   public class foo {
>     public String badmsg()
>     {
>       return "bad: " + (char)3 + ".";
>     }
>   }
> When calling this method and the server is running on Axis 1.1, it returns XML with the illegal character ASCII "3" in the text:
>    <badmsgReturn xsi:type="xsd:string">bad: ?.</badmsgReturn>  
> This causes an XML parse exception on the client side ("org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x3) was found in the element content of the document.")
> With Axis 1.2, the server doesn't even return a valid response: I get an HTTP 200 OK with an empty content, causing a different XML parse error.
> Example 2: bad parameter values:
> - - - - - - - - - - - - - - - -
> A similar problem exists when passing such a string from the client side.
> If I have a method in foo.jws:
>   public class foo {
>     public String echo(String s)
>     {
>       return s;
>     }
>   }
> Then if I write an ordinary Java client to call this, and pass it a bad string as in the beginning of this post, I get an exception thrown while the call is being composed:
> java.lang.IllegalArgumentException: The char '0x3' in 'bad char: ?.' is not a valid XML
> This is somewhat absurd: shouldn't the serialization layer be encoding these illegal XML characters as entity escapes? They're entirely legal in the current locale (US), and normal Java code handles this character quite normally. Why should it croak when passed by XML/RPC?

This message is automatically generated by JIRA.