axis-java-dev mailing list archives

From "David Jencks (JIRA)"
Subject [jira] Created: (AXIS-1971) problem with BOM and character set encoding
Date Mon, 02 May 2005 22:18:05 GMT
problem with BOM and character set encoding

         Key: AXIS-1971
     Project: Axis
        Type: Bug
  Components: Serialization/Deserialization  
    Versions: current (nightly)    
    Reporter: David Jencks
 Attachments: MessageContext.diff

I'm encountering this problem in the Geronimo Axis integration, so it's possible that it is
not an Axis bug, but I don't see how it could be anything else.

I send a UTF-16 character set encoded message to the server, and get back a message that starts
with a byte order mark but claims to be UTF-8.

I've copied the code from AxisServlet that sets the character encoding on the response to
the equivalent place in the Geronimo code.

Tracing through what happens, I find that during the return from the invoke call, on
leaving HandlerChainImpl (postInvoke, line 206), the entire response is serialized into a
byte array using the default UTF-8 character set encoding.

After invoke returns, the code from AxisServlet changes the character set encoding to UTF-16
and writes out the message.  However, since the message was already serialized into a byte
array buffer, the apparent effect is that a UTF-16 byte order mark is written first, followed
by the byte array that was produced using UTF-8.
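
To illustrate the symptom outside of Axis, here is a minimal sketch in plain Java. It does not
use any Axis or Geronimo classes; the class and method names are hypothetical, and the
big-endian BOM (FE FF) is an assumption about what the writer emits. It only shows what a
UTF-16 byte order mark followed by a UTF-8 encoded body looks like next to a body that was
really encoded as UTF-16.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Axis or Geronimo.
class BomMismatchDemo {

    // Mimics the reported behavior: the body was already serialized as UTF-8,
    // then a UTF-16 BOM is written in front of those cached bytes.
    static byte[] brokenResponse(String xml) {
        byte[] cachedUtf8 = xml.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0xFE); // assumed big-endian UTF-16 BOM, first byte
        out.write(0xFF); // assumed big-endian UTF-16 BOM, second byte
        out.write(cachedUtf8, 0, cachedUtf8.length);
        return out.toByteArray();
    }

    // What a consistent response would look like: Java's "UTF-16" encoder
    // writes the big-endian BOM followed by two bytes per BMP character.
    static byte[] consistentResponse(String xml) {
        return xml.getBytes(StandardCharsets.UTF_16);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String xml = "<a/>";
        System.out.println("broken:     " + hex(brokenResponse(xml)));
        System.out.println("consistent: " + hex(consistentResponse(xml)));
    }
}
```

A client that honors the BOM would try to decode the first output as UTF-16 and get garbage,
while a client that trusts a UTF-8 declaration would see two stray bytes before the XML.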

This can be fixed by making the message context set the response message character set encoding
when the response message is set on the message context (see attached patch).
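
The idea behind the fix can be sketched as follows. This is not the attached patch and does
not use the real Axis classes: SketchMessage and SketchContext are hypothetical stand-ins, and
the rule that the response simply mirrors the request encoding is an assumption made for the
illustration. The point is only that the encoding is fixed on the message at the moment it is
attached to the context, before anything can serialize and cache it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a message that caches its serialized form.
final class SketchMessage {
    private final String xml;
    private Charset encoding = StandardCharsets.UTF_8; // default, as in the report
    private byte[] cached;

    SketchMessage(String xml) {
        this.xml = xml;
    }

    void setEncoding(Charset encoding) {
        this.encoding = encoding;
        this.cached = null; // drop any bytes produced with the old encoding
    }

    Charset getEncoding() {
        return encoding;
    }

    // Serializes once with the current encoding and caches the result.
    byte[] getBytes() {
        if (cached == null) {
            cached = xml.getBytes(encoding);
        }
        return cached;
    }
}

// Hypothetical stand-in for a message context.
final class SketchContext {
    private final Charset requestEncoding;
    private SketchMessage response;

    SketchContext(Charset requestEncoding) {
        this.requestEncoding = requestEncoding;
    }

    // Assumption: the response uses the request's encoding. It is applied
    // here, when the response is set, rather than after invoke returns.
    void setResponseMessage(SketchMessage message) {
        if (message != null) {
            message.setEncoding(requestEncoding);
        }
        this.response = message;
    }

    SketchMessage getResponseMessage() {
        return response;
    }
}
```

With this ordering, a handler that serializes the response early already sees the intended
encoding, so the transport code has nothing left to change afterwards.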

I find the logic that determines the response character set encoding byzantine and would
prefer to simplify it to the point where I can understand how it works.  I would need answers
to these questions in order to proceed:

1. Under what circumstances would a response message be in a different character set encoding
than a request?

2. Which user/application code should be able to set the response character set encoding and
how should it do so?
