axis-java-dev mailing list archives

From "Siddhesh Sundar Toraskar (JIRA)" <axis-...@ws.apache.org>
Subject [jira] [Created] (AXIS-2908) Apache Axis fails to handle non Basic Multilingual Plane characters(U+10000 and above) while creating SOAP request
Date Tue, 10 Feb 2015 08:56:34 GMT
Siddhesh Sundar Toraskar created AXIS-2908:
----------------------------------------------

             Summary: Apache Axis fails to handle non Basic Multilingual Plane characters(U+10000
and above) while creating SOAP request
                 Key: AXIS-2908
                 URL: https://issues.apache.org/jira/browse/AXIS-2908
             Project: Axis
          Issue Type: Bug
          Components: Serialization/Deserialization
    Affects Versions: 1.4
         Environment: OS - CentOS
Software Platform - JDK 7
            Reporter: Siddhesh Sundar Toraskar


While creating a SOAP request, if the content contains non-BMP characters (e.g. emoji), they
are not inserted into the XML correctly.

It seems that my content, which arrives as UTF-8, is held as a UTF-16 Java String (the default
internal representation) once the program receives it.

The Apache Axis library that we are using then takes those UTF-16 Java Strings and tries to
convert them back into UTF-8 to create the XML before sending. It fails whenever I send an emoji
(:grin:) that takes 4 bytes in UTF-8. I found that any character that takes 4 bytes in UTF-8 is
represented as a surrogate pair in UTF-16. I suspect that in this case the Axis parser does not
recognize the surrogate pair and cannot convert it into a valid UTF-8 encoding.
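
The relationship between the surrogate pair and the 4-byte UTF-8 form can be checked with the
JDK alone. The following is a minimal sketch, not taken from the report: it assumes the :grin:
emoji is U+1F601, and the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only; SurrogatePairDemo and utf8Hex are hypothetical names.
final class SurrogatePairDemo {

    // Returns the UTF-8 bytes of s as space-separated upper-case hex.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: :grin: corresponds to U+1F601.
        String grin = new String(Character.toChars(0x1F601));

        System.out.println(grin.length());                             // 2 UTF-16 code units
        System.out.println(grin.codePointCount(0, grin.length()));     // 1 code point
        System.out.println(Character.isHighSurrogate(grin.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(grin.charAt(1)));  // true
        System.out.println(utf8Hex(grin));                             // F0 9F 98 81
    }
}
```

The JDK encoder produces the expected 4 bytes when given the whole String, so the conversion
itself is not the problem as long as the two surrogates are kept together.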

As a result, although UTF-8 is specified, these emoji appear in UTF-16 form, which corrupts
them because they are then processed incorrectly.
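
One way this kind of corruption can arise is when text is processed one UTF-16 code unit at a
time instead of one code point at a time. The sketch below is hypothetical and is not Axis code;
the report does not confirm the exact mechanism. It escapes non-ASCII text as XML numeric
character references in both ways (it deliberately ignores markup characters such as < and &).

```java
// Hypothetical sketch, not Axis source; class and method names are illustrative.
final class CodePointEscapeSketch {

    // Fragile: treats every UTF-16 code unit as a full character,
    // so each half of a surrogate pair is escaped on its own.
    static String escapePerChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append("&#x").append(Integer.toHexString(c).toUpperCase()).append(';');
            }
        }
        return sb.toString();
    }

    // Safer: walks the string by code point, so a surrogate pair
    // is escaped once, as the supplementary character it encodes.
    static String escapePerCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                sb.append((char) cp);
            } else {
                sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "a\uD83D\uDE01"; // "a" followed by U+1F601
        System.out.println(escapePerChar(text));      // a&#xD83D;&#xDE01;
        System.out.println(escapePerCodePoint(text)); // a&#x1F601;
    }
}
```

The first output refers to two lone surrogates, which are not legal XML characters; the second
refers to the single intended character.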



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@axis.apache.org
For additional commands, e-mail: java-dev-help@axis.apache.org

