axis-java-dev mailing list archives

From "Arseny S (JIRA)"
Subject [jira] Commented: (AXIS-2342) Reopen issue: Character entities are escaped too aggressively
Date Fri, 22 Feb 2008 09:24:19 GMT


Arseny S commented on AXIS-2342:

This is a major problem for our project: we need to send Russian text through Axis to Axis/C
and .NET.
How can this encoding be called UTF-8 if every character above 0x7f is encoded in a special way?
In that case it should be called HTML encoding.
The right way to encode is shown in the previous patch.

For now we are looking for a workaround, maybe using some generic type with
our own Serializer/Deserializer classes.
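
For readers comparing the two wire formats, here is a minimal sketch (not from this thread or from the Axis sources) that parses the same Russian text once as numeric character references and once as raw UTF-8, using the JDK's built-in DOM parser. The class name CharRefRoundTrip and the <msg> element are hypothetical. It only suggests that a conforming XML parser sees the same text in both cases; it says nothing about how Axis/C or .NET clients actually behave.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of Axis.
final class CharRefRoundTrip {

    // Parses a UTF-8 encoded XML document and returns the text of its root element.
    static String parseText(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        String header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        // "Пр" written as decimal character references, as the current escaping would emit it.
        String escaped = header + "<msg>&#1055;&#1088;</msg>";
        // The same two characters written directly; they become raw UTF-8 bytes on the wire.
        String raw = header + "<msg>\u041f\u0440</msg>";

        System.out.println(parseText(escaped).equals(parseText(raw)));
        System.out.println(escaped.getBytes(StandardCharsets.UTF_8).length
                + " vs " + raw.getBytes(StandardCharsets.UTF_8).length + " bytes");
    }
}
```

Under these assumptions the difference between the two forms is message size and the tolerance of the receiving toolkit, not the text that a standards-compliant parser recovers.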

> Reopen issue: Character entities are escaped too aggressively
> -------------------------------------------------------------
>                 Key: AXIS-2342
>                 URL:
>             Project: Axis
>          Issue Type: Bug
>          Components: Serialization/Deserialization
>    Affects Versions: 1.0
>         Environment: Operating System: All
> Platform: All
>            Reporter: Thiago Jung Bauermann
>            Assignee: Axis Developers Mailing List
>         Attachments: AXIS_2342.diff, PATCH_2342.txt, TEST_2342.diff, TESTCASE_2342.txt
> We are using SOAP to send XML documents from client to server and back. The 
> documents contain a lot of non-ASCII data. This is encoded as UTF-8 by us. 
> However, when retrieved from an Axis server, Axis will escape almost all of our 
> characters into character entities (so &#...;). This means messages become about 
> three times as big as they have to be for 'international' documents, which for us 
> is a large performance problem. I narrowed down the problem to
>   XMLUtils::xmlEncodeString
> that has the code:
>                 if (((int)chars[i]) > 127) {
>                         strBuf.append("&#");
>                         strBuf.append((int)chars[i]);
>                         strBuf.append(";");
> This seems unnecessary to me, as Axis will send all messages in UTF-8 anyway, 
> for which no encoding is necessary (and should encoding be configurable, I feel 
> this should be escaped elsewhere).
> Is there any reason for this code? I commented it out and it seemed to have no 
> adverse effect on our application (apart from reduced network traffic).
> Tested with 1.0, also looked up in the sources of 1.1-rc2.
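
To make the size argument in the description concrete, here is a rough sketch that compares the two escaping strategies. It is not the Axis implementation: escapeAboveAscii is only modeled on the fragment quoted above, and escapeMarkupOnly and the class name XmlEscapeSketch are hypothetical. The sketch assumes the result is later written out as UTF-8 and ignores surrogate pairs and control characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Axis.
final class XmlEscapeSketch {

    // Escapes only the characters that XML markup reserves; everything else is
    // left for the UTF-8 output encoder to handle.
    static String escapeMarkupOnly(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Modeled on the quoted fragment: every char above 127 becomes a decimal
    // character reference such as &#1055;.
    static String escapeAboveAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c > 127) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(escapeMarkupOnly(String.valueOf(c)));
            }
        }
        return out.toString();
    }

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String sample = "\u041f\u0440\u0438\u0432\u0435\u0442"; // "Привет"
        System.out.println("markup-only escaping: " + utf8Length(escapeMarkupOnly(sample)) + " bytes");
        System.out.println("above-ASCII escaping: " + utf8Length(escapeAboveAscii(sample)) + " bytes");
    }
}
```

For this six-letter Cyrillic sample the first strategy yields 12 bytes and the second 42, which is in line with the "about three times as big" observation in the report.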

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
