axis-java-dev mailing list archives

From "Rodrigo Ruiz (JIRA)" <axis-...@ws.apache.org>
Subject [jira] Commented: (AXIS-2342) Reopen issue: Character entities are escaped too aggressively
Date Mon, 23 Apr 2007 10:21:15 GMT

    [ https://issues.apache.org/jira/browse/AXIS-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12490888 ]

Rodrigo Ruiz commented on AXIS-2342:
------------------------------------

I am a bit puzzled by this bug.

In principle, I agree with Thiago. If the output writer is created with the correct encoding
(and it seems it is), there should be no need to "re-encode" characters above 0x7F in UTF-8,
or above 0xFFFF in UTF-16.
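
To make the size argument concrete, here is a minimal sketch (not Axis code; the class and method names are hypothetical). It writes the same text through a UTF-8 writer twice: once as-is, and once after replacing every char above 127 with a numeric character reference, which is what the xmlEncodeString snippet quoted in the issue description does.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Axis.
class EncodingSizeDemo {

    // Number of bytes produced when the text is written unchanged
    // through a writer that was created with the UTF-8 encoding.
    static int rawUtf8Size(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // Number of bytes produced when every char above 127 is first
    // replaced by a numeric character reference such as &#227;.
    static int escapedSize(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c > 127) {
                sb.append("&#").append((int) c).append(';');
            } else {
                sb.append(c);
            }
        }
        return rawUtf8Size(sb.toString());
    }

    public static void main(String[] args) {
        String text = "S\u00e3o Paulo"; // one non-ASCII character
        System.out.println("raw: " + rawUtf8Size(text)
                + " bytes, escaped: " + escapedSize(text) + " bytes");
    }
}
```

The writer already turns the non-ASCII character into a valid two-byte UTF-8 sequence, so the character reference adds bytes without adding safety.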

The class org.apache.axis.components.encoding.AbstractXmlEncoder appears to fix this issue
in its "encode" method. The problem is that none of its subclasses use the same strategy
in their writeEncoded() methods. Why is that?

In fact, looking at the code, once the entity-replacement code is removed from the subclasses,
they are all identical! It seems we could live with a single XMLEncoder implementation
for all encodings! Can anybody please confirm or correct this?
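
One way such a single implementation could look is sketched below. This is an assumption for illustration, not the Axis XMLEncoder API: the class name is hypothetical, and falling back to a numeric character reference only when the target charset cannot represent a character is a design choice of this sketch, not something taken from the Axis sources.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical single encoder parameterized by the output charset.
// Characters that XML forbids outright (most control characters,
// unpaired surrogates) are not handled in this sketch.
class SingleXmlEncoderSketch {
    private final CharsetEncoder encoder;

    SingleXmlEncoderSketch(Charset charset) {
        this.encoder = charset.newEncoder();
    }

    String encode(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            switch (cp) {
                // Markup characters are always escaped.
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:
                    CharSequence chars = text.subSequence(i, i + len);
                    if (encoder.canEncode(chars)) {
                        // The writer can represent it: pass it through.
                        sb.append(chars);
                    } else {
                        // Only unrepresentable characters become references.
                        sb.append("&#").append(cp).append(';');
                    }
            }
            i += len;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 < 5\u20ac";
        System.out.println(new SingleXmlEncoderSketch(StandardCharsets.UTF_8).encode(text));
        System.out.println(new SingleXmlEncoderSketch(StandardCharsets.US_ASCII).encode(text));
    }
}
```

With UTF-8 or UTF-16 nothing beyond the markup characters is escaped, while a narrower charset such as US-ASCII still produces well-formed output.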

> Reopen issue: Character entities are escaped too aggressively
> -------------------------------------------------------------
>
>                 Key: AXIS-2342
>                 URL: https://issues.apache.org/jira/browse/AXIS-2342
>             Project: Axis
>          Issue Type: Bug
>          Components: Serialization/Deserialization
>    Affects Versions: 1.0
>         Environment: Operating System: All
> Platform: All
>            Reporter: Thiago Jung Bauermann
>         Assigned To: Axis Developers Mailing List
>         Attachments: PATCH_2342.txt, TESTCASE_2342.txt
>
>
> We are using SOAP to send XML documents from client to server and back. The 
> documents contain a lot of non-ASCII data. This is encoded as UTF-8 by us. 
> However, when retrieved from an Axis server, Axis will escape almost all of our 
> characters into character entities (so &#...;). This means messages become about 
> three times as big as they have to be for 'international' documents, which for us 
> is a large performance problem. I narrowed down the problem to
>   XMLUtils::xmlEncodeString
> that has the code:
>                 if (((int)chars[i]) > 127) {
>                         strBuf.append("&#");
>                         strBuf.append((int)chars[i]);
>                         strBuf.append(";");
> This seems unnecessary to me, as Axis will send all messages in UTF-8 anyway, 
> for which no encoding is necessary (and should encoding be configurable, I feel 
> this should be escaped elsewhere).
> Is there any reason for this code? I commented it out and it seemed to have no 
> adverse effect on our application (apart from reduced network traffic).
> Tested with 1.0, also looked up in the sources of 1.1-rc2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

