axis-c-dev mailing list archives

From "Bill Mitchell (JIRA)" <>
Subject [jira] Commented: (AXIS2C-859) guththila parser fails to handle escape sequences for ampersand, less than, greater than
Date Wed, 30 Jan 2008 22:39:34 GMT


Bill Mitchell commented on AXIS2C-859:

Lahira, after yesterday I went back to the XML spec, and I find that it says that replacement
of character references and entity references is applied to the URI to produce the normalized
attribute value. So it seems we have to run this replacement logic on the attribute value
string before we process it as a possible namespace declaration. Just one more wrinkle.
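
As a rough illustration of what "character references" means for the attribute value, here is
a minimal sketch that recognizes a numeric reference such as "&#38;" or "&#x26;". The function
name parse_char_ref and its signature are hypothetical, not anything in guththila; it only
reports the code point and the number of bytes consumed, and leaves encoding the result back
into the buffer to the caller.

```c
#include <stddef.h>

/* Hypothetical helper, not part of guththila.
 * Parses a numeric character reference ("&#DDD;" or "&#xHHH;") at s.
 * Returns the number of bytes consumed, or 0 if s does not start with a
 * well-formed reference. On success *code_point receives the value. */
size_t parse_char_ref(const char *s, size_t avail, unsigned long *code_point)
{
    size_t i = 2;
    size_t first_digit;
    unsigned base = 10;
    unsigned long value = 0;

    /* The shortest possible reference is four bytes, e.g. "&#9;". */
    if (avail < 4 || s[0] != '&' || s[1] != '#')
        return 0;
    if (s[2] == 'x') {          /* XML only allows a lowercase 'x' here */
        base = 16;
        i = 3;
    }
    first_digit = i;

    for (; i < avail && s[i] != ';'; i++) {
        unsigned digit;
        char c = s[i];
        if (c >= '0' && c <= '9')
            digit = (unsigned)(c - '0');
        else if (base == 16 && c >= 'a' && c <= 'f')
            digit = (unsigned)(c - 'a') + 10;
        else if (base == 16 && c >= 'A' && c <= 'F')
            digit = (unsigned)(c - 'A') + 10;
        else
            return 0;           /* not a digit in this base */
        value = value * base + digit;
        if (value > 0x10FFFFUL)
            return 0;           /* beyond the Unicode range */
    }

    /* Reject an empty digit run and a missing terminating ';'. */
    if (i == first_digit || i >= avail)
        return 0;

    *code_point = value;
    return i + 1;               /* include the ';' */
}
```

Under this sketch, the namespace check would only look at the attribute value after such
references (and the named entities) have been replaced.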

My "second" item above alluded to a different solution, built into guththila_next() instead
of guththila_token_close(). In the "right" loops in guththila_next(), where we are already
looking at the characters one at a time, we could detect the leading ampersand, check the next
4 or 5 characters against the predefined XML entity names, and replace the character there,
again as above sliding the leading part of the token to abut the smaller single character.
This would avoid a second pass over the token characters looking for the ampersands, but I
suspect it would make guththila_next() much harder to understand than it already is. So my
second point above was just to say that I think you have chosen the better approach: handle
XML character entities in guththila_token_close(), well separated from the token parsing in
guththila_next().
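
To make the second-pass idea concrete, here is a minimal sketch of replacing the five
predefined entities in a token buffer in place. It is not the guththila code or the attached
patch: the names match_predefined_entity and decode_entities_in_place are hypothetical, and
for simplicity it compacts the tail of the buffer leftward rather than sliding the leading
part of the token as described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of guththila.
 * If s starts with one of the five predefined entity references, stores the
 * replacement character in *out and returns the reference length; else 0. */
static size_t match_predefined_entity(const char *s, size_t avail, char *out)
{
    static const struct { const char *text; size_t len; char ch; } table[] = {
        { "&amp;",  5, '&'  },
        { "&lt;",   4, '<'  },
        { "&gt;",   4, '>'  },
        { "&quot;", 6, '"'  },
        { "&apos;", 6, '\'' }
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (avail >= table[i].len &&
            memcmp(s, table[i].text, table[i].len) == 0) {
            *out = table[i].ch;
            return table[i].len;
        }
    }
    return 0;
}

/* Decodes predefined entities in buf[0..len) in place and returns the new
 * length. The output is never longer than the input, so the write index can
 * never overtake the read index. */
size_t decode_entities_in_place(char *buf, size_t len)
{
    size_t r = 0;   /* read index */
    size_t w = 0;   /* write index */

    while (r < len) {
        char ch;
        size_t used = 0;

        if (buf[r] == '&')
            used = match_predefined_entity(buf + r, len - r, &ch);

        if (used) {
            buf[w++] = ch;        /* one character replaces the reference */
            r += used;
        } else {
            buf[w++] = buf[r++];  /* ordinary character, or an unknown '&' */
        }
    }
    return w;
}
```

Because each position is examined once, "&amp;lt;" decodes to the literal text "&lt;" and is
not decoded a second time. Text taken from a CDATA section would simply not be passed through
this function.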

> guththila parser fails to handle escape sequences for ampersand, less than, greater than
> ----------------------------------------------------------------------------------------
>                 Key: AXIS2C-859
>                 URL:
>             Project: Axis2-C
>          Issue Type: Bug
>          Components: guththila
>    Affects Versions: Current (Nightly)
>         Environment: Windows XP, Visual Studio 2005, guththila parser, libcurl
>            Reporter: Bill Mitchell
>         Attachments: diff.txt
> When an incoming message contains within text the escaped ampersand sequence, "&amp;",
> this sequence is being passed to the client as raw text without being converted to the single
> ampersand character.  Clearly, this action must take place at the level of the parser, as
> only the parser knows whether it is seeing simple text, and conversion is required, or text
> embedded in a CDATA section, where conversion is not allowed.  I have tested the build with
> the libxml parser, and of course the libxml parser behaves correctly: the text passed to the
> client contains only the single ampersand character, not the escaped sequence.  (See section
> 2.4 of XML 1.0 spec.)
> Looking at the code, I expect the same problem occurs with all escaped sequences, less
> than and greater than as well as ampersand, on both input and output.  I also don't see where
> CDATA sections are handled, but as I am not seeing CDATA in the messages from the service
> I am hitting, I have not tested this case.
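
For the output side mentioned in the description, a minimal sketch of escaping text content
might look like the following. The name escape_text and its buffer-size convention are
assumptions for illustration only and do not describe the guththila writer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of guththila.
 * Escapes '&', '<' and '>' from in[0..in_len) into out. Returns the number
 * of bytes the escaped text needs; the output is complete only if that
 * value is <= out_cap. Nothing is ever written past out_cap. */
size_t escape_text(const char *in, size_t in_len, char *out, size_t out_cap)
{
    size_t i;
    size_t needed = 0;

    for (i = 0; i < in_len; i++) {
        const char *rep;
        size_t rep_len;

        switch (in[i]) {
        case '&': rep = "&amp;"; rep_len = 5; break;
        case '<': rep = "&lt;";  rep_len = 4; break;
        case '>': rep = "&gt;";  rep_len = 4; break;
        default:  rep = &in[i];  rep_len = 1; break;
        }

        /* Once one piece fails to fit, needed exceeds out_cap for good,
         * so later pieces are skipped as well. */
        if (needed + rep_len <= out_cap)
            memcpy(out + needed, rep, rep_len);
        needed += rep_len;
    }
    return needed;
}
```

A caller would compare the return value with the buffer capacity and retry with a larger
buffer if the escaped text did not fit.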

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
