cocoon-users mailing list archives

From "J.Pietschmann"
Subject Re: Problem with entities in XMLSerializer
Date Thu, 24 Apr 2003 18:51:22 GMT
Anna Afonchenko wrote:
> Sorry if something like this was already asked, but I couldn't find
> anything similar in the archives.
> I have an html file with some entities inside (apostrophe, trademark, quotes):
>         <p>This is some text with &#8217; entities.</p>

> The output xml file has entities converted to corresponding characters:
>       <p>This is some text with ’ entities.</p>

XML predefines only five entities (&lt;, &gt;, &amp;, &apos; and &quot;).
Most of the entities predefined in HTML are not available in vanilla XML.
Because there is no
XHTML serializing method, the serializer does not know that it is
serializing XHTML, and therefore it won't use (X)HTML entities. Instead it
uses Unicode characters if the encoding permits (that's what you see
because you use the default encoding of UTF-8, which permits unescaped
representation of any Unicode character), otherwise it uses character
references.
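
To make the two cases concrete, here is a rough sketch in Python rather
than Cocoon itself. The helper name serialize_text is made up for this
illustration and is not part of any serializer's API; it only mimics the
choice between writing a character directly and writing a numeric
character reference, and it does not escape markup characters such as
< or &.

```python
# Hypothetical helper (not Cocoon's API): mimics how a serializer falls
# back to numeric character references when the output encoding cannot
# represent a character directly.
def serialize_text(text: str, encoding: str) -> bytes:
    # "xmlcharrefreplace" writes unencodable characters as decimal
    # references such as &#8217; and leaves everything else untouched.
    return text.encode(encoding, errors="xmlcharrefreplace")


# U+2019 is the right single quotation mark from the original question.
sample = "This is some text with \u2019 entities."

# UTF-8 can represent every Unicode character, so the character is
# written as-is (three bytes: e2 80 99).
utf8_bytes = serialize_text(sample, "utf-8")

# ASCII cannot represent U+2019, so a character reference is written.
ascii_bytes = serialize_text(sample, "ascii")

print(utf8_bytes)
print(ascii_bytes)
```

With the UTF-8 output the quote appears as a raw character; with the
ASCII output it comes back as &#8217;, which is probably the closest
you will get to what you expected to see.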

> And if I try to see the source code of this output from the browser
> (e.g. in Notepad), I get strange characters instead of the
> entities. And I can't work out why this is happening.

The serialized file is most likely UTF-8 encoded. Notepad doesn't know
about UTF-8 encoding (except on WinXP, where you can choose it) and
it interprets the bytes from the file as characters from the configured
codepage, which usually results in "strange characters".
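
The effect can be reproduced with a few lines of Python. This is only a
sketch: the function name misread_utf8 is invented for the example, and
I am assuming the configured codepage is Windows-1252 (cp1252), which is
common on Western European and US systems but may not be what your
machine uses.

```python
# Hypothetical helper: decodes UTF-8 bytes with the wrong codec, the way
# an editor unaware of UTF-8 would. cp1252 is an assumed codepage.
def misread_utf8(text: str, codepage: str = "cp1252") -> str:
    raw = text.encode("utf-8")  # bytes as the serializer wrote them
    # Each byte is mapped to one character of the codepage; bytes the
    # codepage does not define become U+FFFD instead of raising.
    return raw.decode(codepage, errors="replace")


right_quote = "\u2019"  # one Unicode character

# One character, three bytes in UTF-8.
print(right_quote.encode("utf-8").hex())  # e28099

# Read back as cp1252, the three bytes turn into three characters.
garbled = misread_utf8(right_quote)
print(len(garbled))  # 3
```

The single character U+2019 and its three-byte UTF-8 form are two
different things, and the garbage appears only because the bytes are
read back with a different encoding than the one they were written in.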

> But maybe I am just confusing something?

Take some time and learn about the difference between a Unicode character
and a character encoding. There is lots of material available, for example
start with the XSL FAQ reachable from

