It depends on what type of Serializer you use and what kind of Serializer configuration you put in your sitemap.

By default, XMLSerializer/HTMLSerializer uses UTF-8 encoding, so instead of one UTF-16 character you get two characters, UTF-8 encoded.
Of course, there might also be an issue with the emoji charset, but I would first try changing the encoding in the Serializer config (to UTF-16).
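For reference, here is a minimal standalone sketch of how U+1F60D is represented on the Java side. It does not touch Cocoon's serializers, and the class name EmojiEncodingDemo is hypothetical. A Java String holds it as two UTF-16 chars (a high and a low surrogate), while the character itself is a single code point that takes four bytes in either UTF-8 or UTF-16.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: inspects how one supplementary character is stored.
public class EmojiEncodingDemo {
    // U+1F60D SMILING FACE WITH HEART-SHAPED EYES, built from its code point
    static final String EMOJI = new String(Character.toChars(0x1F60D));

    public static void main(String[] args) {
        // Java strings are UTF-16: one supplementary code point occupies two chars
        System.out.println("UTF-16 chars: " + EMOJI.length());                          // 2
        System.out.println("Code points:  " + EMOJI.codePointCount(0, EMOJI.length())); // 1

        // Each char on its own is one half of a surrogate pair, not a character
        for (char c : EMOJI.toCharArray()) {
            System.out.printf("char U+%04X high=%b low=%b%n",
                    (int) c, Character.isHighSurrogate(c), Character.isLowSurrogate(c));
        }

        // Byte counts under the two encodings mentioned above
        System.out.println("UTF-8 bytes:  " + EMOJI.getBytes(StandardCharsets.UTF_8).length);    // 4
        System.out.println("UTF-16 bytes: " + EMOJI.getBytes(StandardCharsets.UTF_16BE).length); // 4
    }
}
```

The loop prints U+D83D (high surrogate) and U+DE0D (low surrogate). Any code that processes the string one char at a time sees these two halves rather than the emoji.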

Greetings,
-Greg

2017-06-07 10:43 GMT+02:00 Flynn, Peter <pflynn@ucc.ie>:
I had a related problem with 3–4 CJK characters being converted to their &#hex; format. Very weird, but it turned out to be the old and buggy copy of jtidy, and I can't figure out how to replace it. 

I haven't had the problem you describe, though, and I have a user who has implemented emoji in Cocoon, see http://research.ucc.ie/emojis/

P
 
--
Peter Flynn | Academic and Collaborative Technologies | IT Services | University College Cork | Ireland | pflynn@ucc.ie | http://research.ucc.ie/profiles/H505/pflynn
 

On 2017-06-06 17:08:51+01:00 Christopher Schultz wrote:

All,

I've been testing my application for use with high Unicode code points
such as emoji like 😍 which is this one:
http://www.fileformat.info/info/unicode/char/1F60D/index.htm

My application and database can handle this code point, but Cocoon
butchers it in a way that I have seen before -- the way that
commons-lang's StringEscapeUtils.escapeXml/escapeHtml seems to do.

Instead of letting the character through as-is, it tries to convert it
into these two numbered entities:

&#55357;&#56845;

Oddly enough, those are the two double-byte UTF-16 characters (the
surrogate pair) you'd get, but I don't think they should be split up like that.
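To make the difference concrete, here is a minimal self-contained sketch of two numeric-entity escapers. This is not commons-lang's actual code; the class and method names are hypothetical, and for brevity it only escapes characters above U+007F (it ignores &, < and >). The first walks the string one UTF-16 char at a time, which produces the split pair of entities shown above. The second walks Unicode code points and emits a single entity for the whole character.

```java
// Hypothetical escaper contrasting per-char and per-code-point iteration.
public class NumericEntityEscaper {

    // Escapes every char above 0x7F as a decimal entity, one UTF-16 char at a time.
    // A supplementary character is therefore split into two surrogate entities.
    static String escapePerChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Same rule, but iterates over code points, so a surrogate pair
    // becomes one entity for the full code point.
    static String escapePerCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F60D));
        System.out.println(escapePerChar(emoji));      // &#55357;&#56845;
        System.out.println(escapePerCodePoint(emoji)); // &#128525;
    }
}
```

&#128525; is decimal for 0x1F60D, which a browser or XML parser resolves back to the single emoji, whereas the two surrogate references are not legal XML character references on their own.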

I haven't found a version of commons-lang 2.x that doesn't break these
kinds of characters. commons-lang3 does the right thing, but they are
incompatible libraries.

Does anyone know the code well enough to know how difficult it would
be to change the way Cocoon 2.1 escapes its output? For example, by
using commons-lang3?

I haven't tried Cocoon 2.2, yet, and I can't tell what dependencies it
has. I also can't exactly tell what to do now that I've downloaded the
binary package. Can this just be used as a drop-in replacement for
Cocoon 2.1.x? Cocoon 2.1.x could build a WAR file that I then
customized for my own application, adding various libraries and
configuration files to it. I think I'll follow up with a separate post
about this.

-chris
