abdera-user mailing list archives

From "Brian Moseley" <...@osafoundation.org>
Subject more fun with character encodings
Date Sat, 08 Sep 2007 01:31:53 GMT
i'm running into an issue similar to the one discussed earlier this week
regarding problem data.

as mentioned earlier, the native character encoding on os x is
MacRoman. it appears that even though both my mysql database and my
jdbc connection are configured to use utf8, at some point the data
taken from the db and inserted into an atom feed ends up encoded as
MacRoman, despite the ResponseContext's content type being set to
"application/atom+xml; charset=UTF-8".
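one way to see where MacRoman could sneak in is to compare how the same string encodes under each charset. the sketch below (a hypothetical `DefaultCharsetDemo` class, assuming Java 7+ for `StandardCharsets` and a JDK that ships the extended `x-MacRoman` charset) prints the JVM's default charset, then shows that é becomes the single byte 8E in MacRoman but the two bytes C3 A9 in UTF-8. decoding the MacRoman bytes as UTF-8 yields U+FFFD, the replacement character, which is plausibly what the wtf glyph is. on older os x JVMs the default printed here was typically MacRoman; since JDK 18 (JEP 400) it is UTF-8 on every platform.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {
    public static void main(String[] args) {
        // The charset used whenever no encoding is passed explicitly
        // (typically MacRoman on older OS X JVMs, UTF-8 on JDK 18+).
        System.out.println("default charset: " + Charset.defaultCharset());

        String title = "caf\u00e9"; // "café"

        // UTF-8 encodes é as the two bytes C3 A9.
        byte[] utf8 = title.getBytes(StandardCharsets.UTF_8);
        // MacRoman encodes é as the single byte 8E.
        // (x-MacRoman lives in the JDK's extended charsets; forName throws if it is absent.)
        byte[] macRoman = title.getBytes(Charset.forName("x-MacRoman"));

        System.out.println("utf-8 bytes:    " + hex(utf8));     // 63 61 66 C3 A9
        System.out.println("macroman bytes: " + hex(macRoman)); // 63 61 66 8E

        // Decoding the MacRoman bytes as UTF-8: 0x8E is a stray continuation
        // byte, so the decoder substitutes U+FFFD (the replacement character).
        String misread = new String(macRoman, StandardCharsets.UTF_8);
        System.out.println(misread.equals("caf\uFFFD")); // true
    }

    // Formats bytes as space-separated uppercase hex, e.g. "63 61 66".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }
}
```

if a hex dump of the bytes curl receives matches the second line rather than the first, the string was most likely encoded with the platform default somewhere on the output path.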

from re-reading the recent threads and examining the code in the
0.3.0 branch, it seems that the value i set for an entry's title (for
instance) should be converted to utf8 when the entry is serialized.
but it clearly isn't. when i fetch the feed from my server with curl
in Terminal.app, the non-ascii character in the entry title is
rendered with what i like to call the "wtf" glyph instead of the
actual character. and when i run the feed through the validome.org
validator, it complains that this character is invalid utf8.
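validome's complaint can be reproduced locally with a strict decoder. this sketch (the `Utf8Check` class and `isValidUtf8` helper are hypothetical names) configures a `CharsetDecoder` to report malformed input instead of silently replacing it, which is roughly the check a validator performs; the bytes could come from a response saved with `curl -o`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns true only if the whole byte array is well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)      // throw instead of inserting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = {0x63, 0x61, 0x66, (byte) 0xC3, (byte) 0xA9}; // "café" as UTF-8
        byte[] bad  = {0x63, 0x61, 0x66, (byte) 0x8E};              // "café" as MacRoman
        System.out.println(isValidUtf8(good)); // true
        System.out.println(isValidUtf8(bad));  // false
    }
}
```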

when i run the server and database on linux and get a non-ascii
character into the database, viewing the corresponding entry document
in Terminal.app shows me the expected character, not the wtf one.

i've gone through all of my code looking for places where we might be
instantiating a Reader without specifying an encoding, but i can't
find any. i'm using the 0.3.0-incubating jars that i deployed earlier
today to the people.apache.org/m2-incubating-repository, which
contain the recent default-encoding fixes. so i'm at a loss as to what
could be going on. i feel like i'm missing something basic about
character encodings. any pointers?
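since the bad bytes show up in the output, it may be worth auditing writers as well as readers: `new OutputStreamWriter(out)`, `String.getBytes()` and `new String(bytes)` called without a charset all fall back to the platform default too. a minimal sketch with hypothetical helper names (`writeUtf8`, `readUtf8`), pinning the charset explicitly on both sides:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class ExplicitEncoding {
    // Encodes text as UTF-8 regardless of the platform default charset.
    static byte[] writeUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // new OutputStreamWriter(out) without a charset would use the platform default.
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            // in-memory streams don't fail in practice; rethrow unchecked just in case
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decodes UTF-8 bytes, again naming the charset instead of relying on the default.
    static String readUtf8(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String title = "caf\u00e9";
        byte[] bytes = writeUtf8(title);
        System.out.println(bytes.length);                  // 5
        System.out.println(readUtf8(bytes).equals(title)); // true
    }
}
```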

for reference, here's the url of the entry document as served by os x.
notice that the final character of both the title and the summary is the wtf glyph:


and here is what happens when i plug that url into validome's atom validator:


